0.000 000 000 000 000 000 008 555 6 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 555 6(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 555 6(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
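The repeated division described in steps 1 and 2 can be sketched in Python as follows. `int_to_binary` is a hypothetical helper name, not part of any library. It collects the remainders and then reads them from the bottom of the list up, as described above.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("start with the positive version of the number")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)   # division = quotient + remainder
        remainders.append(str(remainder))
        if n == 0:                    # stop when the quotient is zero
            break
    # The last remainder becomes the leftmost (first) digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(956))  # 1110111100
```

Python's built-in `format(n, "b")` produces the same digits and is handy for a quick check.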


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 555 6.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 555 6 × 2 = 0 + 0.000 000 000 000 000 000 017 111 2;
  • 2) 0.000 000 000 000 000 000 017 111 2 × 2 = 0 + 0.000 000 000 000 000 000 034 222 4;
  • 3) 0.000 000 000 000 000 000 034 222 4 × 2 = 0 + 0.000 000 000 000 000 000 068 444 8;
  • 4) 0.000 000 000 000 000 000 068 444 8 × 2 = 0 + 0.000 000 000 000 000 000 136 889 6;
  • 5) 0.000 000 000 000 000 000 136 889 6 × 2 = 0 + 0.000 000 000 000 000 000 273 779 2;
  • 6) 0.000 000 000 000 000 000 273 779 2 × 2 = 0 + 0.000 000 000 000 000 000 547 558 4;
  • 7) 0.000 000 000 000 000 000 547 558 4 × 2 = 0 + 0.000 000 000 000 000 001 095 116 8;
  • 8) 0.000 000 000 000 000 001 095 116 8 × 2 = 0 + 0.000 000 000 000 000 002 190 233 6;
  • 9) 0.000 000 000 000 000 002 190 233 6 × 2 = 0 + 0.000 000 000 000 000 004 380 467 2;
  • 10) 0.000 000 000 000 000 004 380 467 2 × 2 = 0 + 0.000 000 000 000 000 008 760 934 4;
  • 11) 0.000 000 000 000 000 008 760 934 4 × 2 = 0 + 0.000 000 000 000 000 017 521 868 8;
  • 12) 0.000 000 000 000 000 017 521 868 8 × 2 = 0 + 0.000 000 000 000 000 035 043 737 6;
  • 13) 0.000 000 000 000 000 035 043 737 6 × 2 = 0 + 0.000 000 000 000 000 070 087 475 2;
  • 14) 0.000 000 000 000 000 070 087 475 2 × 2 = 0 + 0.000 000 000 000 000 140 174 950 4;
  • 15) 0.000 000 000 000 000 140 174 950 4 × 2 = 0 + 0.000 000 000 000 000 280 349 900 8;
  • 16) 0.000 000 000 000 000 280 349 900 8 × 2 = 0 + 0.000 000 000 000 000 560 699 801 6;
  • 17) 0.000 000 000 000 000 560 699 801 6 × 2 = 0 + 0.000 000 000 000 001 121 399 603 2;
  • 18) 0.000 000 000 000 001 121 399 603 2 × 2 = 0 + 0.000 000 000 000 002 242 799 206 4;
  • 19) 0.000 000 000 000 002 242 799 206 4 × 2 = 0 + 0.000 000 000 000 004 485 598 412 8;
  • 20) 0.000 000 000 000 004 485 598 412 8 × 2 = 0 + 0.000 000 000 000 008 971 196 825 6;
  • 21) 0.000 000 000 000 008 971 196 825 6 × 2 = 0 + 0.000 000 000 000 017 942 393 651 2;
  • 22) 0.000 000 000 000 017 942 393 651 2 × 2 = 0 + 0.000 000 000 000 035 884 787 302 4;
  • 23) 0.000 000 000 000 035 884 787 302 4 × 2 = 0 + 0.000 000 000 000 071 769 574 604 8;
  • 24) 0.000 000 000 000 071 769 574 604 8 × 2 = 0 + 0.000 000 000 000 143 539 149 209 6;
  • 25) 0.000 000 000 000 143 539 149 209 6 × 2 = 0 + 0.000 000 000 000 287 078 298 419 2;
  • 26) 0.000 000 000 000 287 078 298 419 2 × 2 = 0 + 0.000 000 000 000 574 156 596 838 4;
  • 27) 0.000 000 000 000 574 156 596 838 4 × 2 = 0 + 0.000 000 000 001 148 313 193 676 8;
  • 28) 0.000 000 000 001 148 313 193 676 8 × 2 = 0 + 0.000 000 000 002 296 626 387 353 6;
  • 29) 0.000 000 000 002 296 626 387 353 6 × 2 = 0 + 0.000 000 000 004 593 252 774 707 2;
  • 30) 0.000 000 000 004 593 252 774 707 2 × 2 = 0 + 0.000 000 000 009 186 505 549 414 4;
  • 31) 0.000 000 000 009 186 505 549 414 4 × 2 = 0 + 0.000 000 000 018 373 011 098 828 8;
  • 32) 0.000 000 000 018 373 011 098 828 8 × 2 = 0 + 0.000 000 000 036 746 022 197 657 6;
  • 33) 0.000 000 000 036 746 022 197 657 6 × 2 = 0 + 0.000 000 000 073 492 044 395 315 2;
  • 34) 0.000 000 000 073 492 044 395 315 2 × 2 = 0 + 0.000 000 000 146 984 088 790 630 4;
  • 35) 0.000 000 000 146 984 088 790 630 4 × 2 = 0 + 0.000 000 000 293 968 177 581 260 8;
  • 36) 0.000 000 000 293 968 177 581 260 8 × 2 = 0 + 0.000 000 000 587 936 355 162 521 6;
  • 37) 0.000 000 000 587 936 355 162 521 6 × 2 = 0 + 0.000 000 001 175 872 710 325 043 2;
  • 38) 0.000 000 001 175 872 710 325 043 2 × 2 = 0 + 0.000 000 002 351 745 420 650 086 4;
  • 39) 0.000 000 002 351 745 420 650 086 4 × 2 = 0 + 0.000 000 004 703 490 841 300 172 8;
  • 40) 0.000 000 004 703 490 841 300 172 8 × 2 = 0 + 0.000 000 009 406 981 682 600 345 6;
  • 41) 0.000 000 009 406 981 682 600 345 6 × 2 = 0 + 0.000 000 018 813 963 365 200 691 2;
  • 42) 0.000 000 018 813 963 365 200 691 2 × 2 = 0 + 0.000 000 037 627 926 730 401 382 4;
  • 43) 0.000 000 037 627 926 730 401 382 4 × 2 = 0 + 0.000 000 075 255 853 460 802 764 8;
  • 44) 0.000 000 075 255 853 460 802 764 8 × 2 = 0 + 0.000 000 150 511 706 921 605 529 6;
  • 45) 0.000 000 150 511 706 921 605 529 6 × 2 = 0 + 0.000 000 301 023 413 843 211 059 2;
  • 46) 0.000 000 301 023 413 843 211 059 2 × 2 = 0 + 0.000 000 602 046 827 686 422 118 4;
  • 47) 0.000 000 602 046 827 686 422 118 4 × 2 = 0 + 0.000 001 204 093 655 372 844 236 8;
  • 48) 0.000 001 204 093 655 372 844 236 8 × 2 = 0 + 0.000 002 408 187 310 745 688 473 6;
  • 49) 0.000 002 408 187 310 745 688 473 6 × 2 = 0 + 0.000 004 816 374 621 491 376 947 2;
  • 50) 0.000 004 816 374 621 491 376 947 2 × 2 = 0 + 0.000 009 632 749 242 982 753 894 4;
  • 51) 0.000 009 632 749 242 982 753 894 4 × 2 = 0 + 0.000 019 265 498 485 965 507 788 8;
  • 52) 0.000 019 265 498 485 965 507 788 8 × 2 = 0 + 0.000 038 530 996 971 931 015 577 6;
  • 53) 0.000 038 530 996 971 931 015 577 6 × 2 = 0 + 0.000 077 061 993 943 862 031 155 2;
  • 54) 0.000 077 061 993 943 862 031 155 2 × 2 = 0 + 0.000 154 123 987 887 724 062 310 4;
  • 55) 0.000 154 123 987 887 724 062 310 4 × 2 = 0 + 0.000 308 247 975 775 448 124 620 8;
  • 56) 0.000 308 247 975 775 448 124 620 8 × 2 = 0 + 0.000 616 495 951 550 896 249 241 6;
  • 57) 0.000 616 495 951 550 896 249 241 6 × 2 = 0 + 0.001 232 991 903 101 792 498 483 2;
  • 58) 0.001 232 991 903 101 792 498 483 2 × 2 = 0 + 0.002 465 983 806 203 584 996 966 4;
  • 59) 0.002 465 983 806 203 584 996 966 4 × 2 = 0 + 0.004 931 967 612 407 169 993 932 8;
  • 60) 0.004 931 967 612 407 169 993 932 8 × 2 = 0 + 0.009 863 935 224 814 339 987 865 6;
  • 61) 0.009 863 935 224 814 339 987 865 6 × 2 = 0 + 0.019 727 870 449 628 679 975 731 2;
  • 62) 0.019 727 870 449 628 679 975 731 2 × 2 = 0 + 0.039 455 740 899 257 359 951 462 4;
  • 63) 0.039 455 740 899 257 359 951 462 4 × 2 = 0 + 0.078 911 481 798 514 719 902 924 8;
  • 64) 0.078 911 481 798 514 719 902 924 8 × 2 = 0 + 0.157 822 963 597 029 439 805 849 6;
  • 65) 0.157 822 963 597 029 439 805 849 6 × 2 = 0 + 0.315 645 927 194 058 879 611 699 2;
  • 66) 0.315 645 927 194 058 879 611 699 2 × 2 = 0 + 0.631 291 854 388 117 759 223 398 4;
  • 67) 0.631 291 854 388 117 759 223 398 4 × 2 = 1 + 0.262 583 708 776 235 518 446 796 8;
  • 68) 0.262 583 708 776 235 518 446 796 8 × 2 = 0 + 0.525 167 417 552 471 036 893 593 6;
  • 69) 0.525 167 417 552 471 036 893 593 6 × 2 = 1 + 0.050 334 835 104 942 073 787 187 2;
  • 70) 0.050 334 835 104 942 073 787 187 2 × 2 = 0 + 0.100 669 670 209 884 147 574 374 4;
  • 71) 0.100 669 670 209 884 147 574 374 4 × 2 = 0 + 0.201 339 340 419 768 295 148 748 8;
  • 72) 0.201 339 340 419 768 295 148 748 8 × 2 = 0 + 0.402 678 680 839 536 590 297 497 6;
  • 73) 0.402 678 680 839 536 590 297 497 6 × 2 = 0 + 0.805 357 361 679 073 180 594 995 2;
  • 74) 0.805 357 361 679 073 180 594 995 2 × 2 = 1 + 0.610 714 723 358 146 361 189 990 4;
  • 75) 0.610 714 723 358 146 361 189 990 4 × 2 = 1 + 0.221 429 446 716 292 722 379 980 8;
  • 76) 0.221 429 446 716 292 722 379 980 8 × 2 = 0 + 0.442 858 893 432 585 444 759 961 6;
  • 77) 0.442 858 893 432 585 444 759 961 6 × 2 = 0 + 0.885 717 786 865 170 889 519 923 2;
  • 78) 0.885 717 786 865 170 889 519 923 2 × 2 = 1 + 0.771 435 573 730 341 779 039 846 4;
  • 79) 0.771 435 573 730 341 779 039 846 4 × 2 = 1 + 0.542 871 147 460 683 558 079 692 8;
  • 80) 0.542 871 147 460 683 558 079 692 8 × 2 = 1 + 0.085 742 294 921 367 116 159 385 6;
  • 81) 0.085 742 294 921 367 116 159 385 6 × 2 = 0 + 0.171 484 589 842 734 232 318 771 2;
  • 82) 0.171 484 589 842 734 232 318 771 2 × 2 = 0 + 0.342 969 179 685 468 464 637 542 4;
  • 83) 0.342 969 179 685 468 464 637 542 4 × 2 = 0 + 0.685 938 359 370 936 929 275 084 8;
  • 84) 0.685 938 359 370 936 929 275 084 8 × 2 = 1 + 0.371 876 718 741 873 858 550 169 6;
  • 85) 0.371 876 718 741 873 858 550 169 6 × 2 = 0 + 0.743 753 437 483 747 717 100 339 2;
  • 86) 0.743 753 437 483 747 717 100 339 2 × 2 = 1 + 0.487 506 874 967 495 434 200 678 4;
  • 87) 0.487 506 874 967 495 434 200 678 4 × 2 = 0 + 0.975 013 749 934 990 868 401 356 8;
  • 88) 0.975 013 749 934 990 868 401 356 8 × 2 = 1 + 0.950 027 499 869 981 736 802 713 6;
  • 89) 0.950 027 499 869 981 736 802 713 6 × 2 = 1 + 0.900 054 999 739 963 473 605 427 2;
  • 90) 0.900 054 999 739 963 473 605 427 2 × 2 = 1 + 0.800 109 999 479 926 947 210 854 4;
  • 91) 0.800 109 999 479 926 947 210 854 4 × 2 = 1 + 0.600 219 998 959 853 894 421 708 8;
  • 92) 0.600 219 998 959 853 894 421 708 8 × 2 = 1 + 0.200 439 997 919 707 788 843 417 6;
  • 93) 0.200 439 997 919 707 788 843 417 6 × 2 = 0 + 0.400 879 995 839 415 577 686 835 2;
  • 94) 0.400 879 995 839 415 577 686 835 2 × 2 = 0 + 0.801 759 991 678 831 155 373 670 4;
  • 95) 0.801 759 991 678 831 155 373 670 4 × 2 = 1 + 0.603 519 983 357 662 310 747 340 8;
  • 96) 0.603 519 983 357 662 310 747 340 8 × 2 = 1 + 0.207 039 966 715 324 621 494 681 6;
  • 97) 0.207 039 966 715 324 621 494 681 6 × 2 = 0 + 0.414 079 933 430 649 242 989 363 2;
  • 98) 0.414 079 933 430 649 242 989 363 2 × 2 = 0 + 0.828 159 866 861 298 485 978 726 4;
  • 99) 0.828 159 866 861 298 485 978 726 4 × 2 = 1 + 0.656 319 733 722 596 971 957 452 8;
  • 100) 0.656 319 733 722 596 971 957 452 8 × 2 = 1 + 0.312 639 467 445 193 943 914 905 6;
  • 101) 0.312 639 467 445 193 943 914 905 6 × 2 = 0 + 0.625 278 934 890 387 887 829 811 2;
  • 102) 0.625 278 934 890 387 887 829 811 2 × 2 = 1 + 0.250 557 869 780 775 775 659 622 4;
  • 103) 0.250 557 869 780 775 775 659 622 4 × 2 = 0 + 0.501 115 739 561 551 551 319 244 8;
  • 104) 0.501 115 739 561 551 551 319 244 8 × 2 = 1 + 0.002 231 479 123 103 102 638 489 6;
  • 105) 0.002 231 479 123 103 102 638 489 6 × 2 = 0 + 0.004 462 958 246 206 205 276 979 2;
  • 106) 0.004 462 958 246 206 205 276 979 2 × 2 = 0 + 0.008 925 916 492 412 410 553 958 4;
  • 107) 0.008 925 916 492 412 410 553 958 4 × 2 = 0 + 0.017 851 832 984 824 821 107 916 8;
  • 108) 0.017 851 832 984 824 821 107 916 8 × 2 = 0 + 0.035 703 665 969 649 642 215 833 6;
  • 109) 0.035 703 665 969 649 642 215 833 6 × 2 = 0 + 0.071 407 331 939 299 284 431 667 2;
  • 110) 0.071 407 331 939 299 284 431 667 2 × 2 = 0 + 0.142 814 663 878 598 568 863 334 4;
  • 111) 0.142 814 663 878 598 568 863 334 4 × 2 = 0 + 0.285 629 327 757 197 137 726 668 8;
  • 112) 0.285 629 327 757 197 137 726 668 8 × 2 = 0 + 0.571 258 655 514 394 275 453 337 6;
  • 113) 0.571 258 655 514 394 275 453 337 6 × 2 = 1 + 0.142 517 311 028 788 550 906 675 2;
  • 114) 0.142 517 311 028 788 550 906 675 2 × 2 = 0 + 0.285 034 622 057 577 101 813 350 4;
  • 115) 0.285 034 622 057 577 101 813 350 4 × 2 = 0 + 0.570 069 244 115 154 203 626 700 8;
  • 116) 0.570 069 244 115 154 203 626 700 8 × 2 = 1 + 0.140 138 488 230 308 407 253 401 6;
  • 117) 0.140 138 488 230 308 407 253 401 6 × 2 = 0 + 0.280 276 976 460 616 814 506 803 2;
  • 118) 0.280 276 976 460 616 814 506 803 2 × 2 = 0 + 0.560 553 952 921 233 629 013 606 4;
  • 119) 0.560 553 952 921 233 629 013 606 4 × 2 = 1 + 0.121 107 905 842 467 258 027 212 8;

We didn't get any fractional part that was equal to zero. But we had enough iterations: steps 67 to 119 produced 53 significant bits, counted from the first integer part that was different from zero (the leading 1 plus the 52 bits that fit in the mantissa) => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).
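A minimal sketch of the repeated multiplication and of this stopping rule, assuming exact rational arithmetic through Python's standard `fractions.Fraction` (a binary float would already be rounded and could not reproduce the table). `fraction_to_binary` is a hypothetical helper; it stops when the fractional part becomes zero or once 53 significant bits (the leading 1 plus 52 mantissa bits) have been produced.

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, mantissa_bits: int = 52) -> str:
    """Repeatedly multiply a fraction in [0, 1) by 2 and collect the integer parts."""
    bits = []
    significant = 0  # bits produced since (and including) the first 1
    while frac != 0 and significant < mantissa_bits + 1:
        frac *= 2
        integer_part = int(frac)  # always 0 or 1 here
        frac -= integer_part      # keep only the fractional part
        bits.append(str(integer_part))
        if integer_part == 1 or significant > 0:
            significant += 1
    return "".join(bits)


bits = fraction_to_binary(Fraction("8.5556e-21"))
print(len(bits))            # 119 multiplications, as in the list above
print(bits.index("1") + 1)  # 67: the first 1 appears at step 67
```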


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 555 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001(2)
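To see how good the approximation is, the bits above can be summed back as powers of two and compared with the exact decimal value. This sketch again assumes exact arithmetic with Python's `fractions.Fraction`; the bit string is copied from the line above.

```python
from fractions import Fraction

# The 119 fractional bits from the line above, spaces removed.
bits = ("0000 " * 16
        + "0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001").replace(" ", "")

# 0.b1 b2 b3 ...(2) = sum of 2^(-k) for every bit b_k equal to 1.
approx = sum(Fraction(1, 2 ** k) for k, bit in enumerate(bits, start=1) if bit == "1")
exact = Fraction("8.5556e-21")

# Truncation leaves the binary value slightly below the decimal one.
relative_error = (exact - approx) / exact
print(float(relative_error))
```

The relative error comes out positive and below 2^(-52), which is what "a very good approximation" means for a 52-bit mantissa.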

5. Positive number before normalization:

0.000 000 000 000 000 000 008 555 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 555 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001(2) × 2^0 =


1.0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001(2) × 2^(-67)
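The shift count can be read straight off the bit string: the first 1 sits at position 67, so the mark moves 67 places and the exponent is -67. A small sketch, assuming the number is below 1 and has at least one 1 bit; `normalize_fraction_bits` is a hypothetical helper name.

```python
def normalize_fraction_bits(bits: str) -> tuple[str, int]:
    """Rewrite 0.b1b2b3...(2), a number below 1, as 1.xxx...(2) × 2^exponent."""
    first_one = bits.index("1")   # 0-based position of the first 1
    exponent = -(first_one + 1)   # shifting right by n places gives × 2^(-n)
    return "1." + bits[first_one + 1:], exponent


bits = ("0000 " * 16
        + "0010 1000 0110 0111 0001 0101 1111 0011 0011 0101 0000 0000 1001 001").replace(" ", "")
significand, exponent = normalize_fraction_bits(bits)
print(exponent)          # -67
print(len(significand))  # 54: "1." followed by 52 bits
```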


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
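Steps 8 to 10 (bias, then 11-bit binary with a leading zero added where needed) fit in a few lines. A sketch, assuming only normal numbers: the biased values 0 and 2047 are reserved by IEEE 754 for zero/subnormals and infinities/NaN, so this hypothetical `biased_exponent_bits` helper rejects them.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and write the result as an 11-bit binary string."""
    adjusted = unadjusted + BIAS
    # 0 and 2047 are reserved (zero/subnormals and infinities/NaN).
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the range of normal numbers")
    return format(adjusted, f"0{EXPONENT_BITS}b")  # zero-padded to 11 digits


print(biased_exponent_bits(-67))  # 01110111100
print(biased_exponent_bits(4))    # 10000000011
```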


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001 =


0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001
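Normalizing the mantissa amounts to dropping the "1." and then cutting or zero-padding on the right to exactly 52 bits. A sketch with a hypothetical `normalize_mantissa` helper; cutting here is plain truncation, the same approach the steps use.

```python
MANTISSA_BITS = 52


def normalize_mantissa(significand: str) -> str:
    """Drop the leading '1.' and fit the remaining bits into exactly 52 bits."""
    if not significand.startswith("1."):
        raise ValueError("expected a normalized significand such as '1.0100...'")
    fraction = significand[2:]
    # Truncate extra bits on the right, or pad with zeros on the right.
    return fraction[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")


significand = "1." + "0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001".replace(" ", "")
print(normalize_mantissa(significand) == significand[2:])  # True: already 52 bits
```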


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001


Decimal number 0.000 000 000 000 000 000 008 555 6 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 0011 1000 1010 1111 1001 1001 1010 1000 0000 0100 1001
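One way to double-check the result is to let Python store the same value as a double and print its raw bits. The sketch uses only the standard `struct` module (`'>d'` packs a big-endian double, `'>Q'` reads the same 8 bytes back as an unsigned 64-bit integer); `double_fields` is a hypothetical helper name. Python rounds the literal to the nearest double, which here agrees with the truncated result because the next bit (step 120, from 0.121... × 2) would be 0.

```python
import struct


def double_fields(x: float) -> tuple[str, str, str]:
    """Return the sign, exponent and mantissa bits of a 64-bit double."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.5556e-21)
print(sign, exponent, mantissa)
print(struct.pack(">d", 8.5556e-21).hex())  # 3bc4338af99a8049
```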


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
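  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero, or until we have at least 53 significant bits (the leading 1 plus the 52 mantissa bits), counted from the first non-zero bit of the whole number.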
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
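  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right. Removing the excess bits is truncation; IEEE 754's default rounding mode (round to nearest, ties to even) instead rounds the last kept bit up when the discarded bits are worth more than half of it, so the two results can differ by one in the last mantissa bit.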
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
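The nine steps can be condensed into one function. This is a sketch under stated assumptions, not a reference implementation: it reads the decimal input exactly with `fractions.Fraction`, finds the exponent by repeated halving or doubling (a swapped-in shortcut for steps 2 to 6 that yields the same leading 1 and the same following bits), handles normal numbers only (no zero, subnormals, infinities or NaN), and by default truncates the mantissa as in step 8. The hypothetical `round_to_nearest` flag switches to IEEE 754's default rounding (round to nearest, ties to even).

```python
import math
from fractions import Fraction

MANTISSA_BITS = 52
BIAS = 2 ** (11 - 1) - 1  # 1023, the 11-bit excess/bias


def decimal_to_double_bits(text: str, round_to_nearest: bool = False) -> str:
    """Return the 64 bits (sign, exponent, mantissa) for a decimal string."""
    value = Fraction(text)                 # exact value of the decimal input
    sign = 1 if value < 0 else 0           # step 9: 1 for negative, 0 for positive
    value = abs(value)                     # step 1: work with the positive version
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Steps 2-6: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Step 8: the 52 bits after the leading 1, as an integer.
    scaled = (value - 1) * 2 ** MANTISSA_BITS
    mantissa = math.floor(scaled)          # drops the excess bits (truncation)
    if round_to_nearest:
        rest = scaled - mantissa           # discarded part, in units of the last bit
        half = Fraction(1, 2)
        if rest > half or (rest == half and mantissa % 2 == 1):
            mantissa += 1
            if mantissa == 2 ** MANTISSA_BITS:  # rounding carried into the exponent
                mantissa = 0
                exponent += 1

    # Step 7: biased exponent; 0 and 2047 are reserved for special values.
    biased = exponent + BIAS
    if not 1 <= biased <= 2046:
        raise ValueError("subnormal or overflowing values are not handled")
    return f"{sign}{biased:011b}{mantissa:052b}"


def grouped(bits: str) -> str:
    """Format as 'sign - exponent - mantissa'."""
    return f"{bits[0]} - {bits[1:12]} - {bits[12:]}"


print(grouped(decimal_to_double_bits("8.5556e-21")))
print(grouped(decimal_to_double_bits("-31.640215")))
print(grouped(decimal_to_double_bits("-31.640215", round_to_nearest=True)))
```

With truncation the function reproduces the hand-computed results; with `round_to_nearest=True` it matches what languages that follow IEEE 754 store for the same literal.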

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
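The example above truncates the mantissa. IEEE 754's default rounding mode, which Python applies when it parses the literal `-31.640215`, is round to nearest (ties to even). Here the discarded bits `1010 0...` are worth more than half a unit of the last kept bit, so the stored double ends in `...1101` rather than `...1100`. The short check below uses only the standard `struct` module; `double_bits` is a hypothetical helper name.

```python
import struct


def double_bits(x: float) -> str:
    """Return the 64 raw bits of a double as a string of 0s and 1s."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(raw, "064b")


stored = double_bits(-31.640215)
truncated = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", "")

print(stored[0], stored[1:12])                  # 1 10000000011
print(int(stored[12:], 2) - int(truncated, 2))  # 1
```

Sign and exponent agree with the hand computation; only the last mantissa bit differs, by exactly one unit.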