0.000 000 000 000 000 000 008 552 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 552(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 552(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
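
The repeated division in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration; the function name `int_to_binary` is an assumption made for this sketch and is not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)        # one division step: quotient and remainder
        remainders.append(str(r))  # keep track of each remainder
        if n == 0:                 # stop when the quotient is zero
            break
    # The last remainder becomes the leftmost binary digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # the integer part of the number converted here
print(int_to_binary(31))
```

Reading the remainders from the bottom of the list is what `reversed` does here.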


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 552.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 552 × 2 = 0 + 0.000 000 000 000 000 000 017 104;
  • 2) 0.000 000 000 000 000 000 017 104 × 2 = 0 + 0.000 000 000 000 000 000 034 208;
  • 3) 0.000 000 000 000 000 000 034 208 × 2 = 0 + 0.000 000 000 000 000 000 068 416;
  • 4) 0.000 000 000 000 000 000 068 416 × 2 = 0 + 0.000 000 000 000 000 000 136 832;
  • 5) 0.000 000 000 000 000 000 136 832 × 2 = 0 + 0.000 000 000 000 000 000 273 664;
  • 6) 0.000 000 000 000 000 000 273 664 × 2 = 0 + 0.000 000 000 000 000 000 547 328;
  • 7) 0.000 000 000 000 000 000 547 328 × 2 = 0 + 0.000 000 000 000 000 001 094 656;
  • 8) 0.000 000 000 000 000 001 094 656 × 2 = 0 + 0.000 000 000 000 000 002 189 312;
  • 9) 0.000 000 000 000 000 002 189 312 × 2 = 0 + 0.000 000 000 000 000 004 378 624;
  • 10) 0.000 000 000 000 000 004 378 624 × 2 = 0 + 0.000 000 000 000 000 008 757 248;
  • 11) 0.000 000 000 000 000 008 757 248 × 2 = 0 + 0.000 000 000 000 000 017 514 496;
  • 12) 0.000 000 000 000 000 017 514 496 × 2 = 0 + 0.000 000 000 000 000 035 028 992;
  • 13) 0.000 000 000 000 000 035 028 992 × 2 = 0 + 0.000 000 000 000 000 070 057 984;
  • 14) 0.000 000 000 000 000 070 057 984 × 2 = 0 + 0.000 000 000 000 000 140 115 968;
  • 15) 0.000 000 000 000 000 140 115 968 × 2 = 0 + 0.000 000 000 000 000 280 231 936;
  • 16) 0.000 000 000 000 000 280 231 936 × 2 = 0 + 0.000 000 000 000 000 560 463 872;
  • 17) 0.000 000 000 000 000 560 463 872 × 2 = 0 + 0.000 000 000 000 001 120 927 744;
  • 18) 0.000 000 000 000 001 120 927 744 × 2 = 0 + 0.000 000 000 000 002 241 855 488;
  • 19) 0.000 000 000 000 002 241 855 488 × 2 = 0 + 0.000 000 000 000 004 483 710 976;
  • 20) 0.000 000 000 000 004 483 710 976 × 2 = 0 + 0.000 000 000 000 008 967 421 952;
  • 21) 0.000 000 000 000 008 967 421 952 × 2 = 0 + 0.000 000 000 000 017 934 843 904;
  • 22) 0.000 000 000 000 017 934 843 904 × 2 = 0 + 0.000 000 000 000 035 869 687 808;
  • 23) 0.000 000 000 000 035 869 687 808 × 2 = 0 + 0.000 000 000 000 071 739 375 616;
  • 24) 0.000 000 000 000 071 739 375 616 × 2 = 0 + 0.000 000 000 000 143 478 751 232;
  • 25) 0.000 000 000 000 143 478 751 232 × 2 = 0 + 0.000 000 000 000 286 957 502 464;
  • 26) 0.000 000 000 000 286 957 502 464 × 2 = 0 + 0.000 000 000 000 573 915 004 928;
  • 27) 0.000 000 000 000 573 915 004 928 × 2 = 0 + 0.000 000 000 001 147 830 009 856;
  • 28) 0.000 000 000 001 147 830 009 856 × 2 = 0 + 0.000 000 000 002 295 660 019 712;
  • 29) 0.000 000 000 002 295 660 019 712 × 2 = 0 + 0.000 000 000 004 591 320 039 424;
  • 30) 0.000 000 000 004 591 320 039 424 × 2 = 0 + 0.000 000 000 009 182 640 078 848;
  • 31) 0.000 000 000 009 182 640 078 848 × 2 = 0 + 0.000 000 000 018 365 280 157 696;
  • 32) 0.000 000 000 018 365 280 157 696 × 2 = 0 + 0.000 000 000 036 730 560 315 392;
  • 33) 0.000 000 000 036 730 560 315 392 × 2 = 0 + 0.000 000 000 073 461 120 630 784;
  • 34) 0.000 000 000 073 461 120 630 784 × 2 = 0 + 0.000 000 000 146 922 241 261 568;
  • 35) 0.000 000 000 146 922 241 261 568 × 2 = 0 + 0.000 000 000 293 844 482 523 136;
  • 36) 0.000 000 000 293 844 482 523 136 × 2 = 0 + 0.000 000 000 587 688 965 046 272;
  • 37) 0.000 000 000 587 688 965 046 272 × 2 = 0 + 0.000 000 001 175 377 930 092 544;
  • 38) 0.000 000 001 175 377 930 092 544 × 2 = 0 + 0.000 000 002 350 755 860 185 088;
  • 39) 0.000 000 002 350 755 860 185 088 × 2 = 0 + 0.000 000 004 701 511 720 370 176;
  • 40) 0.000 000 004 701 511 720 370 176 × 2 = 0 + 0.000 000 009 403 023 440 740 352;
  • 41) 0.000 000 009 403 023 440 740 352 × 2 = 0 + 0.000 000 018 806 046 881 480 704;
  • 42) 0.000 000 018 806 046 881 480 704 × 2 = 0 + 0.000 000 037 612 093 762 961 408;
  • 43) 0.000 000 037 612 093 762 961 408 × 2 = 0 + 0.000 000 075 224 187 525 922 816;
  • 44) 0.000 000 075 224 187 525 922 816 × 2 = 0 + 0.000 000 150 448 375 051 845 632;
  • 45) 0.000 000 150 448 375 051 845 632 × 2 = 0 + 0.000 000 300 896 750 103 691 264;
  • 46) 0.000 000 300 896 750 103 691 264 × 2 = 0 + 0.000 000 601 793 500 207 382 528;
  • 47) 0.000 000 601 793 500 207 382 528 × 2 = 0 + 0.000 001 203 587 000 414 765 056;
  • 48) 0.000 001 203 587 000 414 765 056 × 2 = 0 + 0.000 002 407 174 000 829 530 112;
  • 49) 0.000 002 407 174 000 829 530 112 × 2 = 0 + 0.000 004 814 348 001 659 060 224;
  • 50) 0.000 004 814 348 001 659 060 224 × 2 = 0 + 0.000 009 628 696 003 318 120 448;
  • 51) 0.000 009 628 696 003 318 120 448 × 2 = 0 + 0.000 019 257 392 006 636 240 896;
  • 52) 0.000 019 257 392 006 636 240 896 × 2 = 0 + 0.000 038 514 784 013 272 481 792;
  • 53) 0.000 038 514 784 013 272 481 792 × 2 = 0 + 0.000 077 029 568 026 544 963 584;
  • 54) 0.000 077 029 568 026 544 963 584 × 2 = 0 + 0.000 154 059 136 053 089 927 168;
  • 55) 0.000 154 059 136 053 089 927 168 × 2 = 0 + 0.000 308 118 272 106 179 854 336;
  • 56) 0.000 308 118 272 106 179 854 336 × 2 = 0 + 0.000 616 236 544 212 359 708 672;
  • 57) 0.000 616 236 544 212 359 708 672 × 2 = 0 + 0.001 232 473 088 424 719 417 344;
  • 58) 0.001 232 473 088 424 719 417 344 × 2 = 0 + 0.002 464 946 176 849 438 834 688;
  • 59) 0.002 464 946 176 849 438 834 688 × 2 = 0 + 0.004 929 892 353 698 877 669 376;
  • 60) 0.004 929 892 353 698 877 669 376 × 2 = 0 + 0.009 859 784 707 397 755 338 752;
  • 61) 0.009 859 784 707 397 755 338 752 × 2 = 0 + 0.019 719 569 414 795 510 677 504;
  • 62) 0.019 719 569 414 795 510 677 504 × 2 = 0 + 0.039 439 138 829 591 021 355 008;
  • 63) 0.039 439 138 829 591 021 355 008 × 2 = 0 + 0.078 878 277 659 182 042 710 016;
  • 64) 0.078 878 277 659 182 042 710 016 × 2 = 0 + 0.157 756 555 318 364 085 420 032;
  • 65) 0.157 756 555 318 364 085 420 032 × 2 = 0 + 0.315 513 110 636 728 170 840 064;
  • 66) 0.315 513 110 636 728 170 840 064 × 2 = 0 + 0.631 026 221 273 456 341 680 128;
  • 67) 0.631 026 221 273 456 341 680 128 × 2 = 1 + 0.262 052 442 546 912 683 360 256;
  • 68) 0.262 052 442 546 912 683 360 256 × 2 = 0 + 0.524 104 885 093 825 366 720 512;
  • 69) 0.524 104 885 093 825 366 720 512 × 2 = 1 + 0.048 209 770 187 650 733 441 024;
  • 70) 0.048 209 770 187 650 733 441 024 × 2 = 0 + 0.096 419 540 375 301 466 882 048;
  • 71) 0.096 419 540 375 301 466 882 048 × 2 = 0 + 0.192 839 080 750 602 933 764 096;
  • 72) 0.192 839 080 750 602 933 764 096 × 2 = 0 + 0.385 678 161 501 205 867 528 192;
  • 73) 0.385 678 161 501 205 867 528 192 × 2 = 0 + 0.771 356 323 002 411 735 056 384;
  • 74) 0.771 356 323 002 411 735 056 384 × 2 = 1 + 0.542 712 646 004 823 470 112 768;
  • 75) 0.542 712 646 004 823 470 112 768 × 2 = 1 + 0.085 425 292 009 646 940 225 536;
  • 76) 0.085 425 292 009 646 940 225 536 × 2 = 0 + 0.170 850 584 019 293 880 451 072;
  • 77) 0.170 850 584 019 293 880 451 072 × 2 = 0 + 0.341 701 168 038 587 760 902 144;
  • 78) 0.341 701 168 038 587 760 902 144 × 2 = 0 + 0.683 402 336 077 175 521 804 288;
  • 79) 0.683 402 336 077 175 521 804 288 × 2 = 1 + 0.366 804 672 154 351 043 608 576;
  • 80) 0.366 804 672 154 351 043 608 576 × 2 = 0 + 0.733 609 344 308 702 087 217 152;
  • 81) 0.733 609 344 308 702 087 217 152 × 2 = 1 + 0.467 218 688 617 404 174 434 304;
  • 82) 0.467 218 688 617 404 174 434 304 × 2 = 0 + 0.934 437 377 234 808 348 868 608;
  • 83) 0.934 437 377 234 808 348 868 608 × 2 = 1 + 0.868 874 754 469 616 697 737 216;
  • 84) 0.868 874 754 469 616 697 737 216 × 2 = 1 + 0.737 749 508 939 233 395 474 432;
  • 85) 0.737 749 508 939 233 395 474 432 × 2 = 1 + 0.475 499 017 878 466 790 948 864;
  • 86) 0.475 499 017 878 466 790 948 864 × 2 = 0 + 0.950 998 035 756 933 581 897 728;
  • 87) 0.950 998 035 756 933 581 897 728 × 2 = 1 + 0.901 996 071 513 867 163 795 456;
  • 88) 0.901 996 071 513 867 163 795 456 × 2 = 1 + 0.803 992 143 027 734 327 590 912;
  • 89) 0.803 992 143 027 734 327 590 912 × 2 = 1 + 0.607 984 286 055 468 655 181 824;
  • 90) 0.607 984 286 055 468 655 181 824 × 2 = 1 + 0.215 968 572 110 937 310 363 648;
  • 91) 0.215 968 572 110 937 310 363 648 × 2 = 0 + 0.431 937 144 221 874 620 727 296;
  • 92) 0.431 937 144 221 874 620 727 296 × 2 = 0 + 0.863 874 288 443 749 241 454 592;
  • 93) 0.863 874 288 443 749 241 454 592 × 2 = 1 + 0.727 748 576 887 498 482 909 184;
  • 94) 0.727 748 576 887 498 482 909 184 × 2 = 1 + 0.455 497 153 774 996 965 818 368;
  • 95) 0.455 497 153 774 996 965 818 368 × 2 = 0 + 0.910 994 307 549 993 931 636 736;
  • 96) 0.910 994 307 549 993 931 636 736 × 2 = 1 + 0.821 988 615 099 987 863 273 472;
  • 97) 0.821 988 615 099 987 863 273 472 × 2 = 1 + 0.643 977 230 199 975 726 546 944;
  • 98) 0.643 977 230 199 975 726 546 944 × 2 = 1 + 0.287 954 460 399 951 453 093 888;
  • 99) 0.287 954 460 399 951 453 093 888 × 2 = 0 + 0.575 908 920 799 902 906 187 776;
  • 100) 0.575 908 920 799 902 906 187 776 × 2 = 1 + 0.151 817 841 599 805 812 375 552;
  • 101) 0.151 817 841 599 805 812 375 552 × 2 = 0 + 0.303 635 683 199 611 624 751 104;
  • 102) 0.303 635 683 199 611 624 751 104 × 2 = 0 + 0.607 271 366 399 223 249 502 208;
  • 103) 0.607 271 366 399 223 249 502 208 × 2 = 1 + 0.214 542 732 798 446 499 004 416;
  • 104) 0.214 542 732 798 446 499 004 416 × 2 = 0 + 0.429 085 465 596 892 998 008 832;
  • 105) 0.429 085 465 596 892 998 008 832 × 2 = 0 + 0.858 170 931 193 785 996 017 664;
  • 106) 0.858 170 931 193 785 996 017 664 × 2 = 1 + 0.716 341 862 387 571 992 035 328;
  • 107) 0.716 341 862 387 571 992 035 328 × 2 = 1 + 0.432 683 724 775 143 984 070 656;
  • 108) 0.432 683 724 775 143 984 070 656 × 2 = 0 + 0.865 367 449 550 287 968 141 312;
  • 109) 0.865 367 449 550 287 968 141 312 × 2 = 1 + 0.730 734 899 100 575 936 282 624;
  • 110) 0.730 734 899 100 575 936 282 624 × 2 = 1 + 0.461 469 798 201 151 872 565 248;
  • 111) 0.461 469 798 201 151 872 565 248 × 2 = 0 + 0.922 939 596 402 303 745 130 496;
  • 112) 0.922 939 596 402 303 745 130 496 × 2 = 1 + 0.845 879 192 804 607 490 260 992;
  • 113) 0.845 879 192 804 607 490 260 992 × 2 = 1 + 0.691 758 385 609 214 980 521 984;
  • 114) 0.691 758 385 609 214 980 521 984 × 2 = 1 + 0.383 516 771 218 429 961 043 968;
  • 115) 0.383 516 771 218 429 961 043 968 × 2 = 0 + 0.767 033 542 436 859 922 087 936;
  • 116) 0.767 033 542 436 859 922 087 936 × 2 = 1 + 0.534 067 084 873 719 844 175 872;
  • 117) 0.534 067 084 873 719 844 175 872 × 2 = 1 + 0.068 134 169 747 439 688 351 744;
  • 118) 0.068 134 169 747 439 688 351 744 × 2 = 0 + 0.136 268 339 494 879 376 703 488;
  • 119) 0.136 268 339 494 879 376 703 488 × 2 = 0 + 0.272 536 678 989 758 753 406 976;

We didn't get any fractional part that was equal to zero. But we had enough iterations: 67 to reach the first integer part different from zero, plus 52 more to fill the mantissa => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 552(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0010 1011 1011 1100 1101 1101 0010 0110 1101 1101 100(2)
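
The repeated multiplication in steps 3 and 4 can be reproduced with exact arithmetic. The sketch below uses Python's `fractions.Fraction` so that no rounding happens while the bits are generated; the function name `fraction_bits` is an assumption made for this illustration.

```python
from fractions import Fraction


def fraction_bits(value: str, count: int) -> str:
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    frac = Fraction(value)        # exact decimal value, no float rounding
    if not 0 <= frac < 1:
        raise ValueError("expected a value in the range [0, 1)")
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)           # integer part of the product: 0 or 1
        bits.append(str(bit))
        frac -= bit               # continue with the fractional part only
    return "".join(bits)


bits = fraction_bits("0.000000000000000000008552", 119)
print(bits)
print("first 1 bit at position", bits.index("1") + 1)
```

The position of the first 1 bit is the number of places the point has to be shifted during normalization.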

5. Positive number before normalization:

0.000 000 000 000 000 000 008 552(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0010 1011 1011 1100 1101 1101 0010 0110 1101 1101 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 552(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0010 1011 1011 1100 1101 1101 0010 0110 1101 1101 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 0010 1011 1011 1100 1101 1101 0010 0110 1101 1101 100(2) × 2^0 =


1.0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100(2) × 2^-67
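
Normalization amounts to finding the power of two that brings the number into the range from 1 (inclusive) to 2 (exclusive). A minimal sketch, assuming a positive input given as a decimal string; the function name `normalize` is hypothetical.

```python
from fractions import Fraction


def normalize(value: str):
    """Return (exponent, significand) so that
    value == significand * 2**exponent and 1 <= significand < 2."""
    x = Fraction(value)
    if x <= 0:
        raise ValueError("expected a positive number")
    exponent = 0
    while x >= 2:      # the point moves to the left: exponent goes up
        x /= 2
        exponent += 1
    while x < 1:       # the point moves to the right: exponent goes down
        x *= 2
        exponent -= 1
    return exponent, x


exponent, significand = normalize("0.000000000000000000008552")
print(exponent, float(significand))
```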


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
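
Steps 8 to 10 can be checked with a short Python sketch. The names `EXPONENT_BITS`, `BIAS` and `biased_exponent_bits` are assumptions made for this illustration, and the range check reflects the fact that the all-zeros and all-ones exponent fields are reserved for special values.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and write the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # Fields 0 and 2047 are reserved (zero, subnormals, infinities, NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(-67))
```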


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100 =


0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100


Decimal number 0.000 000 000 000 000 000 008 552 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 0001 0101 1101 1110 0110 1110 1001 0011 0110 1110 1100
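
A hand conversion like this one can be cross-checked against the bits a computer actually stores. The sketch below relies on Python's standard `struct` module and assumes that Python floats are IEEE 754 doubles, which is the case on common platforms; the function name `double_fields` is hypothetical.

```python
import struct


def double_fields(x: float):
    """Split a float into its sign, exponent and mantissa bit strings."""
    # Pack as a big-endian double, then reread the same 8 bytes as an integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.552e-21)
print(sign, exponent, mantissa, sep=" - ")
```

Comparing the printed fields with the three elements derived above is a quick way to spot a slip in the manual steps.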


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or, if that never happens, until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
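
The steps above can be put together in one short Python sketch. It is an illustration under stated assumptions, not a complete converter: the function name `to_double_fields` is hypothetical, only non-zero numbers in the normal range are handled, and excess mantissa bits are truncated as in the steps above rather than rounded.

```python
from fractions import Fraction


def to_double_fields(decimal: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    x = Fraction(decimal)                 # exact value of the decimal text
    if x == 0:
        raise ValueError("zero is encoded specially and is not handled here")
    sign = "1" if x < 0 else "0"          # 1 for negative, 0 for positive
    x = abs(x)                            # continue with the positive version

    exponent = 0                          # normalize to 1 <= x < 2
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    if not -1022 <= exponent <= 1023:
        raise ValueError("outside the normal double precision range")

    # Drop the leading 1, keep 52 bits, truncate everything after them.
    mantissa = int((x - 1) * 2 ** 52)
    return sign, format(exponent + 1023, "011b"), format(mantissa, "052b")


print(" - ".join(to_double_fields("-0.15625")))
```

Working with `Fraction` keeps every intermediate value exact, so the only loss of precision is the final truncation to 52 bits.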

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

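  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal sign), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). The excess bits are simply truncated here; because they begin with 1 and are followed by further non-zero bits, a converter using the IEEE 754 default round-to-nearest mode would round up instead and end the mantissa in 1101. The truncated result: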

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
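
Because the method above truncates the mantissa, its last bit can differ from the value a computer stores, which is normally rounded to nearest. The sketch below compares the two; it assumes Python floats are IEEE 754 doubles parsed with correct rounding, and the function names `truncated_mantissa` and `stored_mantissa` are hypothetical.

```python
import struct
from fractions import Fraction


def truncated_mantissa(decimal: str) -> int:
    """52 bit mantissa obtained by truncation, as in the steps above."""
    x = abs(Fraction(decimal))
    if x == 0:
        raise ValueError("zero is not handled here")
    while x >= 2:                 # normalize to 1 <= x < 2
        x /= 2
    while x < 1:
        x *= 2
    return int((x - 1) * 2 ** 52)


def stored_mantissa(decimal: str) -> int:
    """52 bit mantissa field of the float that Python actually stores."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", float(decimal)))
    return raw & (2 ** 52 - 1)    # the low 52 bits are the mantissa


for text in ("-31.640215", "0.000000000000000000008552"):
    print(text, stored_mantissa(text) - truncated_mantissa(text))
```

A difference of 0 means truncation and rounding agree; a difference of 1 means the stored value was rounded up in its last bit.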