0.000 000 000 000 000 000 008 505 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 505(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number 0.000 000 000 000 000 000 008 505(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
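The repeated-division procedure from steps 1 and 2 can be sketched in Python as below. This is a minimal illustration; the function name `int_to_binary` is our own and not part of any library. It collects the remainders and then reads them from the bottom of the list, exactly as described above.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: the only remainder is 0
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
    # The last remainder is the leftmost digit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```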


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 505.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 505 × 2 = 0 + 0.000 000 000 000 000 000 017 01;
  • 2) 0.000 000 000 000 000 000 017 01 × 2 = 0 + 0.000 000 000 000 000 000 034 02;
  • 3) 0.000 000 000 000 000 000 034 02 × 2 = 0 + 0.000 000 000 000 000 000 068 04;
  • 4) 0.000 000 000 000 000 000 068 04 × 2 = 0 + 0.000 000 000 000 000 000 136 08;
  • 5) 0.000 000 000 000 000 000 136 08 × 2 = 0 + 0.000 000 000 000 000 000 272 16;
  • 6) 0.000 000 000 000 000 000 272 16 × 2 = 0 + 0.000 000 000 000 000 000 544 32;
  • 7) 0.000 000 000 000 000 000 544 32 × 2 = 0 + 0.000 000 000 000 000 001 088 64;
  • 8) 0.000 000 000 000 000 001 088 64 × 2 = 0 + 0.000 000 000 000 000 002 177 28;
  • 9) 0.000 000 000 000 000 002 177 28 × 2 = 0 + 0.000 000 000 000 000 004 354 56;
  • 10) 0.000 000 000 000 000 004 354 56 × 2 = 0 + 0.000 000 000 000 000 008 709 12;
  • 11) 0.000 000 000 000 000 008 709 12 × 2 = 0 + 0.000 000 000 000 000 017 418 24;
  • 12) 0.000 000 000 000 000 017 418 24 × 2 = 0 + 0.000 000 000 000 000 034 836 48;
  • 13) 0.000 000 000 000 000 034 836 48 × 2 = 0 + 0.000 000 000 000 000 069 672 96;
  • 14) 0.000 000 000 000 000 069 672 96 × 2 = 0 + 0.000 000 000 000 000 139 345 92;
  • 15) 0.000 000 000 000 000 139 345 92 × 2 = 0 + 0.000 000 000 000 000 278 691 84;
  • 16) 0.000 000 000 000 000 278 691 84 × 2 = 0 + 0.000 000 000 000 000 557 383 68;
  • 17) 0.000 000 000 000 000 557 383 68 × 2 = 0 + 0.000 000 000 000 001 114 767 36;
  • 18) 0.000 000 000 000 001 114 767 36 × 2 = 0 + 0.000 000 000 000 002 229 534 72;
  • 19) 0.000 000 000 000 002 229 534 72 × 2 = 0 + 0.000 000 000 000 004 459 069 44;
  • 20) 0.000 000 000 000 004 459 069 44 × 2 = 0 + 0.000 000 000 000 008 918 138 88;
  • 21) 0.000 000 000 000 008 918 138 88 × 2 = 0 + 0.000 000 000 000 017 836 277 76;
  • 22) 0.000 000 000 000 017 836 277 76 × 2 = 0 + 0.000 000 000 000 035 672 555 52;
  • 23) 0.000 000 000 000 035 672 555 52 × 2 = 0 + 0.000 000 000 000 071 345 111 04;
  • 24) 0.000 000 000 000 071 345 111 04 × 2 = 0 + 0.000 000 000 000 142 690 222 08;
  • 25) 0.000 000 000 000 142 690 222 08 × 2 = 0 + 0.000 000 000 000 285 380 444 16;
  • 26) 0.000 000 000 000 285 380 444 16 × 2 = 0 + 0.000 000 000 000 570 760 888 32;
  • 27) 0.000 000 000 000 570 760 888 32 × 2 = 0 + 0.000 000 000 001 141 521 776 64;
  • 28) 0.000 000 000 001 141 521 776 64 × 2 = 0 + 0.000 000 000 002 283 043 553 28;
  • 29) 0.000 000 000 002 283 043 553 28 × 2 = 0 + 0.000 000 000 004 566 087 106 56;
  • 30) 0.000 000 000 004 566 087 106 56 × 2 = 0 + 0.000 000 000 009 132 174 213 12;
  • 31) 0.000 000 000 009 132 174 213 12 × 2 = 0 + 0.000 000 000 018 264 348 426 24;
  • 32) 0.000 000 000 018 264 348 426 24 × 2 = 0 + 0.000 000 000 036 528 696 852 48;
  • 33) 0.000 000 000 036 528 696 852 48 × 2 = 0 + 0.000 000 000 073 057 393 704 96;
  • 34) 0.000 000 000 073 057 393 704 96 × 2 = 0 + 0.000 000 000 146 114 787 409 92;
  • 35) 0.000 000 000 146 114 787 409 92 × 2 = 0 + 0.000 000 000 292 229 574 819 84;
  • 36) 0.000 000 000 292 229 574 819 84 × 2 = 0 + 0.000 000 000 584 459 149 639 68;
  • 37) 0.000 000 000 584 459 149 639 68 × 2 = 0 + 0.000 000 001 168 918 299 279 36;
  • 38) 0.000 000 001 168 918 299 279 36 × 2 = 0 + 0.000 000 002 337 836 598 558 72;
  • 39) 0.000 000 002 337 836 598 558 72 × 2 = 0 + 0.000 000 004 675 673 197 117 44;
  • 40) 0.000 000 004 675 673 197 117 44 × 2 = 0 + 0.000 000 009 351 346 394 234 88;
  • 41) 0.000 000 009 351 346 394 234 88 × 2 = 0 + 0.000 000 018 702 692 788 469 76;
  • 42) 0.000 000 018 702 692 788 469 76 × 2 = 0 + 0.000 000 037 405 385 576 939 52;
  • 43) 0.000 000 037 405 385 576 939 52 × 2 = 0 + 0.000 000 074 810 771 153 879 04;
  • 44) 0.000 000 074 810 771 153 879 04 × 2 = 0 + 0.000 000 149 621 542 307 758 08;
  • 45) 0.000 000 149 621 542 307 758 08 × 2 = 0 + 0.000 000 299 243 084 615 516 16;
  • 46) 0.000 000 299 243 084 615 516 16 × 2 = 0 + 0.000 000 598 486 169 231 032 32;
  • 47) 0.000 000 598 486 169 231 032 32 × 2 = 0 + 0.000 001 196 972 338 462 064 64;
  • 48) 0.000 001 196 972 338 462 064 64 × 2 = 0 + 0.000 002 393 944 676 924 129 28;
  • 49) 0.000 002 393 944 676 924 129 28 × 2 = 0 + 0.000 004 787 889 353 848 258 56;
  • 50) 0.000 004 787 889 353 848 258 56 × 2 = 0 + 0.000 009 575 778 707 696 517 12;
  • 51) 0.000 009 575 778 707 696 517 12 × 2 = 0 + 0.000 019 151 557 415 393 034 24;
  • 52) 0.000 019 151 557 415 393 034 24 × 2 = 0 + 0.000 038 303 114 830 786 068 48;
  • 53) 0.000 038 303 114 830 786 068 48 × 2 = 0 + 0.000 076 606 229 661 572 136 96;
  • 54) 0.000 076 606 229 661 572 136 96 × 2 = 0 + 0.000 153 212 459 323 144 273 92;
  • 55) 0.000 153 212 459 323 144 273 92 × 2 = 0 + 0.000 306 424 918 646 288 547 84;
  • 56) 0.000 306 424 918 646 288 547 84 × 2 = 0 + 0.000 612 849 837 292 577 095 68;
  • 57) 0.000 612 849 837 292 577 095 68 × 2 = 0 + 0.001 225 699 674 585 154 191 36;
  • 58) 0.001 225 699 674 585 154 191 36 × 2 = 0 + 0.002 451 399 349 170 308 382 72;
  • 59) 0.002 451 399 349 170 308 382 72 × 2 = 0 + 0.004 902 798 698 340 616 765 44;
  • 60) 0.004 902 798 698 340 616 765 44 × 2 = 0 + 0.009 805 597 396 681 233 530 88;
  • 61) 0.009 805 597 396 681 233 530 88 × 2 = 0 + 0.019 611 194 793 362 467 061 76;
  • 62) 0.019 611 194 793 362 467 061 76 × 2 = 0 + 0.039 222 389 586 724 934 123 52;
  • 63) 0.039 222 389 586 724 934 123 52 × 2 = 0 + 0.078 444 779 173 449 868 247 04;
  • 64) 0.078 444 779 173 449 868 247 04 × 2 = 0 + 0.156 889 558 346 899 736 494 08;
  • 65) 0.156 889 558 346 899 736 494 08 × 2 = 0 + 0.313 779 116 693 799 472 988 16;
  • 66) 0.313 779 116 693 799 472 988 16 × 2 = 0 + 0.627 558 233 387 598 945 976 32;
  • 67) 0.627 558 233 387 598 945 976 32 × 2 = 1 + 0.255 116 466 775 197 891 952 64;
  • 68) 0.255 116 466 775 197 891 952 64 × 2 = 0 + 0.510 232 933 550 395 783 905 28;
  • 69) 0.510 232 933 550 395 783 905 28 × 2 = 1 + 0.020 465 867 100 791 567 810 56;
  • 70) 0.020 465 867 100 791 567 810 56 × 2 = 0 + 0.040 931 734 201 583 135 621 12;
  • 71) 0.040 931 734 201 583 135 621 12 × 2 = 0 + 0.081 863 468 403 166 271 242 24;
  • 72) 0.081 863 468 403 166 271 242 24 × 2 = 0 + 0.163 726 936 806 332 542 484 48;
  • 73) 0.163 726 936 806 332 542 484 48 × 2 = 0 + 0.327 453 873 612 665 084 968 96;
  • 74) 0.327 453 873 612 665 084 968 96 × 2 = 0 + 0.654 907 747 225 330 169 937 92;
  • 75) 0.654 907 747 225 330 169 937 92 × 2 = 1 + 0.309 815 494 450 660 339 875 84;
  • 76) 0.309 815 494 450 660 339 875 84 × 2 = 0 + 0.619 630 988 901 320 679 751 68;
  • 77) 0.619 630 988 901 320 679 751 68 × 2 = 1 + 0.239 261 977 802 641 359 503 36;
  • 78) 0.239 261 977 802 641 359 503 36 × 2 = 0 + 0.478 523 955 605 282 719 006 72;
  • 79) 0.478 523 955 605 282 719 006 72 × 2 = 0 + 0.957 047 911 210 565 438 013 44;
  • 80) 0.957 047 911 210 565 438 013 44 × 2 = 1 + 0.914 095 822 421 130 876 026 88;
  • 81) 0.914 095 822 421 130 876 026 88 × 2 = 1 + 0.828 191 644 842 261 752 053 76;
  • 82) 0.828 191 644 842 261 752 053 76 × 2 = 1 + 0.656 383 289 684 523 504 107 52;
  • 83) 0.656 383 289 684 523 504 107 52 × 2 = 1 + 0.312 766 579 369 047 008 215 04;
  • 84) 0.312 766 579 369 047 008 215 04 × 2 = 0 + 0.625 533 158 738 094 016 430 08;
  • 85) 0.625 533 158 738 094 016 430 08 × 2 = 1 + 0.251 066 317 476 188 032 860 16;
  • 86) 0.251 066 317 476 188 032 860 16 × 2 = 0 + 0.502 132 634 952 376 065 720 32;
  • 87) 0.502 132 634 952 376 065 720 32 × 2 = 1 + 0.004 265 269 904 752 131 440 64;
  • 88) 0.004 265 269 904 752 131 440 64 × 2 = 0 + 0.008 530 539 809 504 262 881 28;
  • 89) 0.008 530 539 809 504 262 881 28 × 2 = 0 + 0.017 061 079 619 008 525 762 56;
  • 90) 0.017 061 079 619 008 525 762 56 × 2 = 0 + 0.034 122 159 238 017 051 525 12;
  • 91) 0.034 122 159 238 017 051 525 12 × 2 = 0 + 0.068 244 318 476 034 103 050 24;
  • 92) 0.068 244 318 476 034 103 050 24 × 2 = 0 + 0.136 488 636 952 068 206 100 48;
  • 93) 0.136 488 636 952 068 206 100 48 × 2 = 0 + 0.272 977 273 904 136 412 200 96;
  • 94) 0.272 977 273 904 136 412 200 96 × 2 = 0 + 0.545 954 547 808 272 824 401 92;
  • 95) 0.545 954 547 808 272 824 401 92 × 2 = 1 + 0.091 909 095 616 545 648 803 84;
  • 96) 0.091 909 095 616 545 648 803 84 × 2 = 0 + 0.183 818 191 233 091 297 607 68;
  • 97) 0.183 818 191 233 091 297 607 68 × 2 = 0 + 0.367 636 382 466 182 595 215 36;
  • 98) 0.367 636 382 466 182 595 215 36 × 2 = 0 + 0.735 272 764 932 365 190 430 72;
  • 99) 0.735 272 764 932 365 190 430 72 × 2 = 1 + 0.470 545 529 864 730 380 861 44;
  • 100) 0.470 545 529 864 730 380 861 44 × 2 = 0 + 0.941 091 059 729 460 761 722 88;
  • 101) 0.941 091 059 729 460 761 722 88 × 2 = 1 + 0.882 182 119 458 921 523 445 76;
  • 102) 0.882 182 119 458 921 523 445 76 × 2 = 1 + 0.764 364 238 917 843 046 891 52;
  • 103) 0.764 364 238 917 843 046 891 52 × 2 = 1 + 0.528 728 477 835 686 093 783 04;
  • 104) 0.528 728 477 835 686 093 783 04 × 2 = 1 + 0.057 456 955 671 372 187 566 08;
  • 105) 0.057 456 955 671 372 187 566 08 × 2 = 0 + 0.114 913 911 342 744 375 132 16;
  • 106) 0.114 913 911 342 744 375 132 16 × 2 = 0 + 0.229 827 822 685 488 750 264 32;
  • 107) 0.229 827 822 685 488 750 264 32 × 2 = 0 + 0.459 655 645 370 977 500 528 64;
  • 108) 0.459 655 645 370 977 500 528 64 × 2 = 0 + 0.919 311 290 741 955 001 057 28;
  • 109) 0.919 311 290 741 955 001 057 28 × 2 = 1 + 0.838 622 581 483 910 002 114 56;
  • 110) 0.838 622 581 483 910 002 114 56 × 2 = 1 + 0.677 245 162 967 820 004 229 12;
  • 111) 0.677 245 162 967 820 004 229 12 × 2 = 1 + 0.354 490 325 935 640 008 458 24;
  • 112) 0.354 490 325 935 640 008 458 24 × 2 = 0 + 0.708 980 651 871 280 016 916 48;
  • 113) 0.708 980 651 871 280 016 916 48 × 2 = 1 + 0.417 961 303 742 560 033 832 96;
  • 114) 0.417 961 303 742 560 033 832 96 × 2 = 0 + 0.835 922 607 485 120 067 665 92;
  • 115) 0.835 922 607 485 120 067 665 92 × 2 = 1 + 0.671 845 214 970 240 135 331 84;
  • 116) 0.671 845 214 970 240 135 331 84 × 2 = 1 + 0.343 690 429 940 480 270 663 68;
  • 117) 0.343 690 429 940 480 270 663 68 × 2 = 0 + 0.687 380 859 880 960 541 327 36;
  • 118) 0.687 380 859 880 960 541 327 36 × 2 = 1 + 0.374 761 719 761 921 082 654 72;
  • 119) 0.374 761 719 761 921 082 654 72 × 2 = 0 + 0.749 523 439 523 842 165 309 44;

We didn't get any fractional part that was equal to zero. But we had enough iterations: counting from the first integer part that was different from zero (step 67), we have 53 bits, the leading 1 plus the 52 mantissa bits => FULL STOP (losing precision: the converted number we get in the end is only a very good approximation of the initial one).
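The repeated multiplication by 2 is easy to reproduce exactly with Python's standard `fractions.Fraction`, which avoids any floating point rounding in the intermediate steps. This is a sketch; `fraction_to_binary` and its `max_steps` parameter are our own names. It runs the same 119 steps as the list above and locates the first integer part equal to 1.

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, max_steps: int) -> str:
    """Binary digits of a fraction in [0, 1) by repeated multiplication by 2.

    Stops early if the fractional part becomes exactly zero.
    """
    digits = []
    for _ in range(max_steps):
        if frac == 0:
            break
        frac *= 2
        integer_part = int(frac)  # 0 or 1, because frac < 2 at this point
        digits.append(str(integer_part))
        frac -= integer_part      # keep only the fractional part
    return "".join(digits)


x = Fraction(8505, 10**24)  # exactly 0.000 000 000 000 000 000 008 505
bits = fraction_to_binary(x, 119)
first_one = bits.index("1") + 1   # step number of the first integer part 1
print(first_one)                  # 67
print(len(bits) - first_one + 1)  # 53 significant bits (steps 67 to 119)
```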


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 505(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1001 1110 1010 0000 0010 0010 1111 0000 1110 1011 010(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 505(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1001 1110 1010 0000 0010 0010 1111 0000 1110 1011 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 505(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1001 1110 1010 0000 0010 0010 1111 0000 1110 1011 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1001 1110 1010 0000 0010 0010 1111 0000 1110 1011 010(2) × 2^0 =


1.0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010(2) × 2^(-67)
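As a quick cross-check of the normalization step, Python's standard `math.frexp` splits a float into a fraction in [0.5, 1) and a power of two; doubling the fraction and lowering the exponent by one gives the 1.xxx × 2^n form used here. This sketch works on the double nearest to the decimal input rather than the exact decimal, which does not change the exponent for this number.

```python
import math

x = 8.505e-21                # nearest double to the decimal input
m, e = math.frexp(x)         # x == m * 2**e with 0.5 <= m < 1
significand = m * 2          # rescale so that 1 <= significand < 2 ...
exponent = e - 1             # ... and compensate in the exponent
print(exponent)              # -67
print(1 <= significand < 2)  # True
print(math.ldexp(significand, exponent) == x)  # True
```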


7. At this point, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
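The bias arithmetic amounts to a couple of lines of Python. The sketch below also checks the adjusted exponent against the range used by normal (non-subnormal) doubles; the constant names are our own.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision

unadjusted = -67
adjusted = unadjusted + BIAS
print(BIAS, adjusted)  # 1023 956

# Normal doubles use biased exponents 1..2046: 0 is reserved for zero and
# subnormal numbers, 2047 (all ones) for infinities and NaN.
assert 1 <= adjusted <= 2 ** EXPONENT_BITS - 2
```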


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above (10 bits here), then add a leading 0 so that the exponent fills all 11 bits.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
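In Python, the built-in `format` with a zero-padded width produces the same 11-bit field directly, including the leading 0 that plain `bin` leaves out. A minimal sketch:

```python
adjusted = 956
print(bin(adjusted))                       # 0b1110111100 (only 10 binary digits)
exponent_field = format(adjusted, "011b")  # zero-padded to 11 digits
print(exponent_field)                      # 01110111100
print(int(exponent_field, 2))              # 956
```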


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010 =


0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010
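The mantissa normalization can be sketched as a small string operation. The helper name `normalize_mantissa` is hypothetical; like the steps above, it truncates excess bits rather than rounding them, and pads with zeros on the right when there are too few bits.

```python
MANTISSA_BITS = 52


def normalize_mantissa(significand_bits: str) -> str:
    """Drop the leading 1 and fit the remaining bits into 52 bits.

    Excess bits are cut off on the right (truncation, as in the steps
    above); missing bits are filled with zeros on the right.
    """
    if not significand_bits.startswith("1"):
        raise ValueError("a normalized significand starts with 1")
    fraction = significand_bits[1:]
    return fraction[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")


significand = "1" + "0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010".replace(" ", "")
mantissa = normalize_mantissa(significand)
print(len(mantissa))  # 52
```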


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010


Decimal number 0.000 000 000 000 000 000 008 505 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010
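Packing the three fields into one 64-bit integer makes the result easy to check. The sketch below uses Python's standard `struct` module to read back the bits of the double that Python itself stores for `8.505e-21`. Note that the steps above truncate the mantissa, while the IEEE 754 default rounding mode (round to nearest, ties to even) rounds up here: the fractional part left after step 119 (0.749...) is more than one half, so the stored double should end in ...0101 1011 instead of ...0101 1010.

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0001 0100 1111 0101 0000 0001 0001 0111 1000 0111 0101 1010"
bits = (sign + exponent + mantissa).replace(" ", "")
assert len(bits) == 64  # 1 + 11 + 52

pattern = int(bits, 2)
print(hex(pattern))  # 0x3bc414f50117875a

# The bit pattern Python stores for the same literal (correctly rounded):
native = struct.unpack(">Q", struct.pack(">d", 8.505e-21))[0]
print(hex(native))       # 0x3bc414f50117875b
print(native - pattern)  # 1
```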


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Repeatedly divide the positive integer part by 2, keeping track of each remainder, until the quotient is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
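  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero, or until we have 53 significant bits (counting from the first 1 bit of the number), enough to fill the 52-bit mantissa.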
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark. The unadjusted exponent is n for a shift to the left and -n for a shift to the right.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
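The nine steps above can be combined into one function. This is a sketch under stated assumptions, not a full converter: `decimal_to_double_bits` is a hypothetical name, it handles only non-zero values in the normal double range (no zero, subnormals, infinities or NaN), and, like the worked examples, it truncates the mantissa instead of rounding it. Exact `Fraction` arithmetic keeps every multiplication by 2 exact.

```python
from fractions import Fraction

EXP_BITS, MANT_BITS = 11, 52
BIAS = 2 ** (EXP_BITS - 1) - 1  # 1023


def decimal_to_double_bits(text: str) -> str:
    """Steps 1-9 above, with the mantissa truncated rather than rounded.

    Returns "sign - exponent - mantissa" as strings of binary digits.
    """
    value = Fraction(text)                # exact decimal value, no float involved
    sign = "1" if value < 0 else "0"      # step 9
    value = abs(value)                    # step 1: positive version
    integer, frac = divmod(value, 1)      # integer part and fractional part
    int_bits = format(int(integer), "b")  # steps 2-3: same digits as repeated division
    frac_bits = []
    significant = len(int_bits.lstrip("0"))  # bits counted from the first 1
    # Steps 4-5: multiply by 2 until the fraction is zero or we hold
    # 53 significant bits (the leading 1 plus the 52 mantissa bits).
    while frac != 0 and significant < MANT_BITS + 1:
        frac *= 2
        bit = int(frac)
        frac_bits.append(str(bit))
        frac -= bit
        if significant or bit:
            significant += 1
    all_bits = int_bits + "".join(frac_bits)
    # Step 6: moving the point to just after the first 1 gives the exponent
    # (positive for a shift to the left, negative for a shift to the right).
    first_one = all_bits.index("1")
    exponent = len(int_bits) - 1 - first_one
    # Step 7: add the bias and write the result as 11 bits.
    exp_bits = format(exponent + BIAS, f"0{EXP_BITS}b")
    # Step 8: drop the leading 1, then truncate or zero-pad to 52 bits.
    mantissa = all_bits[first_one + 1:][:MANT_BITS].ljust(MANT_BITS, "0")
    return f"{sign} - {exp_bits} - {mantissa}"


print(decimal_to_double_bits("-31.640215"))
print(decimal_to_double_bits("8.505e-21"))
```

For values that are exactly representable in binary (such as 0.5 or 31.5), truncation and rounding agree, so the output matches the bits a real double would hold.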

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations to fill the 52-bit mantissa, counting from the first integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. At this point, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal point), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
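To see how close the truncated result is, the 64 bits can be reinterpreted as a double and compared with the exact decimal value. This sketch uses only Python's standard `struct` and `fractions` modules. Because the excess mantissa bits were cut off rather than rounded, the decoded value should lie slightly closer to zero than -31.640 215, by less than one unit in the last place (2^(4 - 52) = 2^(-48) for numbers between 16 and 32).

```python
import struct
from fractions import Fraction

sign = "1"
exponent = "100 0000 0011"
mantissa = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
bits = (sign + exponent + mantissa).replace(" ", "")
assert len(bits) == 64  # 1 + 11 + 52

# Reinterpret the 64 bits as a double (big-endian byte order).
value = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]
print(value)  # very close to -31.640215, but not exactly equal

error = Fraction(value) - Fraction("-31.640215")  # exact difference
ulp = Fraction(1, 2 ** 48)  # spacing of doubles in [16, 32)
print(0 < error < ulp)  # True: truncation moved the value toward zero
```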