0.000 000 000 000 000 000 207 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 207(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 207(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
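
The repeated division in steps 1 and 2 can be sketched in Python as follows. The function name int_to_binary is only an illustrative choice, not part of any library, and the sketch assumes a non-negative integer input.

```python
def int_to_binary(n: int) -> str:
    """Divide by 2 repeatedly; the remainders, read from last to first, are the bits."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of n / 2
        remainders.append(str(r))
    # The last remainder is the leftmost bit, so reverse the list.
    return "".join(reversed(remainders))

print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
```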


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 207.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 207 × 2 = 0 + 0.000 000 000 000 000 000 414;
  • 2) 0.000 000 000 000 000 000 414 × 2 = 0 + 0.000 000 000 000 000 000 828;
  • 3) 0.000 000 000 000 000 000 828 × 2 = 0 + 0.000 000 000 000 000 001 656;
  • 4) 0.000 000 000 000 000 001 656 × 2 = 0 + 0.000 000 000 000 000 003 312;
  • 5) 0.000 000 000 000 000 003 312 × 2 = 0 + 0.000 000 000 000 000 006 624;
  • 6) 0.000 000 000 000 000 006 624 × 2 = 0 + 0.000 000 000 000 000 013 248;
  • 7) 0.000 000 000 000 000 013 248 × 2 = 0 + 0.000 000 000 000 000 026 496;
  • 8) 0.000 000 000 000 000 026 496 × 2 = 0 + 0.000 000 000 000 000 052 992;
  • 9) 0.000 000 000 000 000 052 992 × 2 = 0 + 0.000 000 000 000 000 105 984;
  • 10) 0.000 000 000 000 000 105 984 × 2 = 0 + 0.000 000 000 000 000 211 968;
  • 11) 0.000 000 000 000 000 211 968 × 2 = 0 + 0.000 000 000 000 000 423 936;
  • 12) 0.000 000 000 000 000 423 936 × 2 = 0 + 0.000 000 000 000 000 847 872;
  • 13) 0.000 000 000 000 000 847 872 × 2 = 0 + 0.000 000 000 000 001 695 744;
  • 14) 0.000 000 000 000 001 695 744 × 2 = 0 + 0.000 000 000 000 003 391 488;
  • 15) 0.000 000 000 000 003 391 488 × 2 = 0 + 0.000 000 000 000 006 782 976;
  • 16) 0.000 000 000 000 006 782 976 × 2 = 0 + 0.000 000 000 000 013 565 952;
  • 17) 0.000 000 000 000 013 565 952 × 2 = 0 + 0.000 000 000 000 027 131 904;
  • 18) 0.000 000 000 000 027 131 904 × 2 = 0 + 0.000 000 000 000 054 263 808;
  • 19) 0.000 000 000 000 054 263 808 × 2 = 0 + 0.000 000 000 000 108 527 616;
  • 20) 0.000 000 000 000 108 527 616 × 2 = 0 + 0.000 000 000 000 217 055 232;
  • 21) 0.000 000 000 000 217 055 232 × 2 = 0 + 0.000 000 000 000 434 110 464;
  • 22) 0.000 000 000 000 434 110 464 × 2 = 0 + 0.000 000 000 000 868 220 928;
  • 23) 0.000 000 000 000 868 220 928 × 2 = 0 + 0.000 000 000 001 736 441 856;
  • 24) 0.000 000 000 001 736 441 856 × 2 = 0 + 0.000 000 000 003 472 883 712;
  • 25) 0.000 000 000 003 472 883 712 × 2 = 0 + 0.000 000 000 006 945 767 424;
  • 26) 0.000 000 000 006 945 767 424 × 2 = 0 + 0.000 000 000 013 891 534 848;
  • 27) 0.000 000 000 013 891 534 848 × 2 = 0 + 0.000 000 000 027 783 069 696;
  • 28) 0.000 000 000 027 783 069 696 × 2 = 0 + 0.000 000 000 055 566 139 392;
  • 29) 0.000 000 000 055 566 139 392 × 2 = 0 + 0.000 000 000 111 132 278 784;
  • 30) 0.000 000 000 111 132 278 784 × 2 = 0 + 0.000 000 000 222 264 557 568;
  • 31) 0.000 000 000 222 264 557 568 × 2 = 0 + 0.000 000 000 444 529 115 136;
  • 32) 0.000 000 000 444 529 115 136 × 2 = 0 + 0.000 000 000 889 058 230 272;
  • 33) 0.000 000 000 889 058 230 272 × 2 = 0 + 0.000 000 001 778 116 460 544;
  • 34) 0.000 000 001 778 116 460 544 × 2 = 0 + 0.000 000 003 556 232 921 088;
  • 35) 0.000 000 003 556 232 921 088 × 2 = 0 + 0.000 000 007 112 465 842 176;
  • 36) 0.000 000 007 112 465 842 176 × 2 = 0 + 0.000 000 014 224 931 684 352;
  • 37) 0.000 000 014 224 931 684 352 × 2 = 0 + 0.000 000 028 449 863 368 704;
  • 38) 0.000 000 028 449 863 368 704 × 2 = 0 + 0.000 000 056 899 726 737 408;
  • 39) 0.000 000 056 899 726 737 408 × 2 = 0 + 0.000 000 113 799 453 474 816;
  • 40) 0.000 000 113 799 453 474 816 × 2 = 0 + 0.000 000 227 598 906 949 632;
  • 41) 0.000 000 227 598 906 949 632 × 2 = 0 + 0.000 000 455 197 813 899 264;
  • 42) 0.000 000 455 197 813 899 264 × 2 = 0 + 0.000 000 910 395 627 798 528;
  • 43) 0.000 000 910 395 627 798 528 × 2 = 0 + 0.000 001 820 791 255 597 056;
  • 44) 0.000 001 820 791 255 597 056 × 2 = 0 + 0.000 003 641 582 511 194 112;
  • 45) 0.000 003 641 582 511 194 112 × 2 = 0 + 0.000 007 283 165 022 388 224;
  • 46) 0.000 007 283 165 022 388 224 × 2 = 0 + 0.000 014 566 330 044 776 448;
  • 47) 0.000 014 566 330 044 776 448 × 2 = 0 + 0.000 029 132 660 089 552 896;
  • 48) 0.000 029 132 660 089 552 896 × 2 = 0 + 0.000 058 265 320 179 105 792;
  • 49) 0.000 058 265 320 179 105 792 × 2 = 0 + 0.000 116 530 640 358 211 584;
  • 50) 0.000 116 530 640 358 211 584 × 2 = 0 + 0.000 233 061 280 716 423 168;
  • 51) 0.000 233 061 280 716 423 168 × 2 = 0 + 0.000 466 122 561 432 846 336;
  • 52) 0.000 466 122 561 432 846 336 × 2 = 0 + 0.000 932 245 122 865 692 672;
  • 53) 0.000 932 245 122 865 692 672 × 2 = 0 + 0.001 864 490 245 731 385 344;
  • 54) 0.001 864 490 245 731 385 344 × 2 = 0 + 0.003 728 980 491 462 770 688;
  • 55) 0.003 728 980 491 462 770 688 × 2 = 0 + 0.007 457 960 982 925 541 376;
  • 56) 0.007 457 960 982 925 541 376 × 2 = 0 + 0.014 915 921 965 851 082 752;
  • 57) 0.014 915 921 965 851 082 752 × 2 = 0 + 0.029 831 843 931 702 165 504;
  • 58) 0.029 831 843 931 702 165 504 × 2 = 0 + 0.059 663 687 863 404 331 008;
  • 59) 0.059 663 687 863 404 331 008 × 2 = 0 + 0.119 327 375 726 808 662 016;
  • 60) 0.119 327 375 726 808 662 016 × 2 = 0 + 0.238 654 751 453 617 324 032;
  • 61) 0.238 654 751 453 617 324 032 × 2 = 0 + 0.477 309 502 907 234 648 064;
  • 62) 0.477 309 502 907 234 648 064 × 2 = 0 + 0.954 619 005 814 469 296 128;
  • 63) 0.954 619 005 814 469 296 128 × 2 = 1 + 0.909 238 011 628 938 592 256;
  • 64) 0.909 238 011 628 938 592 256 × 2 = 1 + 0.818 476 023 257 877 184 512;
  • 65) 0.818 476 023 257 877 184 512 × 2 = 1 + 0.636 952 046 515 754 369 024;
  • 66) 0.636 952 046 515 754 369 024 × 2 = 1 + 0.273 904 093 031 508 738 048;
  • 67) 0.273 904 093 031 508 738 048 × 2 = 0 + 0.547 808 186 063 017 476 096;
  • 68) 0.547 808 186 063 017 476 096 × 2 = 1 + 0.095 616 372 126 034 952 192;
  • 69) 0.095 616 372 126 034 952 192 × 2 = 0 + 0.191 232 744 252 069 904 384;
  • 70) 0.191 232 744 252 069 904 384 × 2 = 0 + 0.382 465 488 504 139 808 768;
  • 71) 0.382 465 488 504 139 808 768 × 2 = 0 + 0.764 930 977 008 279 617 536;
  • 72) 0.764 930 977 008 279 617 536 × 2 = 1 + 0.529 861 954 016 559 235 072;
  • 73) 0.529 861 954 016 559 235 072 × 2 = 1 + 0.059 723 908 033 118 470 144;
  • 74) 0.059 723 908 033 118 470 144 × 2 = 0 + 0.119 447 816 066 236 940 288;
  • 75) 0.119 447 816 066 236 940 288 × 2 = 0 + 0.238 895 632 132 473 880 576;
  • 76) 0.238 895 632 132 473 880 576 × 2 = 0 + 0.477 791 264 264 947 761 152;
  • 77) 0.477 791 264 264 947 761 152 × 2 = 0 + 0.955 582 528 529 895 522 304;
  • 78) 0.955 582 528 529 895 522 304 × 2 = 1 + 0.911 165 057 059 791 044 608;
  • 79) 0.911 165 057 059 791 044 608 × 2 = 1 + 0.822 330 114 119 582 089 216;
  • 80) 0.822 330 114 119 582 089 216 × 2 = 1 + 0.644 660 228 239 164 178 432;
  • 81) 0.644 660 228 239 164 178 432 × 2 = 1 + 0.289 320 456 478 328 356 864;
  • 82) 0.289 320 456 478 328 356 864 × 2 = 0 + 0.578 640 912 956 656 713 728;
  • 83) 0.578 640 912 956 656 713 728 × 2 = 1 + 0.157 281 825 913 313 427 456;
  • 84) 0.157 281 825 913 313 427 456 × 2 = 0 + 0.314 563 651 826 626 854 912;
  • 85) 0.314 563 651 826 626 854 912 × 2 = 0 + 0.629 127 303 653 253 709 824;
  • 86) 0.629 127 303 653 253 709 824 × 2 = 1 + 0.258 254 607 306 507 419 648;
  • 87) 0.258 254 607 306 507 419 648 × 2 = 0 + 0.516 509 214 613 014 839 296;
  • 88) 0.516 509 214 613 014 839 296 × 2 = 1 + 0.033 018 429 226 029 678 592;
  • 89) 0.033 018 429 226 029 678 592 × 2 = 0 + 0.066 036 858 452 059 357 184;
  • 90) 0.066 036 858 452 059 357 184 × 2 = 0 + 0.132 073 716 904 118 714 368;
  • 91) 0.132 073 716 904 118 714 368 × 2 = 0 + 0.264 147 433 808 237 428 736;
  • 92) 0.264 147 433 808 237 428 736 × 2 = 0 + 0.528 294 867 616 474 857 472;
  • 93) 0.528 294 867 616 474 857 472 × 2 = 1 + 0.056 589 735 232 949 714 944;
  • 94) 0.056 589 735 232 949 714 944 × 2 = 0 + 0.113 179 470 465 899 429 888;
  • 95) 0.113 179 470 465 899 429 888 × 2 = 0 + 0.226 358 940 931 798 859 776;
  • 96) 0.226 358 940 931 798 859 776 × 2 = 0 + 0.452 717 881 863 597 719 552;
  • 97) 0.452 717 881 863 597 719 552 × 2 = 0 + 0.905 435 763 727 195 439 104;
  • 98) 0.905 435 763 727 195 439 104 × 2 = 1 + 0.810 871 527 454 390 878 208;
  • 99) 0.810 871 527 454 390 878 208 × 2 = 1 + 0.621 743 054 908 781 756 416;
  • 100) 0.621 743 054 908 781 756 416 × 2 = 1 + 0.243 486 109 817 563 512 832;
  • 101) 0.243 486 109 817 563 512 832 × 2 = 0 + 0.486 972 219 635 127 025 664;
  • 102) 0.486 972 219 635 127 025 664 × 2 = 0 + 0.973 944 439 270 254 051 328;
  • 103) 0.973 944 439 270 254 051 328 × 2 = 1 + 0.947 888 878 540 508 102 656;
  • 104) 0.947 888 878 540 508 102 656 × 2 = 1 + 0.895 777 757 081 016 205 312;
  • 105) 0.895 777 757 081 016 205 312 × 2 = 1 + 0.791 555 514 162 032 410 624;
  • 106) 0.791 555 514 162 032 410 624 × 2 = 1 + 0.583 111 028 324 064 821 248;
  • 107) 0.583 111 028 324 064 821 248 × 2 = 1 + 0.166 222 056 648 129 642 496;
  • 108) 0.166 222 056 648 129 642 496 × 2 = 0 + 0.332 444 113 296 259 284 992;
  • 109) 0.332 444 113 296 259 284 992 × 2 = 0 + 0.664 888 226 592 518 569 984;
  • 110) 0.664 888 226 592 518 569 984 × 2 = 1 + 0.329 776 453 185 037 139 968;
  • 111) 0.329 776 453 185 037 139 968 × 2 = 0 + 0.659 552 906 370 074 279 936;
  • 112) 0.659 552 906 370 074 279 936 × 2 = 1 + 0.319 105 812 740 148 559 872;
  • 113) 0.319 105 812 740 148 559 872 × 2 = 0 + 0.638 211 625 480 297 119 744;
  • 114) 0.638 211 625 480 297 119 744 × 2 = 1 + 0.276 423 250 960 594 239 488;
  • 115) 0.276 423 250 960 594 239 488 × 2 = 0 + 0.552 846 501 921 188 478 976;

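We didn't get any fractional part that was equal to zero. But we stop anyway: starting with the first non-zero integer part (step 63), we have collected 53 significant bits, that is, the leading 1 plus the 52 bits of the mantissa => FULL STOP (losing precision - the converted number we get in the end will be just a very good approximation of the initial one). Note that the remaining bits are simply dropped (truncated). The last fractional part above, 0.552..., is greater than 0.5, so IEEE 754's default round-to-nearest mode would round the last mantissa bit up instead.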


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 207(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0001 1000 0111 1010 0101 0000 1000 0111 0011 1110 0101 010(2)
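
As a rough sketch, steps 3 and 4 can be reproduced with exact rational arithmetic so that no rounding creeps into the multiplications. The helper name fraction_bits and its significant parameter are assumptions made for this illustration; the stopping rule mirrors the one used above (53 significant bits counted from the first 1).

```python
from fractions import Fraction

def fraction_bits(x: Fraction, significant: int = 53) -> str:
    """Multiply by 2 repeatedly and collect the integer parts.

    Stops when the fractional part reaches zero, or once `significant`
    bits have been collected counting from the first 1 bit.
    """
    bits = []
    counted = 0            # bits collected since (and including) the first 1
    seen_one = False
    while x != 0 and counted < significant:
        x *= 2
        bit = int(x)       # integer part of the product: 0 or 1
        x -= bit           # keep only the fractional part
        bits.append(str(bit))
        seen_one = seen_one or bit == 1
        if seen_one:
            counted += 1
    return "".join(bits)

bits = fraction_bits(Fraction("0.000000000000000000207"))
print(len(bits))             # 115
print(bits.index("1") + 1)   # 63
```

Using Fraction rather than float matters here: a float would already be rounded to 53 bits before the first multiplication.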

5. Positive number before normalization:

0.000 000 000 000 000 000 207(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0001 1000 0111 1010 0101 0000 1000 0111 0011 1110 0101 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 63 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 207(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0001 1000 0111 1010 0101 0000 1000 0111 0011 1110 0101 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0001 1000 0111 1010 0101 0000 1000 0111 0011 1110 0101 010(2) × 2^0 =


1.1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010(2) × 2^(-63)
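
The normalization in step 6 amounts to finding the power of 2 that brings the value into the range from 1 (inclusive) to 2 (exclusive). A minimal sketch, assuming a positive input and using a hypothetical helper named normalize:

```python
from fractions import Fraction

def normalize(x: Fraction):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent. Assumes x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = 0
    while x >= 2:          # shift the point to the left
        x /= 2
        exponent += 1
    while x < 1:           # shift the point to the right
        x *= 2
        exponent -= 1
    return x, exponent

significand, exponent = normalize(Fraction("0.000000000000000000207"))
print(exponent)            # -63
print(float(significand))  # roughly 1.909
```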


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -63


Mantissa (not normalized):
1.1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-63 + 2^(11-1) - 1 =


(-63 + 1 023)(10) =


960(10)
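
The bias arithmetic of step 8 is small enough to check directly. In this sketch the names EXPONENT_BITS, BIAS and adjust_exponent are illustrative; the range check reflects the fact that the all-zeros and all-ones exponent fields are reserved for special values.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1      # 1023 for double precision

def adjust_exponent(unadjusted: int) -> int:
    """Add the bias; normal numbers need a result between 1 and 2046."""
    adjusted = unadjusted + BIAS
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal range")
    return adjusted

print(adjust_exponent(-63))   # 960
print(adjust_exponent(4))     # 1027
```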


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 960 ÷ 2 = 480 + 0;
  • 480 ÷ 2 = 240 + 0;
  • 240 ÷ 2 = 120 + 0;
  • 120 ÷ 2 = 60 + 0;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


960(10) =


011 1100 0000(2)
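
Steps 9 and 10 can be sketched as the same repeated division, followed by padding with zeros on the left up to 11 bits. The function name exponent_field is an assumption for this example.

```python
def exponent_field(adjusted: int, width: int = 11) -> str:
    """Convert the adjusted exponent to binary and left-pad it to `width` bits."""
    remainders = []
    n = adjusted
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))
    bits = "".join(reversed(remainders))   # last remainder becomes the first bit
    return bits.rjust(width, "0")

print(exponent_field(960))    # 01111000000
print(exponent_field(1027))   # 10000000011
```

Python's built-in format(960, "011b") produces the same string and is the shorter route when the manual division is not the point.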


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010 =


1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010
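
Step 11 can be illustrated as plain string handling: drop the hidden leading 1 and the point, then cut or pad to 52 bits. This is a sketch only; mantissa_field is a hypothetical helper, and it truncates rather than rounds, as the procedure above does.

```python
def mantissa_field(normalized: str, width: int = 52) -> str:
    """Turn a normalized binary string such as '1.1110 1000' into the stored mantissa."""
    digits = normalized.replace(" ", "")
    leading, _, fraction = digits.partition(".")
    if leading != "1":
        raise ValueError("expected exactly one leading 1 before the point")
    # Remove excess bits on the right, or append zeros if there are too few.
    return fraction[:width].ljust(width, "0")

field = mantissa_field(
    "1.1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010"
)
print(len(field))   # 52
```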


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1100 0000


Mantissa (52 bits) =
1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010


Decimal number 0.000 000 000 000 000 000 207 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1100 0000 - 1110 1000 1100 0011 1101 0010 1000 0100 0011 1001 1111 0010 1010
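
One way to sanity-check the result is to glue the three fields together and compare the 64 bit pattern with what Python's standard struct module produces for the same decimal literal. The sketch below assumes the platform's float is an IEEE 754 double, which is the usual case. Because the steps above truncate while the platform rounds to nearest, the two patterns may differ by one unit in the last place. For this number the leftover fraction at step 115 (about 0.55) is more than one half, which suggests the rounded pattern ends one bit higher.

```python
import struct

# Fields copied from step 12 above, with the spaces removed.
sign = "0"
exponent = "011 1100 0000".replace(" ", "")
mantissa = ("1110 1000 1100 0011 1101 0010 1000 0100 "
            "0011 1001 1111 0010 1010").replace(" ", "")

bits = sign + exponent + mantissa
pattern = int(bits, 2)
print(len(bits), hex(pattern))   # 64 0x3c0e8c3d28439f2a

# Reinterpret the 64 bit pattern as a double to see the value it encodes.
value = struct.unpack(">d", pattern.to_bytes(8, "big"))[0]
print(value)

# Bit pattern the platform itself stores for the same decimal literal.
native = struct.unpack(">Q", struct.pack(">d", 2.07e-19))[0]
print(native - pattern)          # difference in units in the last place
```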


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have collected enough bits to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
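
The nine steps above can be combined into one short sketch. It is an illustration under stated assumptions, not a complete converter: to_double_fields is a hypothetical name, the input is a decimal string, only non-zero values in the normal range are handled (no zero, subnormals, infinities or NaN), and the mantissa is truncated as described in step 8 rather than rounded.

```python
from fractions import Fraction

def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    x = Fraction(text)                     # exact value of the decimal string
    sign = "1" if x < 0 else "0"
    x = abs(x)
    if x == 0:
        raise ValueError("zero has a special encoding not covered here")

    # Normalize so that 1 <= x < 2, tracking the power of 2.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1

    adjusted = exponent + 2 ** (11 - 1) - 1   # add the bias, 1023
    if not 1 <= adjusted <= 2046:
        raise ValueError("outside the normal range of this sketch")

    # Drop the hidden leading 1, then take 52 bits by repeated doubling.
    x -= 1
    mantissa = []
    for _ in range(52):
        x *= 2
        bit = int(x)
        x -= bit
        mantissa.append(str(bit))

    return sign, format(adjusted, "011b"), "".join(mantissa)

print(to_double_fields("-2.5"))
```

Normalizing before extracting bits avoids treating the integer and fractional parts separately; the bits obtained are the same as in the step-by-step method.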

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
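
To compare the worked example with what a machine actually stores, the fields can be read back with Python's standard struct module. This is a sketch assuming the platform float is an IEEE 754 double; double_fields is an illustrative name. The sign and exponent should match the result above. The last mantissa bit may differ, because the hardware rounds to nearest, while the example truncates, and the bits dropped in step 10 (1010 0...) amount to more than half a unit in the last place.

```python
import struct

def double_fields(value: float):
    """Split the platform's 64 bit encoding of `value` into its three fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]   # sign, exponent, mantissa

sign, exponent, mantissa = double_fields(-31.640215)
print(sign, exponent)   # 1 10000000011
print(mantissa)
```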