0.000 000 000 000 725 886 648 1 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 725 886 648 1(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 725 886 648 1(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
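
The repeated division in steps 1 and 2 can be sketched in Python. This is a minimal illustration only; the function name `int_to_binary` is a hypothetical helper chosen for this example.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to a base 2 string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder of n / 2
        remainders.append(str(remainder))
    # The remainders are read from the bottom of the list up.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # prints 0, the integer part of this number
print(int_to_binary(31))   # prints 11111
```
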


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 725 886 648 1.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 725 886 648 1 × 2 = 0 + 0.000 000 000 001 451 773 296 2;
  • 2) 0.000 000 000 001 451 773 296 2 × 2 = 0 + 0.000 000 000 002 903 546 592 4;
  • 3) 0.000 000 000 002 903 546 592 4 × 2 = 0 + 0.000 000 000 005 807 093 184 8;
  • 4) 0.000 000 000 005 807 093 184 8 × 2 = 0 + 0.000 000 000 011 614 186 369 6;
  • 5) 0.000 000 000 011 614 186 369 6 × 2 = 0 + 0.000 000 000 023 228 372 739 2;
  • 6) 0.000 000 000 023 228 372 739 2 × 2 = 0 + 0.000 000 000 046 456 745 478 4;
  • 7) 0.000 000 000 046 456 745 478 4 × 2 = 0 + 0.000 000 000 092 913 490 956 8;
  • 8) 0.000 000 000 092 913 490 956 8 × 2 = 0 + 0.000 000 000 185 826 981 913 6;
  • 9) 0.000 000 000 185 826 981 913 6 × 2 = 0 + 0.000 000 000 371 653 963 827 2;
  • 10) 0.000 000 000 371 653 963 827 2 × 2 = 0 + 0.000 000 000 743 307 927 654 4;
  • 11) 0.000 000 000 743 307 927 654 4 × 2 = 0 + 0.000 000 001 486 615 855 308 8;
  • 12) 0.000 000 001 486 615 855 308 8 × 2 = 0 + 0.000 000 002 973 231 710 617 6;
  • 13) 0.000 000 002 973 231 710 617 6 × 2 = 0 + 0.000 000 005 946 463 421 235 2;
  • 14) 0.000 000 005 946 463 421 235 2 × 2 = 0 + 0.000 000 011 892 926 842 470 4;
  • 15) 0.000 000 011 892 926 842 470 4 × 2 = 0 + 0.000 000 023 785 853 684 940 8;
  • 16) 0.000 000 023 785 853 684 940 8 × 2 = 0 + 0.000 000 047 571 707 369 881 6;
  • 17) 0.000 000 047 571 707 369 881 6 × 2 = 0 + 0.000 000 095 143 414 739 763 2;
  • 18) 0.000 000 095 143 414 739 763 2 × 2 = 0 + 0.000 000 190 286 829 479 526 4;
  • 19) 0.000 000 190 286 829 479 526 4 × 2 = 0 + 0.000 000 380 573 658 959 052 8;
  • 20) 0.000 000 380 573 658 959 052 8 × 2 = 0 + 0.000 000 761 147 317 918 105 6;
  • 21) 0.000 000 761 147 317 918 105 6 × 2 = 0 + 0.000 001 522 294 635 836 211 2;
  • 22) 0.000 001 522 294 635 836 211 2 × 2 = 0 + 0.000 003 044 589 271 672 422 4;
  • 23) 0.000 003 044 589 271 672 422 4 × 2 = 0 + 0.000 006 089 178 543 344 844 8;
  • 24) 0.000 006 089 178 543 344 844 8 × 2 = 0 + 0.000 012 178 357 086 689 689 6;
  • 25) 0.000 012 178 357 086 689 689 6 × 2 = 0 + 0.000 024 356 714 173 379 379 2;
  • 26) 0.000 024 356 714 173 379 379 2 × 2 = 0 + 0.000 048 713 428 346 758 758 4;
  • 27) 0.000 048 713 428 346 758 758 4 × 2 = 0 + 0.000 097 426 856 693 517 516 8;
  • 28) 0.000 097 426 856 693 517 516 8 × 2 = 0 + 0.000 194 853 713 387 035 033 6;
  • 29) 0.000 194 853 713 387 035 033 6 × 2 = 0 + 0.000 389 707 426 774 070 067 2;
  • 30) 0.000 389 707 426 774 070 067 2 × 2 = 0 + 0.000 779 414 853 548 140 134 4;
  • 31) 0.000 779 414 853 548 140 134 4 × 2 = 0 + 0.001 558 829 707 096 280 268 8;
  • 32) 0.001 558 829 707 096 280 268 8 × 2 = 0 + 0.003 117 659 414 192 560 537 6;
  • 33) 0.003 117 659 414 192 560 537 6 × 2 = 0 + 0.006 235 318 828 385 121 075 2;
  • 34) 0.006 235 318 828 385 121 075 2 × 2 = 0 + 0.012 470 637 656 770 242 150 4;
  • 35) 0.012 470 637 656 770 242 150 4 × 2 = 0 + 0.024 941 275 313 540 484 300 8;
  • 36) 0.024 941 275 313 540 484 300 8 × 2 = 0 + 0.049 882 550 627 080 968 601 6;
  • 37) 0.049 882 550 627 080 968 601 6 × 2 = 0 + 0.099 765 101 254 161 937 203 2;
  • 38) 0.099 765 101 254 161 937 203 2 × 2 = 0 + 0.199 530 202 508 323 874 406 4;
  • 39) 0.199 530 202 508 323 874 406 4 × 2 = 0 + 0.399 060 405 016 647 748 812 8;
  • 40) 0.399 060 405 016 647 748 812 8 × 2 = 0 + 0.798 120 810 033 295 497 625 6;
  • 41) 0.798 120 810 033 295 497 625 6 × 2 = 1 + 0.596 241 620 066 590 995 251 2;
  • 42) 0.596 241 620 066 590 995 251 2 × 2 = 1 + 0.192 483 240 133 181 990 502 4;
  • 43) 0.192 483 240 133 181 990 502 4 × 2 = 0 + 0.384 966 480 266 363 981 004 8;
  • 44) 0.384 966 480 266 363 981 004 8 × 2 = 0 + 0.769 932 960 532 727 962 009 6;
  • 45) 0.769 932 960 532 727 962 009 6 × 2 = 1 + 0.539 865 921 065 455 924 019 2;
  • 46) 0.539 865 921 065 455 924 019 2 × 2 = 1 + 0.079 731 842 130 911 848 038 4;
  • 47) 0.079 731 842 130 911 848 038 4 × 2 = 0 + 0.159 463 684 261 823 696 076 8;
  • 48) 0.159 463 684 261 823 696 076 8 × 2 = 0 + 0.318 927 368 523 647 392 153 6;
  • 49) 0.318 927 368 523 647 392 153 6 × 2 = 0 + 0.637 854 737 047 294 784 307 2;
  • 50) 0.637 854 737 047 294 784 307 2 × 2 = 1 + 0.275 709 474 094 589 568 614 4;
  • 51) 0.275 709 474 094 589 568 614 4 × 2 = 0 + 0.551 418 948 189 179 137 228 8;
  • 52) 0.551 418 948 189 179 137 228 8 × 2 = 1 + 0.102 837 896 378 358 274 457 6;
  • 53) 0.102 837 896 378 358 274 457 6 × 2 = 0 + 0.205 675 792 756 716 548 915 2;
  • 54) 0.205 675 792 756 716 548 915 2 × 2 = 0 + 0.411 351 585 513 433 097 830 4;
  • 55) 0.411 351 585 513 433 097 830 4 × 2 = 0 + 0.822 703 171 026 866 195 660 8;
  • 56) 0.822 703 171 026 866 195 660 8 × 2 = 1 + 0.645 406 342 053 732 391 321 6;
  • 57) 0.645 406 342 053 732 391 321 6 × 2 = 1 + 0.290 812 684 107 464 782 643 2;
  • 58) 0.290 812 684 107 464 782 643 2 × 2 = 0 + 0.581 625 368 214 929 565 286 4;
  • 59) 0.581 625 368 214 929 565 286 4 × 2 = 1 + 0.163 250 736 429 859 130 572 8;
  • 60) 0.163 250 736 429 859 130 572 8 × 2 = 0 + 0.326 501 472 859 718 261 145 6;
  • 61) 0.326 501 472 859 718 261 145 6 × 2 = 0 + 0.653 002 945 719 436 522 291 2;
  • 62) 0.653 002 945 719 436 522 291 2 × 2 = 1 + 0.306 005 891 438 873 044 582 4;
  • 63) 0.306 005 891 438 873 044 582 4 × 2 = 0 + 0.612 011 782 877 746 089 164 8;
  • 64) 0.612 011 782 877 746 089 164 8 × 2 = 1 + 0.224 023 565 755 492 178 329 6;
  • 65) 0.224 023 565 755 492 178 329 6 × 2 = 0 + 0.448 047 131 510 984 356 659 2;
  • 66) 0.448 047 131 510 984 356 659 2 × 2 = 0 + 0.896 094 263 021 968 713 318 4;
  • 67) 0.896 094 263 021 968 713 318 4 × 2 = 1 + 0.792 188 526 043 937 426 636 8;
  • 68) 0.792 188 526 043 937 426 636 8 × 2 = 1 + 0.584 377 052 087 874 853 273 6;
  • 69) 0.584 377 052 087 874 853 273 6 × 2 = 1 + 0.168 754 104 175 749 706 547 2;
  • 70) 0.168 754 104 175 749 706 547 2 × 2 = 0 + 0.337 508 208 351 499 413 094 4;
  • 71) 0.337 508 208 351 499 413 094 4 × 2 = 0 + 0.675 016 416 702 998 826 188 8;
  • 72) 0.675 016 416 702 998 826 188 8 × 2 = 1 + 0.350 032 833 405 997 652 377 6;
  • 73) 0.350 032 833 405 997 652 377 6 × 2 = 0 + 0.700 065 666 811 995 304 755 2;
  • 74) 0.700 065 666 811 995 304 755 2 × 2 = 1 + 0.400 131 333 623 990 609 510 4;
  • 75) 0.400 131 333 623 990 609 510 4 × 2 = 0 + 0.800 262 667 247 981 219 020 8;
  • 76) 0.800 262 667 247 981 219 020 8 × 2 = 1 + 0.600 525 334 495 962 438 041 6;
  • 77) 0.600 525 334 495 962 438 041 6 × 2 = 1 + 0.201 050 668 991 924 876 083 2;
  • 78) 0.201 050 668 991 924 876 083 2 × 2 = 0 + 0.402 101 337 983 849 752 166 4;
  • 79) 0.402 101 337 983 849 752 166 4 × 2 = 0 + 0.804 202 675 967 699 504 332 8;
  • 80) 0.804 202 675 967 699 504 332 8 × 2 = 1 + 0.608 405 351 935 399 008 665 6;
  • 81) 0.608 405 351 935 399 008 665 6 × 2 = 1 + 0.216 810 703 870 798 017 331 2;
  • 82) 0.216 810 703 870 798 017 331 2 × 2 = 0 + 0.433 621 407 741 596 034 662 4;
  • 83) 0.433 621 407 741 596 034 662 4 × 2 = 0 + 0.867 242 815 483 192 069 324 8;
  • 84) 0.867 242 815 483 192 069 324 8 × 2 = 1 + 0.734 485 630 966 384 138 649 6;
  • 85) 0.734 485 630 966 384 138 649 6 × 2 = 1 + 0.468 971 261 932 768 277 299 2;
  • 86) 0.468 971 261 932 768 277 299 2 × 2 = 0 + 0.937 942 523 865 536 554 598 4;
  • 87) 0.937 942 523 865 536 554 598 4 × 2 = 1 + 0.875 885 047 731 073 109 196 8;
  • 88) 0.875 885 047 731 073 109 196 8 × 2 = 1 + 0.751 770 095 462 146 218 393 6;
  • 89) 0.751 770 095 462 146 218 393 6 × 2 = 1 + 0.503 540 190 924 292 436 787 2;
  • 90) 0.503 540 190 924 292 436 787 2 × 2 = 1 + 0.007 080 381 848 584 873 574 4;
  • 91) 0.007 080 381 848 584 873 574 4 × 2 = 0 + 0.014 160 763 697 169 747 148 8;
  • 92) 0.014 160 763 697 169 747 148 8 × 2 = 0 + 0.028 321 527 394 339 494 297 6;
  • 93) 0.028 321 527 394 339 494 297 6 × 2 = 0 + 0.056 643 054 788 678 988 595 2;

No fractional part became zero. However, we have enough bits: counting from the first bit equal to 1 (step 41), steps 41 to 93 give 53 significant bits, which is the leading 1 plus the 52 bits of the mantissa. We stop here. Precision is lost: the converted number will be only a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 725 886 648 1(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1100 1100 0101 0001 1010 0101 0011 1001 0101 1001 1001 1011 1100 0(2)
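
The repeated multiplication in steps 3 and 4 can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that the doubling itself introduces no rounding; `fraction_bits` is a hypothetical helper name, and the input is written as 7258866481 / 10^22, which is the same value as 0.000 000 000 000 725 886 648 1.

```python
from fractions import Fraction


def fraction_bits(x: Fraction, count: int) -> str:
    """Return up to `count` binary digits of a fraction 0 <= x < 1."""
    bits = []
    for _ in range(count):
        x *= 2
        bit = int(x)          # integer part of the product: 0 or 1
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
        if x == 0:            # exact representation reached, stop early
            break
    return "".join(bits)


x = Fraction(7258866481, 10**22)   # 0.000 000 000 000 725 886 648 1
bits = fraction_bits(x, 93)
print(bits.index("1") + 1)         # step number of the first 1 bit
print(bits)
```
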

5. Positive number before normalization:

0.000 000 000 000 725 886 648 1(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1100 1100 0101 0001 1010 0101 0011 1001 0101 1001 1001 1011 1100 0(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 41 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 725 886 648 1(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1100 1100 0101 0001 1010 0101 0011 1001 0101 1001 1001 1011 1100 0(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1100 1100 0101 0001 1010 0101 0011 1001 0101 1001 1001 1011 1100 0(2) × 2^0 =


1.1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000(2) × 2^(-41)
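
The normalization step amounts to locating the first 1 bit after the point. A minimal sketch, assuming the fractional bits are held in a string; `normalize` is a hypothetical helper and the input here is shortened to the first few significant bits for readability:

```python
def normalize(fraction_bits: str):
    """Given the bits b1 b2 b3 ... of 0.b1b2b3..., return (significand bits, exponent)."""
    first_one = fraction_bits.index("1")   # zero-based position of the leading 1
    exponent = -(first_one + 1)            # moving the point k places right gives 2^(-k)
    return fraction_bits[first_one:], exponent


# 40 leading zeros followed by the first significant bits of this number.
significand, exponent = normalize("0" * 40 + "11001100")
print(significand[0] + "." + significand[1:], exponent)   # prints 1.1001100 -41
```
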


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -41


Mantissa (not normalized):
1.1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-41 + 2^(11-1) - 1 =


(-41 + 1 023)(10) =


982(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 982 ÷ 2 = 491 + 0;
  • 491 ÷ 2 = 245 + 1;
  • 245 ÷ 2 = 122 + 1;
  • 122 ÷ 2 = 61 + 0;
  • 61 ÷ 2 = 30 + 1;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


982(10) =


011 1101 0110(2)
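
The bias adjustment and the padding to 11 bits can be checked with a few lines of Python. This is a sketch under the assumption that the number is in the normal range; the constant and function names are illustrative, and the built-in `format` is used here instead of the manual division.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and return the exponent field as an 11 character bit string."""
    adjusted = unadjusted + BIAS
    # Fields of all zeros and all ones are reserved (zero/subnormals, infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(-41))   # prints 01111010110
```
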


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000 =


1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1101 0110


Mantissa (52 bits) =
1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000


Decimal number 0.000 000 000 000 725 886 648 1 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1101 0110 - 1001 1000 1010 0011 0100 1010 0111 0010 1011 0011 0011 0111 1000
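
One way to cross-check the result is to ask the machine for the bit pattern of the same value. The sketch below uses Python's standard `struct` module; `double_bits` is a hypothetical helper. It assumes the platform stores Python floats as IEEE 754 doubles, which is the usual case, and its output can be compared field by field with the representation above.

```python
import struct


def double_bits(x: float) -> str:
    """Return the sign, exponent and mantissa fields of a double as a string."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an unsigned integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return f"{bits[0]} - {bits[1:12]} - {bits[12:]}"


print(double_bits(7.258866481e-13))
```
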


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until the quotient is zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of the integer part of each result, until the fractional part is zero or until enough bits have been produced to fill the 52 bit mantissa (many decimal fractions never reach zero).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
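
The steps above can be sketched as one small Python function. This is an illustrative variant, not a reference implementation: it normalizes by repeated halving or doubling instead of converting the integer and fractional parts separately (the resulting bits are the same), it truncates the mantissa as described in step 8, and it handles only nonzero numbers in the normal range. The name `to_double_fields` is hypothetical.

```python
from fractions import Fraction


def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings, truncating to 52 mantissa bits."""
    value = Fraction(text)            # exact decimal value, no rounding on input
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: find the exponent such that 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    value -= 1                        # drop the leading bit, which is always 1
    mantissa = []
    for _ in range(52):               # repeated multiplication by 2
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit

    return sign, format(exponent + 1023, "011b"), "".join(mantissa)


print(to_double_fields("-2.5"))
```

For `-2.5` this gives sign `1`, exponent `10000000000` (1 + 1023 = 1024) and a mantissa starting with `01` followed by zeros, since 2.5 = 1.01(2) × 2^1.
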

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • No fractional part became zero, but we have enough iterations (more than the 52 bit mantissa limit) and at least one integer part different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
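
The procedure above shortens the mantissa by cutting off the excess bits. Hardware and most language runtimes instead round to the nearest representable value by default, so their result can differ in the last bit when the first removed bit is 1, as it is here (the removed bits are 1010 0). The sketch below, using Python's standard `struct` module, compares the two; `native_fields` is a hypothetical helper, and the comparison assumes Python floats are IEEE 754 doubles parsed with correct rounding.

```python
import struct


def native_fields(x: float):
    """Return (sign, exponent, mantissa) bit strings of the double stored for x."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


# Mantissa obtained above by removing the excess bits from the right.
truncated = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", "")

sign, exponent, mantissa = native_fields(-31.640215)
print(sign, exponent)
print(mantissa)
# Difference in units of the last mantissa bit between the two results.
print(int(mantissa, 2) - int(truncated, 2))
```

Under those assumptions the sign and exponent fields agree with the result above, and the mantissa is expected to be one unit in the last place larger, because rounding to nearest rounds this value up.
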