0.000 000 000 001 22 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 001 22(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 001 22(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
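
The repeated division of steps 1 and 2 can be sketched in Python. The helper name `int_to_binary` is an assumption made for illustration; it is not part of the original procedure.

```python
def int_to_binary(n):
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # the single division 0 ÷ 2 = 0 + 0 yields the digit 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of n ÷ 2
        remainders.append(str(r))
    # Read the remainders from the bottom of the list (last one first).
    return "".join(reversed(remainders))

print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```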


3. Convert to binary (base 2) the fractional part: 0.000 000 000 001 22.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 001 22 × 2 = 0 + 0.000 000 000 002 44;
  • 2) 0.000 000 000 002 44 × 2 = 0 + 0.000 000 000 004 88;
  • 3) 0.000 000 000 004 88 × 2 = 0 + 0.000 000 000 009 76;
  • 4) 0.000 000 000 009 76 × 2 = 0 + 0.000 000 000 019 52;
  • 5) 0.000 000 000 019 52 × 2 = 0 + 0.000 000 000 039 04;
  • 6) 0.000 000 000 039 04 × 2 = 0 + 0.000 000 000 078 08;
  • 7) 0.000 000 000 078 08 × 2 = 0 + 0.000 000 000 156 16;
  • 8) 0.000 000 000 156 16 × 2 = 0 + 0.000 000 000 312 32;
  • 9) 0.000 000 000 312 32 × 2 = 0 + 0.000 000 000 624 64;
  • 10) 0.000 000 000 624 64 × 2 = 0 + 0.000 000 001 249 28;
  • 11) 0.000 000 001 249 28 × 2 = 0 + 0.000 000 002 498 56;
  • 12) 0.000 000 002 498 56 × 2 = 0 + 0.000 000 004 997 12;
  • 13) 0.000 000 004 997 12 × 2 = 0 + 0.000 000 009 994 24;
  • 14) 0.000 000 009 994 24 × 2 = 0 + 0.000 000 019 988 48;
  • 15) 0.000 000 019 988 48 × 2 = 0 + 0.000 000 039 976 96;
  • 16) 0.000 000 039 976 96 × 2 = 0 + 0.000 000 079 953 92;
  • 17) 0.000 000 079 953 92 × 2 = 0 + 0.000 000 159 907 84;
  • 18) 0.000 000 159 907 84 × 2 = 0 + 0.000 000 319 815 68;
  • 19) 0.000 000 319 815 68 × 2 = 0 + 0.000 000 639 631 36;
  • 20) 0.000 000 639 631 36 × 2 = 0 + 0.000 001 279 262 72;
  • 21) 0.000 001 279 262 72 × 2 = 0 + 0.000 002 558 525 44;
  • 22) 0.000 002 558 525 44 × 2 = 0 + 0.000 005 117 050 88;
  • 23) 0.000 005 117 050 88 × 2 = 0 + 0.000 010 234 101 76;
  • 24) 0.000 010 234 101 76 × 2 = 0 + 0.000 020 468 203 52;
  • 25) 0.000 020 468 203 52 × 2 = 0 + 0.000 040 936 407 04;
  • 26) 0.000 040 936 407 04 × 2 = 0 + 0.000 081 872 814 08;
  • 27) 0.000 081 872 814 08 × 2 = 0 + 0.000 163 745 628 16;
  • 28) 0.000 163 745 628 16 × 2 = 0 + 0.000 327 491 256 32;
  • 29) 0.000 327 491 256 32 × 2 = 0 + 0.000 654 982 512 64;
  • 30) 0.000 654 982 512 64 × 2 = 0 + 0.001 309 965 025 28;
  • 31) 0.001 309 965 025 28 × 2 = 0 + 0.002 619 930 050 56;
  • 32) 0.002 619 930 050 56 × 2 = 0 + 0.005 239 860 101 12;
  • 33) 0.005 239 860 101 12 × 2 = 0 + 0.010 479 720 202 24;
  • 34) 0.010 479 720 202 24 × 2 = 0 + 0.020 959 440 404 48;
  • 35) 0.020 959 440 404 48 × 2 = 0 + 0.041 918 880 808 96;
  • 36) 0.041 918 880 808 96 × 2 = 0 + 0.083 837 761 617 92;
  • 37) 0.083 837 761 617 92 × 2 = 0 + 0.167 675 523 235 84;
  • 38) 0.167 675 523 235 84 × 2 = 0 + 0.335 351 046 471 68;
  • 39) 0.335 351 046 471 68 × 2 = 0 + 0.670 702 092 943 36;
  • 40) 0.670 702 092 943 36 × 2 = 1 + 0.341 404 185 886 72;
  • 41) 0.341 404 185 886 72 × 2 = 0 + 0.682 808 371 773 44;
  • 42) 0.682 808 371 773 44 × 2 = 1 + 0.365 616 743 546 88;
  • 43) 0.365 616 743 546 88 × 2 = 0 + 0.731 233 487 093 76;
  • 44) 0.731 233 487 093 76 × 2 = 1 + 0.462 466 974 187 52;
  • 45) 0.462 466 974 187 52 × 2 = 0 + 0.924 933 948 375 04;
  • 46) 0.924 933 948 375 04 × 2 = 1 + 0.849 867 896 750 08;
  • 47) 0.849 867 896 750 08 × 2 = 1 + 0.699 735 793 500 16;
  • 48) 0.699 735 793 500 16 × 2 = 1 + 0.399 471 587 000 32;
  • 49) 0.399 471 587 000 32 × 2 = 0 + 0.798 943 174 000 64;
  • 50) 0.798 943 174 000 64 × 2 = 1 + 0.597 886 348 001 28;
  • 51) 0.597 886 348 001 28 × 2 = 1 + 0.195 772 696 002 56;
  • 52) 0.195 772 696 002 56 × 2 = 0 + 0.391 545 392 005 12;
  • 53) 0.391 545 392 005 12 × 2 = 0 + 0.783 090 784 010 24;
  • 54) 0.783 090 784 010 24 × 2 = 1 + 0.566 181 568 020 48;
  • 55) 0.566 181 568 020 48 × 2 = 1 + 0.132 363 136 040 96;
  • 56) 0.132 363 136 040 96 × 2 = 0 + 0.264 726 272 081 92;
  • 57) 0.264 726 272 081 92 × 2 = 0 + 0.529 452 544 163 84;
  • 58) 0.529 452 544 163 84 × 2 = 1 + 0.058 905 088 327 68;
  • 59) 0.058 905 088 327 68 × 2 = 0 + 0.117 810 176 655 36;
  • 60) 0.117 810 176 655 36 × 2 = 0 + 0.235 620 353 310 72;
  • 61) 0.235 620 353 310 72 × 2 = 0 + 0.471 240 706 621 44;
  • 62) 0.471 240 706 621 44 × 2 = 0 + 0.942 481 413 242 88;
  • 63) 0.942 481 413 242 88 × 2 = 1 + 0.884 962 826 485 76;
  • 64) 0.884 962 826 485 76 × 2 = 1 + 0.769 925 652 971 52;
  • 65) 0.769 925 652 971 52 × 2 = 1 + 0.539 851 305 943 04;
  • 66) 0.539 851 305 943 04 × 2 = 1 + 0.079 702 611 886 08;
  • 67) 0.079 702 611 886 08 × 2 = 0 + 0.159 405 223 772 16;
  • 68) 0.159 405 223 772 16 × 2 = 0 + 0.318 810 447 544 32;
  • 69) 0.318 810 447 544 32 × 2 = 0 + 0.637 620 895 088 64;
  • 70) 0.637 620 895 088 64 × 2 = 1 + 0.275 241 790 177 28;
  • 71) 0.275 241 790 177 28 × 2 = 0 + 0.550 483 580 354 56;
  • 72) 0.550 483 580 354 56 × 2 = 1 + 0.100 967 160 709 12;
  • 73) 0.100 967 160 709 12 × 2 = 0 + 0.201 934 321 418 24;
  • 74) 0.201 934 321 418 24 × 2 = 0 + 0.403 868 642 836 48;
  • 75) 0.403 868 642 836 48 × 2 = 0 + 0.807 737 285 672 96;
  • 76) 0.807 737 285 672 96 × 2 = 1 + 0.615 474 571 345 92;
  • 77) 0.615 474 571 345 92 × 2 = 1 + 0.230 949 142 691 84;
  • 78) 0.230 949 142 691 84 × 2 = 0 + 0.461 898 285 383 68;
  • 79) 0.461 898 285 383 68 × 2 = 0 + 0.923 796 570 767 36;
  • 80) 0.923 796 570 767 36 × 2 = 1 + 0.847 593 141 534 72;
  • 81) 0.847 593 141 534 72 × 2 = 1 + 0.695 186 283 069 44;
  • 82) 0.695 186 283 069 44 × 2 = 1 + 0.390 372 566 138 88;
  • 83) 0.390 372 566 138 88 × 2 = 0 + 0.780 745 132 277 76;
  • 84) 0.780 745 132 277 76 × 2 = 1 + 0.561 490 264 555 52;
  • 85) 0.561 490 264 555 52 × 2 = 1 + 0.122 980 529 111 04;
  • 86) 0.122 980 529 111 04 × 2 = 0 + 0.245 961 058 222 08;
  • 87) 0.245 961 058 222 08 × 2 = 0 + 0.491 922 116 444 16;
  • 88) 0.491 922 116 444 16 × 2 = 0 + 0.983 844 232 888 32;
  • 89) 0.983 844 232 888 32 × 2 = 1 + 0.967 688 465 776 64;
  • 90) 0.967 688 465 776 64 × 2 = 1 + 0.935 376 931 553 28;
  • 91) 0.935 376 931 553 28 × 2 = 1 + 0.870 753 863 106 56;
  • 92) 0.870 753 863 106 56 × 2 = 1 + 0.741 507 726 213 12;

None of the fractional parts became zero. However, we have performed enough iterations: the first non-zero integer part appeared at step 40, and the 52 steps after it (41 to 92) supply all 52 mantissa bits => FULL STOP (losing precision: the remaining bits are discarded, so the converted number will be only a very close approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 001 22(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111(2)
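
The repeated multiplication of steps 3 and 4 can be reproduced with exact rational arithmetic, which avoids rounding the decimal input before it is converted. This is a minimal sketch; `fraction_bits` is a hypothetical helper name, and the count of 92 bits is taken from the list above.

```python
from fractions import Fraction

def fraction_bits(frac, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(count):
        frac *= 2
        integer_part = int(frac)   # 0 or 1
        bits.append(str(integer_part))
        frac -= integer_part       # keep only the fractional part
    return "".join(bits)

# Exact decimal input: 0.000 000 000 001 22 = 122 / 10**14
bits = fraction_bits(Fraction(122, 10**14), 92)
print(bits.index("1") + 1)  # position of the first 1 bit: 40
print(bits[40:])            # the 52 bits that follow it
```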

5. Positive number before normalization:

0.000 000 000 001 22(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 40 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 001 22(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111(2) × 2^0 =


1.0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111(2) × 2^(-40)
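
Normalization amounts to finding the power of 2 that scales the value into the range from 1 (inclusive) to 2 (exclusive). The sketch below does this by repeated doubling or halving; the function name `normalize` is an assumption made for illustration.

```python
from fractions import Fraction

def normalize(value):
    """Scale a positive value into [1, 2); return (significand, exponent)."""
    exponent = 0
    while value < 1:      # shift the binary point to the right
        value *= 2
        exponent -= 1
    while value >= 2:     # shift the binary point to the left
        value /= 2
        exponent += 1
    return value, exponent

significand, exponent = normalize(Fraction(122, 10**14))
print(exponent)            # -40
print(float(significand))  # roughly 1.3414
```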


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -40


Mantissa (not normalized):
1.0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-40 + 2^(11-1) - 1 =


(-40 + 1 023)(10) =


983(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 983 ÷ 2 = 491 + 1;
  • 491 ÷ 2 = 245 + 1;
  • 245 ÷ 2 = 122 + 1;
  • 122 ÷ 2 = 61 + 0;
  • 61 ÷ 2 = 30 + 1;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


983(10) =


011 1101 0111(2)
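
Steps 8 to 10 can be condensed into a few lines. This sketch assumes the number is a normal double (adjusted exponent between 1 and 2046); the names `EXPONENT_BITS`, `BIAS` and `biased_exponent_bits` are illustrative.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision

def biased_exponent_bits(unadjusted):
    """Add the bias and render the result as an 11-bit binary string."""
    adjusted = unadjusted + BIAS
    # Assumption: only normal numbers are handled (adjusted in 1..2046).
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal range")
    # The format spec pads with leading zeros up to 11 digits.
    return format(adjusted, "0{}b".format(EXPONENT_BITS))

print(biased_exponent_bits(-40))  # 01111010111
print(biased_exponent_bits(4))    # 10000000011
```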


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111 =


0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111
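
On bit strings, step 11 is a slice followed by padding. The sketch below assumes the normalized significand is passed without the point, as "1" followed by the fraction bits; `stored_mantissa` is a hypothetical helper name. Excess bits are cut off, as in the procedure above, not rounded.

```python
MANTISSA_BITS = 52

def stored_mantissa(significand_bits):
    """Drop the leading 1 and fit the remaining bits to 52 positions."""
    if not significand_bits.startswith("1"):
        raise ValueError("a normalized significand starts with 1")
    fraction = significand_bits[1:]            # remove the leading bit
    fraction = fraction[:MANTISSA_BITS]        # cut off any excess bits
    return fraction.ljust(MANTISSA_BITS, "0")  # pad short inputs with 0

# 1.1(2) = 1.5(10): a single fraction bit, padded with 51 zeros
print(stored_mantissa("11"))
```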


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1101 0111


Mantissa (52 bits) =
0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111


Decimal number 0.000 000 000 001 22 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1101 0111 - 0101 0111 0110 0110 0100 0011 1100 0101 0001 1001 1101 1000 1111
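
As a cross-check, the three fields can be joined into one 64 bit word and compared with the bit pattern that Python stores for the same literal. This is a sketch relying on the standard `struct` module. Note that the procedure above cuts off the bits beyond the 52nd, while IEEE 754 hardware normally rounds to the nearest representable value; here the first discarded bit is 1, so the natively stored pattern is expected to be one unit in the last place higher.

```python
import struct

# Result of the manual conversion, copied from the line above.
text = ("0 - 011 1101 0111 - 0101 0111 0110 0110 0100 0011 1100 "
        "0101 0001 1001 1101 1000 1111")
bits = text.replace(" ", "").replace("-", "")
word = int(bits, 2)
print("{:016X}".format(word))    # 3D7576643C519D8F

# Bit pattern that Python stores for the literal 1.22e-12.
native = struct.unpack(">Q", struct.pack(">d", 1.22e-12))[0]
print("{:016X}".format(native))
print(native - word)             # difference in units in the last place
```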


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
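  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have collected enough bits to fill the mantissa (many decimal fractions never reach zero).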
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
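
The nine steps above can be combined into one short function. This is a minimal sketch, not a complete IEEE 754 encoder: the name `to_double_bits` is hypothetical, the input is assumed to be a nonzero decimal in the normal range of a double, and excess bits are cut off as described in step 8 instead of being rounded. For brevity it normalizes the whole value first and then extracts the 52 fraction bits, which yields the same bits as converting the integer and fractional parts separately.

```python
from fractions import Fraction

def to_double_bits(decimal_text):
    """Return 'sign - exponent - mantissa' using the truncating procedure."""
    value = Fraction(decimal_text)        # exact value of the decimal text
    sign = "1" if value < 0 else "0"      # step 9
    value = abs(value)                    # step 1
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Step 6: normalize so that 1 <= value < 2.
    exponent = 0
    while value < 1:
        value *= 2
        exponent -= 1
    while value >= 2:
        value /= 2
        exponent += 1

    # Step 7: biased exponent as 11 bits (normal range assumed).
    exponent_bits = format(exponent + 1023, "011b")

    # Steps 4, 5 and 8: drop the leading 1, then take 52 fraction bits
    # by repeated multiplication by 2; later bits are simply cut off.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit
    return "{} - {} - {}".format(sign, exponent_bits, "".join(mantissa))

print(to_double_bits("0.00000000000122"))
print(to_double_bits("-31.640215"))
```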

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
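
The result can be checked by decoding the 64 bits back into a floating point value. This is a sketch relying on Python's standard `struct` module. Because the excess bits were cut off and not rounded, the decoded value may differ from the stored value of -31.640215 by one unit in the last place, which is 2^(-48) at this magnitude.

```python
import struct

# Final bit pattern, copied from the line above.
text = ("1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 "
        "0101 0111 0110 1000 1001 1100")
bits = text.replace(" ", "").replace("-", "")

# Interpret the 64 bits as a big-endian double.
value = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]
print(value)                               # close to -31.640215
print(abs(value + 31.640215) <= 2 ** -48)  # True
```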