64 bit IEEE 754: Decimal ↗ Double Precision Floating Point Binary: 1.000 000 001

Convert the number to 64 bit double precision IEEE 754 binary floating point representation, from a base ten decimal system number

Number 1.000 000 001(10) converted and written in 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 1.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.


1(10) =


1(2)
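Steps 1 and 2 can be sketched in Python as below. The function name `int_to_binary` is our own choice and this is only a minimal illustration of the technique, not the converter's actual code: it collects the remainders with `divmod` and then reads them from the bottom of the list up.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(r))
    # The last remainder is the most significant bit, so read bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(1))     # 1
print(int_to_binary(1023))  # 1111111111
```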


3. Convert to binary (base 2) the fractional part: 0.000 000 001.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 001 × 2 = 0 + 0.000 000 002;
  • 2) 0.000 000 002 × 2 = 0 + 0.000 000 004;
  • 3) 0.000 000 004 × 2 = 0 + 0.000 000 008;
  • 4) 0.000 000 008 × 2 = 0 + 0.000 000 016;
  • 5) 0.000 000 016 × 2 = 0 + 0.000 000 032;
  • 6) 0.000 000 032 × 2 = 0 + 0.000 000 064;
  • 7) 0.000 000 064 × 2 = 0 + 0.000 000 128;
  • 8) 0.000 000 128 × 2 = 0 + 0.000 000 256;
  • 9) 0.000 000 256 × 2 = 0 + 0.000 000 512;
  • 10) 0.000 000 512 × 2 = 0 + 0.000 001 024;
  • 11) 0.000 001 024 × 2 = 0 + 0.000 002 048;
  • 12) 0.000 002 048 × 2 = 0 + 0.000 004 096;
  • 13) 0.000 004 096 × 2 = 0 + 0.000 008 192;
  • 14) 0.000 008 192 × 2 = 0 + 0.000 016 384;
  • 15) 0.000 016 384 × 2 = 0 + 0.000 032 768;
  • 16) 0.000 032 768 × 2 = 0 + 0.000 065 536;
  • 17) 0.000 065 536 × 2 = 0 + 0.000 131 072;
  • 18) 0.000 131 072 × 2 = 0 + 0.000 262 144;
  • 19) 0.000 262 144 × 2 = 0 + 0.000 524 288;
  • 20) 0.000 524 288 × 2 = 0 + 0.001 048 576;
  • 21) 0.001 048 576 × 2 = 0 + 0.002 097 152;
  • 22) 0.002 097 152 × 2 = 0 + 0.004 194 304;
  • 23) 0.004 194 304 × 2 = 0 + 0.008 388 608;
  • 24) 0.008 388 608 × 2 = 0 + 0.016 777 216;
  • 25) 0.016 777 216 × 2 = 0 + 0.033 554 432;
  • 26) 0.033 554 432 × 2 = 0 + 0.067 108 864;
  • 27) 0.067 108 864 × 2 = 0 + 0.134 217 728;
  • 28) 0.134 217 728 × 2 = 0 + 0.268 435 456;
  • 29) 0.268 435 456 × 2 = 0 + 0.536 870 912;
  • 30) 0.536 870 912 × 2 = 1 + 0.073 741 824;
  • 31) 0.073 741 824 × 2 = 0 + 0.147 483 648;
  • 32) 0.147 483 648 × 2 = 0 + 0.294 967 296;
  • 33) 0.294 967 296 × 2 = 0 + 0.589 934 592;
  • 34) 0.589 934 592 × 2 = 1 + 0.179 869 184;
  • 35) 0.179 869 184 × 2 = 0 + 0.359 738 368;
  • 36) 0.359 738 368 × 2 = 0 + 0.719 476 736;
  • 37) 0.719 476 736 × 2 = 1 + 0.438 953 472;
  • 38) 0.438 953 472 × 2 = 0 + 0.877 906 944;
  • 39) 0.877 906 944 × 2 = 1 + 0.755 813 888;
  • 40) 0.755 813 888 × 2 = 1 + 0.511 627 776;
  • 41) 0.511 627 776 × 2 = 1 + 0.023 255 552;
  • 42) 0.023 255 552 × 2 = 0 + 0.046 511 104;
  • 43) 0.046 511 104 × 2 = 0 + 0.093 022 208;
  • 44) 0.093 022 208 × 2 = 0 + 0.186 044 416;
  • 45) 0.186 044 416 × 2 = 0 + 0.372 088 832;
  • 46) 0.372 088 832 × 2 = 0 + 0.744 177 664;
  • 47) 0.744 177 664 × 2 = 1 + 0.488 355 328;
  • 48) 0.488 355 328 × 2 = 0 + 0.976 710 656;
  • 49) 0.976 710 656 × 2 = 1 + 0.953 421 312;
  • 50) 0.953 421 312 × 2 = 1 + 0.906 842 624;
  • 51) 0.906 842 624 × 2 = 1 + 0.813 685 248;
  • 52) 0.813 685 248 × 2 = 1 + 0.627 370 496;
  • 53) 0.627 370 496 × 2 = 1 + 0.254 740 992;
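The repeated doubling above can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration under our own assumptions: it uses Python's `fractions.Fraction` so that 0.000 000 001 is held exactly (a float would already be rounded), and the helper name `fraction_bits` and the cap of 53 iterations (52 mantissa bits plus one) are our choices.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, max_bits: int) -> tuple[str, Fraction]:
    """Multiply the fractional part by 2 repeatedly, recording the integer parts.

    Stops when the fractional part becomes zero or after max_bits steps,
    and returns the bits together with the leftover fractional part.
    """
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        integer = 1 if frac >= 1 else 0  # integer part of the product
        bits.append(str(integer))
        frac -= integer                  # keep only the fractional part
    return "".join(bits), frac


bits, rest = fraction_bits(Fraction("0.000000001"), 53)
print(bits[29:])    # bits 30..53: 100010010111000001011111
print(float(rest))  # 0.254740992
```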

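We never got a fractional part equal to zero: 0.000 000 001 = 1/10^9, and since 10^9 contains the factor 5^9, the number has no finite binary expansion. We stop after 53 iterations, which covers the 52 bits the mantissa can hold plus one extra bit used for rounding; everything beyond that is lost (precision loss).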


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 001(10) =


0.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1(2)


5. Positive number before normalization:

1.000 000 001(10) =


1.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1(2)
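Writing the bits in groups of four, as this page does, is purely presentational. A small helper (the name `group_bits` is hypothetical) reproduces the string above from the integer part and the 53 fractional bits of step 4, which are typed in by hand here:

```python
def group_bits(bits: str, size: int = 4) -> str:
    """Split a bit string into space-separated groups, left to right."""
    return " ".join(bits[i:i + size] for i in range(0, len(bits), size))


int_part = "1"
frac_part = "0" * 29 + "100010010111000001011111"  # the 53 bits from step 4
result = f"{int_part}.{group_bits(frac_part)}"
print(result)
```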

6. Normalize the binary representation of the number.

Shift the binary point so that only one nonzero digit remains to its left. That digit is already the integer part 1, so the point moves 0 positions:


1.000 000 001(10) =


1.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1(2) =


1.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1(2) × 2^0
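Normalization amounts to finding the power of two that brings the number into the interval [1, 2). The sketch below is a minimal illustration that assumes a positive input and uses exact `fractions.Fraction` arithmetic; the helper name `normalize` is hypothetical. It halves or doubles the value until it lands in that interval and counts the shifts, which gives the unadjusted exponent:

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (significand, exponent) with 1 <= significand < 2 and
    x == significand * 2**exponent. Assumes x > 0."""
    exponent = 0
    while x >= 2:   # too big: shift the binary point to the left
        x /= 2
        exponent += 1
    while x < 1:    # too small: shift the binary point to the right
        x *= 2
        exponent -= 1
    return x, exponent


sig, e = normalize(Fraction("1.000000001"))
print(e)  # 0: the binary point does not move
```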


7. So far, we have the following elements that feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): 0


Mantissa (not normalized):
1.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


0 + 2^(11-1) - 1 =


(0 + 1 023)(10) =


1 023(10)
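In formula form, with e the unadjusted exponent and E the adjusted (biased) one, the adjustment is the following. The second line is the standard range of biased exponents for normal finite doubles; the stored values 0 and 2047 are reserved for zero/subnormals and for infinities/NaN.

```latex
E = e + \left(2^{11-1} - 1\right) = 0 + 1023 = 1023
1 \le E \le 2046 \iff -1022 \le e \le 1023
```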


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 023 ÷ 2 = 511 + 1;
  • 511 ÷ 2 = 255 + 1;
  • 255 ÷ 2 = 127 + 1;
  • 127 ÷ 2 = 63 + 1;
  • 63 ÷ 2 = 31 + 1;
  • 31 ÷ 2 = 15 + 1;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1023(10) =


011 1111 1111(2)
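Steps 8 to 10 (add the bias, then write the result in 11 bits) can be combined in a short sketch; the function name `exponent_field` is hypothetical. Python's built-in `format` with the `011b` spec stands in for the repeated division and also pads the result with leading zeros to 11 bits, which is where the leading 0 of 011 1111 1111 comes from:

```python
def exponent_field(unbiased: int, bits: int = 11) -> str:
    """Add the bias 2**(bits-1) - 1 and return a zero-padded binary string."""
    bias = 2 ** (bits - 1) - 1  # 1023 for the 11-bit double exponent
    biased = unbiased + bias
    # All zeros and all ones are reserved (zero/subnormals, infinity/NaN).
    if not 0 < biased < 2 ** bits - 1:
        raise ValueError("exponent out of range for a normal number")
    return format(biased, f"0{bits}b")


print(exponent_field(0))  # 01111111111
```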


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1 for a normalized number, and the binary point.


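b) Adjust its length to 52 bits by removing the excess bits from the right, then round. IEEE 754's default rounding mode is round to nearest, ties to even. The first excess bit (bit 53) is 1, and the leftover fractional part from step 3 (0.254 740 992) is not zero, so the discarded tail is worth more than half a unit in the last place: round up by adding 1 to the 52 kept bits. Simply truncating would leave a mantissa ending in 0010 1111, one unit too small. Precision is lost either way.


Mantissa (normalized) =


1.0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 1 =


0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0010 1111 + 1 =


0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0011 0000


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1111 1111


Mantissa (52 bits) =
0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0011 0000


The base ten decimal number 1.000 000 001 converted and written in 64 bit double precision IEEE 754 binary floating point representation:
0 - 011 1111 1111 - 0000 0000 0000 0000 0000 0000 0000 0100 0100 1011 1000 0011 0000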
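The rounding in step 11 can be checked numerically. The sketch below assumes CPython, whose float literal parsing is correctly rounded to nearest: it computes the 52-bit mantissa both by truncation and by round-to-nearest-even using exact `fractions.Fraction` arithmetic, then unpacks the bits of the double the machine actually stores with the standard `struct` module (`>d` packs a big-endian double, `>Q` reads the same 8 bytes as an unsigned 64-bit integer). Truncation gives a mantissa ending in ...2f, while the stored double ends in ...30.

```python
import math
import struct
from fractions import Fraction

# Exact fractional part of 1.000 000 001 (no float rounding involved yet).
frac = Fraction("0.000000001")

# With exponent 0, the 52 mantissa bits are the first 52 fractional bits,
# so the mantissa as an integer is frac * 2**52 with the tail removed.
scaled = frac * 2**52           # exactly 4503599.627370496
truncated = math.floor(scaled)  # just drop the excess bits
rounded = round(scaled)         # Fraction rounding: nearest, ties to even

print(f"{truncated:013x}")  # 000000044b82f
print(f"{rounded:013x}")    # 000000044b830

# Cross-check: reinterpret the stored double's 64 bits as an unsigned integer.
(raw,) = struct.unpack(">Q", struct.pack(">d", 1.000000001))
sign = raw >> 63
exponent = (raw >> 52) & 0x7FF
mantissa = raw & ((1 << 52) - 1)
print(sign, f"{exponent:011b}", f"{mantissa:052b}")
print(mantissa == rounded)  # True
```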
