64 bit IEEE 754: Decimal to Double Precision Floating Point Binary: 0.234 38

Convert the number 0.234 38 to 64 bit double precision IEEE 754 binary floating point representation, from a base ten decimal system number

Number 0.234 38(10) converted and written in 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.


0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.234 38.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.234 38 × 2 = 0 + 0.468 76;
  • 2) 0.468 76 × 2 = 0 + 0.937 52;
  • 3) 0.937 52 × 2 = 1 + 0.875 04;
  • 4) 0.875 04 × 2 = 1 + 0.750 08;
  • 5) 0.750 08 × 2 = 1 + 0.500 16;
  • 6) 0.500 16 × 2 = 1 + 0.000 32;
  • 7) 0.000 32 × 2 = 0 + 0.000 64;
  • 8) 0.000 64 × 2 = 0 + 0.001 28;
  • 9) 0.001 28 × 2 = 0 + 0.002 56;
  • 10) 0.002 56 × 2 = 0 + 0.005 12;
  • 11) 0.005 12 × 2 = 0 + 0.010 24;
  • 12) 0.010 24 × 2 = 0 + 0.020 48;
  • 13) 0.020 48 × 2 = 0 + 0.040 96;
  • 14) 0.040 96 × 2 = 0 + 0.081 92;
  • 15) 0.081 92 × 2 = 0 + 0.163 84;
  • 16) 0.163 84 × 2 = 0 + 0.327 68;
  • 17) 0.327 68 × 2 = 0 + 0.655 36;
  • 18) 0.655 36 × 2 = 1 + 0.310 72;
  • 19) 0.310 72 × 2 = 0 + 0.621 44;
  • 20) 0.621 44 × 2 = 1 + 0.242 88;
  • 21) 0.242 88 × 2 = 0 + 0.485 76;
  • 22) 0.485 76 × 2 = 0 + 0.971 52;
  • 23) 0.971 52 × 2 = 1 + 0.943 04;
  • 24) 0.943 04 × 2 = 1 + 0.886 08;
  • 25) 0.886 08 × 2 = 1 + 0.772 16;
  • 26) 0.772 16 × 2 = 1 + 0.544 32;
  • 27) 0.544 32 × 2 = 1 + 0.088 64;
  • 28) 0.088 64 × 2 = 0 + 0.177 28;
  • 29) 0.177 28 × 2 = 0 + 0.354 56;
  • 30) 0.354 56 × 2 = 0 + 0.709 12;
  • 31) 0.709 12 × 2 = 1 + 0.418 24;
  • 32) 0.418 24 × 2 = 0 + 0.836 48;
  • 33) 0.836 48 × 2 = 1 + 0.672 96;
  • 34) 0.672 96 × 2 = 1 + 0.345 92;
  • 35) 0.345 92 × 2 = 0 + 0.691 84;
  • 36) 0.691 84 × 2 = 1 + 0.383 68;
  • 37) 0.383 68 × 2 = 0 + 0.767 36;
  • 38) 0.767 36 × 2 = 1 + 0.534 72;
  • 39) 0.534 72 × 2 = 1 + 0.069 44;
  • 40) 0.069 44 × 2 = 0 + 0.138 88;
  • 41) 0.138 88 × 2 = 0 + 0.277 76;
  • 42) 0.277 76 × 2 = 0 + 0.555 52;
  • 43) 0.555 52 × 2 = 1 + 0.111 04;
  • 44) 0.111 04 × 2 = 0 + 0.222 08;
  • 45) 0.222 08 × 2 = 0 + 0.444 16;
  • 46) 0.444 16 × 2 = 0 + 0.888 32;
  • 47) 0.888 32 × 2 = 1 + 0.776 64;
  • 48) 0.776 64 × 2 = 1 + 0.553 28;
  • 49) 0.553 28 × 2 = 1 + 0.106 56;
  • 50) 0.106 56 × 2 = 0 + 0.213 12;
  • 51) 0.213 12 × 2 = 0 + 0.426 24;
  • 52) 0.426 24 × 2 = 0 + 0.852 48;
  • 53) 0.852 48 × 2 = 1 + 0.704 96;
  • 54) 0.704 96 × 2 = 1 + 0.409 92;
  • 55) 0.409 92 × 2 = 0 + 0.819 84;

We did not reach a fractional part equal to zero. However, 55 iterations are enough: the first nonzero bit appears at position 3, and the 52 bits that follow it fill the mantissa. We stop here and discard the remaining fractional part, so the result is truncated and some precision is lost.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.234 38(10) =


0.0011 1100 0000 0000 0101 0011 1110 0010 1101 0110 0010 0011 1000 110(2)
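The repeated doubling in steps 3 and 4 can be sketched in Python. This is a minimal illustration, not the converter's actual code: the function name `fraction_bits` and its `max_bits` cutoff are assumptions made for this example, and `fractions.Fraction` is used so that every doubling is exact, as in the hand calculation.

```python
from fractions import Fraction


def fraction_bits(value: str, max_bits: int) -> str:
    """Return up to max_bits binary digits of the fractional part of value.

    value is a decimal string such as "0.23438"; parsing it as a Fraction
    keeps every doubling exact (no binary rounding sneaks in).
    """
    frac = Fraction(value) % 1          # keep only the fractional part
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2                       # multiply by 2 ...
        integer_part = int(frac)        # ... record the integer part (0 or 1)
        bits.append(str(integer_part))
        frac -= integer_part            # ... and carry on with the remainder
    return "".join(bits)


print(fraction_bits("0.23438", 55))
```

With a cutoff of 55 the output should match the digit string shown above (without the grouping spaces). A fraction that terminates in binary, such as 0.375, ends the loop early because its remainder reaches zero.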


5. Positive number before normalization:

0.234 38(10) =


0.0011 1100 0000 0000 0101 0011 1110 0010 1101 0110 0010 0011 1000 110(2)

6. Normalize the binary representation of the number.

Shift the binary point 3 positions to the right, so that only one nonzero digit remains to the left of it:


0.234 38(10) =


0.0011 1100 0000 0000 0101 0011 1110 0010 1101 0110 0010 0011 1000 110(2) × 2^0 =


1.1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110(2) × 2^-3
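The normalization step amounts to locating the first 1 bit and counting how far the binary point has to move. A minimal sketch follows; the helper name `normalize` is hypothetical, and it assumes the input is the bit string of a pure fraction (a number below 1) that contains at least one 1.

```python
def normalize(frac_bits: str) -> tuple[int, str]:
    """Normalize 0.<frac_bits> to 1.<rest> x 2^exponent.

    Returns (exponent, rest), where rest holds the bits after the leading 1.
    Raises ValueError if frac_bits contains no "1" (the value is zero).
    """
    first_one = frac_bits.index("1")    # 0-based position of the leading 1
    exponent = -(first_one + 1)         # the point moves just past that bit
    rest = frac_bits[first_one + 1:]
    return exponent, rest


page_bits = ("0011 1100 0000 0000 0101 0011 1110 0010 1101 0110 "
             "0010 0011 1000 110").replace(" ", "")
exponent, rest = normalize(page_bits)
print(exponent)     # -3
print(len(rest))    # 52
```

The 52 bits left after the leading 1 are exactly what the mantissa field can hold, which is why 55 doubling steps were enough for this number.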


7. So far, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -3


Mantissa (not normalized):
1.1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-3 + 2^(11-1) - 1 =


(-3 + 1 023)(10) =


1 020(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 020 ÷ 2 = 510 + 0;
  • 510 ÷ 2 = 255 + 0;
  • 255 ÷ 2 = 127 + 1;
  • 127 ÷ 2 = 63 + 1;
  • 63 ÷ 2 = 31 + 1;
  • 31 ÷ 2 = 15 + 1;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1 020(10) =


011 1111 1100(2)
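Steps 8 to 10 (biasing the exponent and converting it by repeated division) can be sketched as follows. The names `to_binary` and `EXPONENT_BITS` are illustrative assumptions, not part of any library. The same division routine also covers the integer-part conversion in steps 1 and 2.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)             # quotient and remainder
        remainders.append(str(r))
    return "".join(reversed(remainders))   # read the remainders bottom-up


EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1     # 1023 for double precision
adjusted = -3 + bias                    # unadjusted exponent is -3
field = to_binary(adjusted).zfill(EXPONENT_BITS)   # pad to 11 bits
print(bias, adjusted, field)            # 1023 1020 01111111100
```

The division yields only 10 significant bits for 1 020, so the field is left-padded with a zero to reach the required 11 bits.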


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the binary point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110 =


1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1111 1100


Mantissa (52 bits) =
1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110


The base ten decimal number 0.234 38 converted and written in 64 bit double precision IEEE 754 binary floating point representation:
0 - 011 1111 1100 - 1110 0000 0000 0010 1001 1111 0001 0110 1011 0001 0001 1100 0110
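As a cross-check, the three fields can be joined into one 64 bit integer and compared with the pattern Python itself stores for the literal 0.23438. This is only a sketch using the standard `struct` module; the variable names are illustrative. Note the assumption it makes visible: the procedure above truncates after 55 bits, whereas CPython converts decimal literals with round-to-nearest. The remainder discarded after step 55 (0.819 84) is more than one half, so the stored pattern is expected to be one unit higher in the last mantissa bit.

```python
import struct

sign = "0"
exponent = "011 1111 1100".replace(" ", "")
mantissa = ("1110 0000 0000 0010 1001 1111 0001 0110 1011 "
            "0001 0001 1100 0110").replace(" ", "")

# Pattern produced by the step-by-step (truncating) procedure.
truncated = int(sign + exponent + mantissa, 2)

# Pattern Python stores for the same literal (round to nearest).
(stored,) = struct.unpack(">Q", struct.pack(">d", 0.23438))

print(f"{truncated:016x}")    # 3fce0029f16b11c6
print(f"{stored:016x}")       # 3fce0029f16b11c7
print(stored - truncated)     # 1
```

A difference of 1 means the two results are adjacent doubles: the sign and exponent agree, and the mantissas differ only in the final bit.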
