64 bit IEEE 754: Decimal to Double Precision Floating Point Binary: 0.190 37
Convert the Number to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard, From a Base Ten Decimal System Number

Number 0.190 37(10) converted and written in 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.


0(10) =


0(2)
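The repeated division in steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the converter's actual code; the function name `int_to_binary` is made up for this example.

```python
def int_to_binary(n):
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: the single remainder is the answer
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
    # Read the remainders starting from the bottom of the list.
    return "".join(reversed(remainders))

print(int_to_binary(0))  # integer part of 0.190 37
```

The same function works for any non-negative integer, for example `int_to_binary(5)` gives `"101"`.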


3. Convert to binary (base 2) the fractional part: 0.190 37.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.190 37 × 2 = 0 + 0.380 74;
  • 2) 0.380 74 × 2 = 0 + 0.761 48;
  • 3) 0.761 48 × 2 = 1 + 0.522 96;
  • 4) 0.522 96 × 2 = 1 + 0.045 92;
  • 5) 0.045 92 × 2 = 0 + 0.091 84;
  • 6) 0.091 84 × 2 = 0 + 0.183 68;
  • 7) 0.183 68 × 2 = 0 + 0.367 36;
  • 8) 0.367 36 × 2 = 0 + 0.734 72;
  • 9) 0.734 72 × 2 = 1 + 0.469 44;
  • 10) 0.469 44 × 2 = 0 + 0.938 88;
  • 11) 0.938 88 × 2 = 1 + 0.877 76;
  • 12) 0.877 76 × 2 = 1 + 0.755 52;
  • 13) 0.755 52 × 2 = 1 + 0.511 04;
  • 14) 0.511 04 × 2 = 1 + 0.022 08;
  • 15) 0.022 08 × 2 = 0 + 0.044 16;
  • 16) 0.044 16 × 2 = 0 + 0.088 32;
  • 17) 0.088 32 × 2 = 0 + 0.176 64;
  • 18) 0.176 64 × 2 = 0 + 0.353 28;
  • 19) 0.353 28 × 2 = 0 + 0.706 56;
  • 20) 0.706 56 × 2 = 1 + 0.413 12;
  • 21) 0.413 12 × 2 = 0 + 0.826 24;
  • 22) 0.826 24 × 2 = 1 + 0.652 48;
  • 23) 0.652 48 × 2 = 1 + 0.304 96;
  • 24) 0.304 96 × 2 = 0 + 0.609 92;
  • 25) 0.609 92 × 2 = 1 + 0.219 84;
  • 26) 0.219 84 × 2 = 0 + 0.439 68;
  • 27) 0.439 68 × 2 = 0 + 0.879 36;
  • 28) 0.879 36 × 2 = 1 + 0.758 72;
  • 29) 0.758 72 × 2 = 1 + 0.517 44;
  • 30) 0.517 44 × 2 = 1 + 0.034 88;
  • 31) 0.034 88 × 2 = 0 + 0.069 76;
  • 32) 0.069 76 × 2 = 0 + 0.139 52;
  • 33) 0.139 52 × 2 = 0 + 0.279 04;
  • 34) 0.279 04 × 2 = 0 + 0.558 08;
  • 35) 0.558 08 × 2 = 1 + 0.116 16;
  • 36) 0.116 16 × 2 = 0 + 0.232 32;
  • 37) 0.232 32 × 2 = 0 + 0.464 64;
  • 38) 0.464 64 × 2 = 0 + 0.929 28;
  • 39) 0.929 28 × 2 = 1 + 0.858 56;
  • 40) 0.858 56 × 2 = 1 + 0.717 12;
  • 41) 0.717 12 × 2 = 1 + 0.434 24;
  • 42) 0.434 24 × 2 = 0 + 0.868 48;
  • 43) 0.868 48 × 2 = 1 + 0.736 96;
  • 44) 0.736 96 × 2 = 1 + 0.473 92;
  • 45) 0.473 92 × 2 = 0 + 0.947 84;
  • 46) 0.947 84 × 2 = 1 + 0.895 68;
  • 47) 0.895 68 × 2 = 1 + 0.791 36;
  • 48) 0.791 36 × 2 = 1 + 0.582 72;
  • 49) 0.582 72 × 2 = 1 + 0.165 44;
  • 50) 0.165 44 × 2 = 0 + 0.330 88;
  • 51) 0.330 88 × 2 = 0 + 0.661 76;
  • 52) 0.661 76 × 2 = 1 + 0.323 52;
  • 53) 0.323 52 × 2 = 0 + 0.647 04;
  • 54) 0.647 04 × 2 = 1 + 0.294 08;
  • 55) 0.294 08 × 2 = 0 + 0.588 16;

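No fractional part equal to zero was reached: 0.190 37 has no finite binary expansion. We stop after 55 iterations because that is all the mantissa can hold here: 2 leading zeros, the first 1 bit (which becomes the implicit leading bit) and the 52 mantissa bits. The remaining fractional part, 0.588 16, is discarded, so the result is truncated and precision is lost. Since the discarded part is more than half a unit in the last place, IEEE 754's default round-to-nearest mode would instead round the last mantissa bit up from 0 to 1.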


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.190 37(10) =


0.0011 0000 1011 1100 0001 0110 1001 1100 0010 0011 1011 0111 1001 010(2)
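Steps 3 and 4 can be reproduced with a short Python sketch. It uses `fractions.Fraction` so that every doubling is exact decimal arithmetic, as in the list above; the function name `fraction_bits` and the `max_bits` cut-off parameter are illustrative assumptions.

```python
from fractions import Fraction

def fraction_bits(frac, max_bits):
    """Binary digits of 0 <= frac < 1, found by repeated multiplication by 2."""
    bits = []
    for _ in range(max_bits):
        if frac == 0:
            break              # exact representation reached, stop early
        frac *= 2
        bit = int(frac)        # integer part of the product: 0 or 1
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part
    return "".join(bits)

# 55 iterations, the same cut-off as the list above.
bits = fraction_bits(Fraction("0.19037"), 55)
print(bits)
```

Passing the number as a string (`"0.19037"`) matters: `Fraction(0.19037)` would start from the already-rounded binary float rather than from the exact decimal value.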


5. Positive number before normalization:

0.190 37(10) =


0.0011 0000 1011 1100 0001 0110 1001 1100 0010 0011 1011 0111 1001 010(2)

6. Normalize the binary representation of the number.

Shift the binary point 3 positions to the right, so that exactly one nonzero digit remains to the left of it:


0.190 37(10) =

0.0011 0000 1011 1100 0001 0110 1001 1100 0010 0011 1011 0111 1001 010(2) × 2^0 =

1.1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010(2) × 2^-3
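The normalization step amounts to locating the first 1 bit and counting how far the binary point has to move. A minimal sketch, assuming the fractional digits are held in a string; the helper name `normalize` is hypothetical:

```python
def normalize(fraction_digits):
    """Normalize the digits that follow '0.' in a binary fraction.

    Returns (significand_digits, exponent): the digits from the first 1
    onward (the binary point is implied right after that 1) and the
    matching power of two.
    """
    first_one = fraction_digits.index("1")  # raises ValueError if all zeros
    exponent = -(first_one + 1)             # point moves first_one + 1 places right
    return fraction_digits[first_one:], exponent

digits = ("0011 0000 1011 1100 0001 0110 1001 1100 "
          "0010 0011 1011 0111 1001 010").replace(" ", "")
significand, exponent = normalize(digits)
print(significand[0] + "." + significand[1:], "x 2 to the power", exponent)
```

This sketch only covers numbers below 1 with a zero integer part, as in this example; a number with a nonzero integer part would shift the point to the left and get a positive exponent.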


7. So far, the following elements feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -3


Mantissa (not normalized):
1.1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =

Exponent (unadjusted) + 2^(11-1) - 1 =

-3 + 2^(11-1) - 1 =

(-3 + 1 023)(10) =

1 020(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 020 ÷ 2 = 510 + 0;
  • 510 ÷ 2 = 255 + 0;
  • 255 ÷ 2 = 127 + 1;
  • 127 ÷ 2 = 63 + 1;
  • 63 ÷ 2 = 31 + 1;
  • 31 ÷ 2 = 15 + 1;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =

1 020(10) =

011 1111 1100(2)
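Steps 8 to 10 (adding the bias and writing the result on 11 bits) can be sketched as below. The constant and function names are illustrative; `format(value, "011b")` is the built-in way to get a zero-padded 11 digit binary string, used here in place of the manual division.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 2^(11-1) - 1 = 1023

def biased_exponent_field(exponent):
    """Return the 11 bit exponent field for a normal (non-subnormal) number."""
    biased = exponent + BIAS
    # Fields 0 and 2047 are reserved (zero/subnormals and infinity/NaN).
    if not 1 <= biased <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal range")
    return format(biased, "011b")  # zero-padded to 11 binary digits

print(biased_exponent_field(-3))
```

The leading zero in the field comes from the padding: 1 020 needs only 10 binary digits, and the eleventh is filled in on the left.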


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the binary point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =

1.1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010 =

1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010
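Step 11 can be expressed as a small helper that drops the hidden bit and fits the rest to 52 bits. This is a sketch under the same assumption as the walkthrough, namely that extra bits are simply cut off rather than rounded; the name `mantissa_field` is made up for the example.

```python
MANTISSA_BITS = 52

def mantissa_field(significand):
    """significand: binary digits starting with the leading 1, no binary point."""
    if not significand.startswith("1"):
        raise ValueError("a normalized significand must start with 1")
    stored = significand[1:]                 # a) drop the implicit leading 1
    stored = stored[:MANTISSA_BITS]          # b) cut extra bits (truncation, no rounding)
    return stored.ljust(MANTISSA_BITS, "0")  # b) pad short mantissas with zeros

significand = ("1 1000 0101 1110 0000 1011 0100 1110 0001 "
               "0001 1101 1011 1100 1010").replace(" ", "")
print(mantissa_field(significand))
```

For a short significand such as `"11"` (binary 1.1) the helper pads on the right, giving a 1 followed by 51 zeros.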


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1111 1100


Mantissa (52 bits) =
1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010


The base ten decimal number 0.190 37 converted and written in 64 bit double precision IEEE 754 binary floating point representation:
0 - 011 1111 1100 - 1000 0101 1110 0000 1011 0100 1110 0001 0001 1101 1011 1100 1010
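As a cross-check, the three fields can be joined into one 64 bit pattern and compared with what Python itself stores for `0.19037`. The sketch assumes a platform where Python floats are IEEE 754 doubles (the usual case); the variable names are illustrative. The two patterns differ by one in the last bit, because the procedure above truncates after 55 binary digits while Python rounds to the nearest representable double.

```python
import struct

sign = "0"
exponent = "011 1111 1100".replace(" ", "")
mantissa = ("1000 0101 1110 0000 1011 0100 1110 0001 "
            "0001 1101 1011 1100 1010").replace(" ", "")

# Join sign, exponent and mantissa into a single 64 bit integer.
pattern = int(sign + exponent + mantissa, 2)
walkthrough_hex = format(pattern, "016x")

# Bit pattern the interpreter stores for the literal 0.19037 (big-endian).
native_hex = struct.pack(">d", 0.19037).hex()

print(walkthrough_hex)                  # 3fc85e0b4e11dbca (truncated)
print(native_hex)                       # 3fc85e0b4e11dbcb (rounded to nearest)
print(int(native_hex, 16) - pattern)    # 1
```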
