64 bit IEEE 754: Decimal ↗ Double Precision Floating Point Binary: 6.285 714 285 714 285 2

Convert the Number to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard, From a Base Ten Decimal System Number

The number 6.285 714 285 714 285 2(10), converted to and written in 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 6.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.


6(10) =


110(2)
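The repeated division in steps 1 and 2 can be sketched in Python as follows. The function name `int_to_binary` is only illustrative and is not part of the original walkthrough.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder of n / 2
        remainders.append(str(remainder))
    # The remainders are read from the bottom of the list upwards.
    return "".join(reversed(remainders))


print(int_to_binary(6))  # 110
```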


3. Convert to binary (base 2) the fractional part: 0.285 714 285 714 285 2.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.285 714 285 714 285 2 × 2 = 0 + 0.571 428 571 428 570 4;
  • 2) 0.571 428 571 428 570 4 × 2 = 1 + 0.142 857 142 857 140 8;
  • 3) 0.142 857 142 857 140 8 × 2 = 0 + 0.285 714 285 714 281 6;
  • 4) 0.285 714 285 714 281 6 × 2 = 0 + 0.571 428 571 428 563 2;
  • 5) 0.571 428 571 428 563 2 × 2 = 1 + 0.142 857 142 857 126 4;
  • 6) 0.142 857 142 857 126 4 × 2 = 0 + 0.285 714 285 714 252 8;
  • 7) 0.285 714 285 714 252 8 × 2 = 0 + 0.571 428 571 428 505 6;
  • 8) 0.571 428 571 428 505 6 × 2 = 1 + 0.142 857 142 857 011 2;
  • 9) 0.142 857 142 857 011 2 × 2 = 0 + 0.285 714 285 714 022 4;
  • 10) 0.285 714 285 714 022 4 × 2 = 0 + 0.571 428 571 428 044 8;
  • 11) 0.571 428 571 428 044 8 × 2 = 1 + 0.142 857 142 856 089 6;
  • 12) 0.142 857 142 856 089 6 × 2 = 0 + 0.285 714 285 712 179 2;
  • 13) 0.285 714 285 712 179 2 × 2 = 0 + 0.571 428 571 424 358 4;
  • 14) 0.571 428 571 424 358 4 × 2 = 1 + 0.142 857 142 848 716 8;
  • 15) 0.142 857 142 848 716 8 × 2 = 0 + 0.285 714 285 697 433 6;
  • 16) 0.285 714 285 697 433 6 × 2 = 0 + 0.571 428 571 394 867 2;
  • 17) 0.571 428 571 394 867 2 × 2 = 1 + 0.142 857 142 789 734 4;
  • 18) 0.142 857 142 789 734 4 × 2 = 0 + 0.285 714 285 579 468 8;
  • 19) 0.285 714 285 579 468 8 × 2 = 0 + 0.571 428 571 158 937 6;
  • 20) 0.571 428 571 158 937 6 × 2 = 1 + 0.142 857 142 317 875 2;
  • 21) 0.142 857 142 317 875 2 × 2 = 0 + 0.285 714 284 635 750 4;
  • 22) 0.285 714 284 635 750 4 × 2 = 0 + 0.571 428 569 271 500 8;
  • 23) 0.571 428 569 271 500 8 × 2 = 1 + 0.142 857 138 543 001 6;
  • 24) 0.142 857 138 543 001 6 × 2 = 0 + 0.285 714 277 086 003 2;
  • 25) 0.285 714 277 086 003 2 × 2 = 0 + 0.571 428 554 172 006 4;
  • 26) 0.571 428 554 172 006 4 × 2 = 1 + 0.142 857 108 344 012 8;
  • 27) 0.142 857 108 344 012 8 × 2 = 0 + 0.285 714 216 688 025 6;
  • 28) 0.285 714 216 688 025 6 × 2 = 0 + 0.571 428 433 376 051 2;
  • 29) 0.571 428 433 376 051 2 × 2 = 1 + 0.142 856 866 752 102 4;
  • 30) 0.142 856 866 752 102 4 × 2 = 0 + 0.285 713 733 504 204 8;
  • 31) 0.285 713 733 504 204 8 × 2 = 0 + 0.571 427 467 008 409 6;
  • 32) 0.571 427 467 008 409 6 × 2 = 1 + 0.142 854 934 016 819 2;
  • 33) 0.142 854 934 016 819 2 × 2 = 0 + 0.285 709 868 033 638 4;
  • 34) 0.285 709 868 033 638 4 × 2 = 0 + 0.571 419 736 067 276 8;
  • 35) 0.571 419 736 067 276 8 × 2 = 1 + 0.142 839 472 134 553 6;
  • 36) 0.142 839 472 134 553 6 × 2 = 0 + 0.285 678 944 269 107 2;
  • 37) 0.285 678 944 269 107 2 × 2 = 0 + 0.571 357 888 538 214 4;
  • 38) 0.571 357 888 538 214 4 × 2 = 1 + 0.142 715 777 076 428 8;
  • 39) 0.142 715 777 076 428 8 × 2 = 0 + 0.285 431 554 152 857 6;
  • 40) 0.285 431 554 152 857 6 × 2 = 0 + 0.570 863 108 305 715 2;
  • 41) 0.570 863 108 305 715 2 × 2 = 1 + 0.141 726 216 611 430 4;
  • 42) 0.141 726 216 611 430 4 × 2 = 0 + 0.283 452 433 222 860 8;
  • 43) 0.283 452 433 222 860 8 × 2 = 0 + 0.566 904 866 445 721 6;
  • 44) 0.566 904 866 445 721 6 × 2 = 1 + 0.133 809 732 891 443 2;
  • 45) 0.133 809 732 891 443 2 × 2 = 0 + 0.267 619 465 782 886 4;
  • 46) 0.267 619 465 782 886 4 × 2 = 0 + 0.535 238 931 565 772 8;
  • 47) 0.535 238 931 565 772 8 × 2 = 1 + 0.070 477 863 131 545 6;
  • 48) 0.070 477 863 131 545 6 × 2 = 0 + 0.140 955 726 263 091 2;
  • 49) 0.140 955 726 263 091 2 × 2 = 0 + 0.281 911 452 526 182 4;
  • 50) 0.281 911 452 526 182 4 × 2 = 0 + 0.563 822 905 052 364 8;
  • 51) 0.563 822 905 052 364 8 × 2 = 1 + 0.127 645 810 104 729 6;
  • 52) 0.127 645 810 104 729 6 × 2 = 0 + 0.255 291 620 209 459 2;
  • 53) 0.255 291 620 209 459 2 × 2 = 0 + 0.510 583 240 418 918 4;

None of the fractional parts was equal to zero. However, we have run enough iterations to exceed the 52 bit mantissa limit, and at least one integer part was different from zero, so we stop here (losing precision).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.285 714 285 714 285 2(10) =


0.0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0010 0(2)
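The repeated multiplication in steps 3 and 4 might be reproduced with a short Python sketch. It uses `fractions.Fraction` so that the doubling is done exactly, as in the hand calculation; the function name `frac_to_binary` and the 53 bit cut-off are assumptions chosen to match the list above.

```python
from fractions import Fraction


def frac_to_binary(frac: Fraction, max_bits: int) -> str:
    """Convert a fraction in [0, 1) to base 2 by repeated multiplication by 2."""
    bits = []
    for _ in range(max_bits):
        if frac == 0:          # exact representation reached, nothing left
            break
        frac *= 2
        integer_part = int(frac)      # 0 or 1
        bits.append(str(integer_part))
        frac -= integer_part          # keep only the fractional part
    # The integer parts are read from the top of the list downwards.
    return "".join(bits)


fractional_part = Fraction("0.2857142857142852")
print(frac_to_binary(fractional_part, 53))
```

Using a binary `float` for `fractional_part` instead would already bake in a rounding step, which is why the sketch parses the decimal string directly.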


5. Positive number before normalization:

6.285 714 285 714 285 2(10) =


110.0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0010 0(2)

6. Normalize the binary representation of the number.

Shift the binary point 2 positions to the left, so that only one non-zero digit remains to the left of it:


6.285 714 285 714 285 2(10) =


110.0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0010 0(2) =


110.0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0010 0(2) × 2^0 =


1.1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000 100(2) × 2^2
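The normalization step can be sketched on bit strings as below. The helper `normalize` is hypothetical and assumes the integer part starts with a 1 (a number of at least 1); values below 1 would need a shift to the right and a negative exponent, which this sketch does not cover.

```python
def normalize(int_bits: str, frac_bits: str):
    """Move the binary point so that exactly one digit (a 1) is left of it.

    Returns the normalized digit string and the power of 2 that compensates
    for the shift. Assumes int_bits begins with '1'.
    """
    exponent = len(int_bits) - 1          # number of positions shifted left
    digits = int_bits + frac_bits
    return digits[0] + "." + digits[1:], exponent


frac_bits = "0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0010 0".replace(" ", "")
significand, exponent = normalize("110", frac_bits)
print(significand, "x 2 ^", exponent)
```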


7. So far, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 2


Mantissa (not normalized):
1.1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000 100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


2 + 2^(11-1) - 1 =


(2 + 1 023)(10) =


1 025(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 025 ÷ 2 = 512 + 1;
  • 512 ÷ 2 = 256 + 0;
  • 256 ÷ 2 = 128 + 0;
  • 128 ÷ 2 = 64 + 0;
  • 64 ÷ 2 = 32 + 0;
  • 32 ÷ 2 = 16 + 0;
  • 16 ÷ 2 = 8 + 0;
  • 8 ÷ 2 = 4 + 0;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1 025(10) =


100 0000 0001(2)
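Steps 8 to 10 amount to adding the bias and writing the result in 11 binary digits. A minimal Python sketch, with variable names chosen here for illustration:

```python
EXPONENT_BITS = 11

bias = 2 ** (EXPONENT_BITS - 1) - 1        # 2^(11-1) - 1 = 1023
adjusted_exponent = 2 + bias               # unadjusted exponent 2, so 1025

# Zero-padded binary string of exactly EXPONENT_BITS digits.
exponent_field = format(adjusted_exponent, f"0{EXPONENT_BITS}b")

print(bias, adjusted_exponent, exponent_field)  # 1023 1025 10000000001
```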


11. Normalize the mantissa.

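a) Remove the leading (leftmost) bit, since it is always 1, along with the binary point.


b) Adjust its length to 52 bits by removing the excess bits from the right (if any of the excess bits is set to 1, we are losing precision). Note that this step truncates. IEEE 754's default round-to-nearest mode would instead round the last kept bit up whenever the discarded part is more than half a unit in the last place, which is the case here: the discarded bits start with 1 and the expansion continues past them with a non-zero remainder.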


Mantissa (normalized) =


1. 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000 100 =


1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000
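The two sub-steps of step 11 can be sketched as string operations. The helper `mantissa_field` is hypothetical; it truncates exactly as the walkthrough does and also returns the discarded bits so that the loss of precision is visible.

```python
def mantissa_field(normalized: str, width: int = 52):
    """Drop the leading '1.' and cut the fraction down to `width` bits.

    Returns (kept_bits, dropped_bits). Truncates; it does not round.
    """
    digits = normalized.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized value of the form 1.xxx")
    fraction = digits[2:]
    kept = fraction[:width].ljust(width, "0")   # pad if shorter than width
    dropped = fraction[width:]
    return kept, dropped


normalized = "1.1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000 100"
kept, dropped = mantissa_field(normalized)
print(kept)
print("dropped:", dropped, "(precision lost)" if "1" in dropped else "")
```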


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
100 0000 0001


Mantissa (52 bits) =
1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000


The base ten decimal number 6.285 714 285 714 285 2 converted and written in 64 bit double precision IEEE 754 binary floating point representation:
0 - 100 0000 0001 - 1001 0010 0100 1001 0010 0100 1001 0010 0100 1001 0010 0100 1000
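As a cross-check, the bits that a machine actually stores can be inspected with Python's standard `struct` module. This sketch assumes a Python build whose `float` is an IEEE 754 double and whose literal parsing rounds to nearest, which is the usual case; the helper name `double_bits` is illustrative.

```python
import struct


def double_bits(x: float) -> str:
    """Return the sign, exponent and mantissa fields of a double as bit strings."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    return f"{bits[0]} - {bits[1:12]} - {bits[12:]}"


print(double_bits(6.2857142857142852))
```

The sign and exponent fields agree with the result above. The mantissa printed by this sketch ends in 1001 rather than 1000, because the machine rounds to the nearest representable value while the manual procedure simply drops the excess bits.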
