0.000 000 000 000 000 000 008 585 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 585(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 585(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert the integer part, 0, to binary (base 2).
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
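
The repeated division in steps 1 and 2 can be sketched in Python as follows. The function name int_to_binary is only illustrative; it is not part of any library.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a base 2 string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of n / 2
        remainders.append(str(r))
    # The remainders are read from the bottom of the list: last one first.
    return "".join(reversed(remainders))

print(int_to_binary(0))
print(int_to_binary(31))
print(int_to_binary(956))
```

The same helper also covers the exponent conversion later on, since that step uses the same technique.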


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 585.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 585 × 2 = 0 + 0.000 000 000 000 000 000 017 17;
  • 2) 0.000 000 000 000 000 000 017 17 × 2 = 0 + 0.000 000 000 000 000 000 034 34;
  • 3) 0.000 000 000 000 000 000 034 34 × 2 = 0 + 0.000 000 000 000 000 000 068 68;
  • 4) 0.000 000 000 000 000 000 068 68 × 2 = 0 + 0.000 000 000 000 000 000 137 36;
  • 5) 0.000 000 000 000 000 000 137 36 × 2 = 0 + 0.000 000 000 000 000 000 274 72;
  • 6) 0.000 000 000 000 000 000 274 72 × 2 = 0 + 0.000 000 000 000 000 000 549 44;
  • 7) 0.000 000 000 000 000 000 549 44 × 2 = 0 + 0.000 000 000 000 000 001 098 88;
  • 8) 0.000 000 000 000 000 001 098 88 × 2 = 0 + 0.000 000 000 000 000 002 197 76;
  • 9) 0.000 000 000 000 000 002 197 76 × 2 = 0 + 0.000 000 000 000 000 004 395 52;
  • 10) 0.000 000 000 000 000 004 395 52 × 2 = 0 + 0.000 000 000 000 000 008 791 04;
  • 11) 0.000 000 000 000 000 008 791 04 × 2 = 0 + 0.000 000 000 000 000 017 582 08;
  • 12) 0.000 000 000 000 000 017 582 08 × 2 = 0 + 0.000 000 000 000 000 035 164 16;
  • 13) 0.000 000 000 000 000 035 164 16 × 2 = 0 + 0.000 000 000 000 000 070 328 32;
  • 14) 0.000 000 000 000 000 070 328 32 × 2 = 0 + 0.000 000 000 000 000 140 656 64;
  • 15) 0.000 000 000 000 000 140 656 64 × 2 = 0 + 0.000 000 000 000 000 281 313 28;
  • 16) 0.000 000 000 000 000 281 313 28 × 2 = 0 + 0.000 000 000 000 000 562 626 56;
  • 17) 0.000 000 000 000 000 562 626 56 × 2 = 0 + 0.000 000 000 000 001 125 253 12;
  • 18) 0.000 000 000 000 001 125 253 12 × 2 = 0 + 0.000 000 000 000 002 250 506 24;
  • 19) 0.000 000 000 000 002 250 506 24 × 2 = 0 + 0.000 000 000 000 004 501 012 48;
  • 20) 0.000 000 000 000 004 501 012 48 × 2 = 0 + 0.000 000 000 000 009 002 024 96;
  • 21) 0.000 000 000 000 009 002 024 96 × 2 = 0 + 0.000 000 000 000 018 004 049 92;
  • 22) 0.000 000 000 000 018 004 049 92 × 2 = 0 + 0.000 000 000 000 036 008 099 84;
  • 23) 0.000 000 000 000 036 008 099 84 × 2 = 0 + 0.000 000 000 000 072 016 199 68;
  • 24) 0.000 000 000 000 072 016 199 68 × 2 = 0 + 0.000 000 000 000 144 032 399 36;
  • 25) 0.000 000 000 000 144 032 399 36 × 2 = 0 + 0.000 000 000 000 288 064 798 72;
  • 26) 0.000 000 000 000 288 064 798 72 × 2 = 0 + 0.000 000 000 000 576 129 597 44;
  • 27) 0.000 000 000 000 576 129 597 44 × 2 = 0 + 0.000 000 000 001 152 259 194 88;
  • 28) 0.000 000 000 001 152 259 194 88 × 2 = 0 + 0.000 000 000 002 304 518 389 76;
  • 29) 0.000 000 000 002 304 518 389 76 × 2 = 0 + 0.000 000 000 004 609 036 779 52;
  • 30) 0.000 000 000 004 609 036 779 52 × 2 = 0 + 0.000 000 000 009 218 073 559 04;
  • 31) 0.000 000 000 009 218 073 559 04 × 2 = 0 + 0.000 000 000 018 436 147 118 08;
  • 32) 0.000 000 000 018 436 147 118 08 × 2 = 0 + 0.000 000 000 036 872 294 236 16;
  • 33) 0.000 000 000 036 872 294 236 16 × 2 = 0 + 0.000 000 000 073 744 588 472 32;
  • 34) 0.000 000 000 073 744 588 472 32 × 2 = 0 + 0.000 000 000 147 489 176 944 64;
  • 35) 0.000 000 000 147 489 176 944 64 × 2 = 0 + 0.000 000 000 294 978 353 889 28;
  • 36) 0.000 000 000 294 978 353 889 28 × 2 = 0 + 0.000 000 000 589 956 707 778 56;
  • 37) 0.000 000 000 589 956 707 778 56 × 2 = 0 + 0.000 000 001 179 913 415 557 12;
  • 38) 0.000 000 001 179 913 415 557 12 × 2 = 0 + 0.000 000 002 359 826 831 114 24;
  • 39) 0.000 000 002 359 826 831 114 24 × 2 = 0 + 0.000 000 004 719 653 662 228 48;
  • 40) 0.000 000 004 719 653 662 228 48 × 2 = 0 + 0.000 000 009 439 307 324 456 96;
  • 41) 0.000 000 009 439 307 324 456 96 × 2 = 0 + 0.000 000 018 878 614 648 913 92;
  • 42) 0.000 000 018 878 614 648 913 92 × 2 = 0 + 0.000 000 037 757 229 297 827 84;
  • 43) 0.000 000 037 757 229 297 827 84 × 2 = 0 + 0.000 000 075 514 458 595 655 68;
  • 44) 0.000 000 075 514 458 595 655 68 × 2 = 0 + 0.000 000 151 028 917 191 311 36;
  • 45) 0.000 000 151 028 917 191 311 36 × 2 = 0 + 0.000 000 302 057 834 382 622 72;
  • 46) 0.000 000 302 057 834 382 622 72 × 2 = 0 + 0.000 000 604 115 668 765 245 44;
  • 47) 0.000 000 604 115 668 765 245 44 × 2 = 0 + 0.000 001 208 231 337 530 490 88;
  • 48) 0.000 001 208 231 337 530 490 88 × 2 = 0 + 0.000 002 416 462 675 060 981 76;
  • 49) 0.000 002 416 462 675 060 981 76 × 2 = 0 + 0.000 004 832 925 350 121 963 52;
  • 50) 0.000 004 832 925 350 121 963 52 × 2 = 0 + 0.000 009 665 850 700 243 927 04;
  • 51) 0.000 009 665 850 700 243 927 04 × 2 = 0 + 0.000 019 331 701 400 487 854 08;
  • 52) 0.000 019 331 701 400 487 854 08 × 2 = 0 + 0.000 038 663 402 800 975 708 16;
  • 53) 0.000 038 663 402 800 975 708 16 × 2 = 0 + 0.000 077 326 805 601 951 416 32;
  • 54) 0.000 077 326 805 601 951 416 32 × 2 = 0 + 0.000 154 653 611 203 902 832 64;
  • 55) 0.000 154 653 611 203 902 832 64 × 2 = 0 + 0.000 309 307 222 407 805 665 28;
  • 56) 0.000 309 307 222 407 805 665 28 × 2 = 0 + 0.000 618 614 444 815 611 330 56;
  • 57) 0.000 618 614 444 815 611 330 56 × 2 = 0 + 0.001 237 228 889 631 222 661 12;
  • 58) 0.001 237 228 889 631 222 661 12 × 2 = 0 + 0.002 474 457 779 262 445 322 24;
  • 59) 0.002 474 457 779 262 445 322 24 × 2 = 0 + 0.004 948 915 558 524 890 644 48;
  • 60) 0.004 948 915 558 524 890 644 48 × 2 = 0 + 0.009 897 831 117 049 781 288 96;
  • 61) 0.009 897 831 117 049 781 288 96 × 2 = 0 + 0.019 795 662 234 099 562 577 92;
  • 62) 0.019 795 662 234 099 562 577 92 × 2 = 0 + 0.039 591 324 468 199 125 155 84;
  • 63) 0.039 591 324 468 199 125 155 84 × 2 = 0 + 0.079 182 648 936 398 250 311 68;
  • 64) 0.079 182 648 936 398 250 311 68 × 2 = 0 + 0.158 365 297 872 796 500 623 36;
  • 65) 0.158 365 297 872 796 500 623 36 × 2 = 0 + 0.316 730 595 745 593 001 246 72;
  • 66) 0.316 730 595 745 593 001 246 72 × 2 = 0 + 0.633 461 191 491 186 002 493 44;
  • 67) 0.633 461 191 491 186 002 493 44 × 2 = 1 + 0.266 922 382 982 372 004 986 88;
  • 68) 0.266 922 382 982 372 004 986 88 × 2 = 0 + 0.533 844 765 964 744 009 973 76;
  • 69) 0.533 844 765 964 744 009 973 76 × 2 = 1 + 0.067 689 531 929 488 019 947 52;
  • 70) 0.067 689 531 929 488 019 947 52 × 2 = 0 + 0.135 379 063 858 976 039 895 04;
  • 71) 0.135 379 063 858 976 039 895 04 × 2 = 0 + 0.270 758 127 717 952 079 790 08;
  • 72) 0.270 758 127 717 952 079 790 08 × 2 = 0 + 0.541 516 255 435 904 159 580 16;
  • 73) 0.541 516 255 435 904 159 580 16 × 2 = 1 + 0.083 032 510 871 808 319 160 32;
  • 74) 0.083 032 510 871 808 319 160 32 × 2 = 0 + 0.166 065 021 743 616 638 320 64;
  • 75) 0.166 065 021 743 616 638 320 64 × 2 = 0 + 0.332 130 043 487 233 276 641 28;
  • 76) 0.332 130 043 487 233 276 641 28 × 2 = 0 + 0.664 260 086 974 466 553 282 56;
  • 77) 0.664 260 086 974 466 553 282 56 × 2 = 1 + 0.328 520 173 948 933 106 565 12;
  • 78) 0.328 520 173 948 933 106 565 12 × 2 = 0 + 0.657 040 347 897 866 213 130 24;
  • 79) 0.657 040 347 897 866 213 130 24 × 2 = 1 + 0.314 080 695 795 732 426 260 48;
  • 80) 0.314 080 695 795 732 426 260 48 × 2 = 0 + 0.628 161 391 591 464 852 520 96;
  • 81) 0.628 161 391 591 464 852 520 96 × 2 = 1 + 0.256 322 783 182 929 705 041 92;
  • 82) 0.256 322 783 182 929 705 041 92 × 2 = 0 + 0.512 645 566 365 859 410 083 84;
  • 83) 0.512 645 566 365 859 410 083 84 × 2 = 1 + 0.025 291 132 731 718 820 167 68;
  • 84) 0.025 291 132 731 718 820 167 68 × 2 = 0 + 0.050 582 265 463 437 640 335 36;
  • 85) 0.050 582 265 463 437 640 335 36 × 2 = 0 + 0.101 164 530 926 875 280 670 72;
  • 86) 0.101 164 530 926 875 280 670 72 × 2 = 0 + 0.202 329 061 853 750 561 341 44;
  • 87) 0.202 329 061 853 750 561 341 44 × 2 = 0 + 0.404 658 123 707 501 122 682 88;
  • 88) 0.404 658 123 707 501 122 682 88 × 2 = 0 + 0.809 316 247 415 002 245 365 76;
  • 89) 0.809 316 247 415 002 245 365 76 × 2 = 1 + 0.618 632 494 830 004 490 731 52;
  • 90) 0.618 632 494 830 004 490 731 52 × 2 = 1 + 0.237 264 989 660 008 981 463 04;
  • 91) 0.237 264 989 660 008 981 463 04 × 2 = 0 + 0.474 529 979 320 017 962 926 08;
  • 92) 0.474 529 979 320 017 962 926 08 × 2 = 0 + 0.949 059 958 640 035 925 852 16;
  • 93) 0.949 059 958 640 035 925 852 16 × 2 = 1 + 0.898 119 917 280 071 851 704 32;
  • 94) 0.898 119 917 280 071 851 704 32 × 2 = 1 + 0.796 239 834 560 143 703 408 64;
  • 95) 0.796 239 834 560 143 703 408 64 × 2 = 1 + 0.592 479 669 120 287 406 817 28;
  • 96) 0.592 479 669 120 287 406 817 28 × 2 = 1 + 0.184 959 338 240 574 813 634 56;
  • 97) 0.184 959 338 240 574 813 634 56 × 2 = 0 + 0.369 918 676 481 149 627 269 12;
  • 98) 0.369 918 676 481 149 627 269 12 × 2 = 0 + 0.739 837 352 962 299 254 538 24;
  • 99) 0.739 837 352 962 299 254 538 24 × 2 = 1 + 0.479 674 705 924 598 509 076 48;
  • 100) 0.479 674 705 924 598 509 076 48 × 2 = 0 + 0.959 349 411 849 197 018 152 96;
  • 101) 0.959 349 411 849 197 018 152 96 × 2 = 1 + 0.918 698 823 698 394 036 305 92;
  • 102) 0.918 698 823 698 394 036 305 92 × 2 = 1 + 0.837 397 647 396 788 072 611 84;
  • 103) 0.837 397 647 396 788 072 611 84 × 2 = 1 + 0.674 795 294 793 576 145 223 68;
  • 104) 0.674 795 294 793 576 145 223 68 × 2 = 1 + 0.349 590 589 587 152 290 447 36;
  • 105) 0.349 590 589 587 152 290 447 36 × 2 = 0 + 0.699 181 179 174 304 580 894 72;
  • 106) 0.699 181 179 174 304 580 894 72 × 2 = 1 + 0.398 362 358 348 609 161 789 44;
  • 107) 0.398 362 358 348 609 161 789 44 × 2 = 0 + 0.796 724 716 697 218 323 578 88;
  • 108) 0.796 724 716 697 218 323 578 88 × 2 = 1 + 0.593 449 433 394 436 647 157 76;
  • 109) 0.593 449 433 394 436 647 157 76 × 2 = 1 + 0.186 898 866 788 873 294 315 52;
  • 110) 0.186 898 866 788 873 294 315 52 × 2 = 0 + 0.373 797 733 577 746 588 631 04;
  • 111) 0.373 797 733 577 746 588 631 04 × 2 = 0 + 0.747 595 467 155 493 177 262 08;
  • 112) 0.747 595 467 155 493 177 262 08 × 2 = 1 + 0.495 190 934 310 986 354 524 16;
  • 113) 0.495 190 934 310 986 354 524 16 × 2 = 0 + 0.990 381 868 621 972 709 048 32;
  • 114) 0.990 381 868 621 972 709 048 32 × 2 = 1 + 0.980 763 737 243 945 418 096 64;
  • 115) 0.980 763 737 243 945 418 096 64 × 2 = 1 + 0.961 527 474 487 890 836 193 28;
  • 116) 0.961 527 474 487 890 836 193 28 × 2 = 1 + 0.923 054 948 975 781 672 386 56;
  • 117) 0.923 054 948 975 781 672 386 56 × 2 = 1 + 0.846 109 897 951 563 344 773 12;
  • 118) 0.846 109 897 951 563 344 773 12 × 2 = 1 + 0.692 219 795 903 126 689 546 24;
  • 119) 0.692 219 795 903 126 689 546 24 × 2 = 1 + 0.384 439 591 806 253 379 092 48;

We didn't get any fractional part that was equal to zero. But we have performed enough iterations (more than the 52 bit mantissa limit after the first non-zero integer part), so we stop here. Precision is lost: the converted number will be only a very close approximation of the initial one.
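
The doubling loop above can be reproduced with exact rational arithmetic, so that no rounding creeps into the intermediate products. This is a minimal sketch, assuming the stopping rule is "52 digits after the first 1"; the function name fraction_bits is hypothetical.

```python
from fractions import Fraction

def fraction_bits(frac, mantissa_bits=52):
    """Binary digits of 0 <= frac < 1, obtained by repeated doubling.

    Stops when the fractional part reaches zero, or once mantissa_bits digits
    have been collected after the first 1 (that first 1 is the leading bit).
    """
    bits = []
    first_one = None                 # step number of the first 1 digit
    while frac != 0:
        frac *= 2
        digit = int(frac)            # integer part of the product: 0 or 1
        frac -= digit                # keep only the fractional part
        bits.append(str(digit))
        if digit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one >= mantissa_bits:
            break
    return "".join(bits)

bits = fraction_bits(Fraction("0.000000000000000000008585"))
print(len(bits))             # number of doubling steps performed
print(bits.index("1") + 1)   # step at which the first 1 appears
```

For this number the loop performs 119 doubling steps and the first 1 appears at step 67, matching the list above.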


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 585(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1010 1010 0000 1100 1111 0010 1111 0101 1001 0111 111(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 585(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1010 1010 0000 1100 1111 0010 1111 0101 1001 0111 111(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 585(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1010 1010 0000 1100 1111 0010 1111 0101 1001 0111 111(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1010 1010 0000 1100 1111 0010 1111 0101 1001 0111 111(2) × 2^0 =


1.0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111(2) × 2^(-67)
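
The normalization step amounts to locating the first 1 digit and counting how far the mark has to move. A small sketch, assuming the integer and fractional parts are given as digit strings without spaces (the function name normalize is illustrative only):

```python
def normalize(int_bits, frac_bits):
    """Return (digits starting at the leading 1, unadjusted exponent)."""
    int_bits = int_bits.lstrip("0")
    if int_bits:
        # The mark moves left by len(int_bits) - 1 positions.
        return int_bits + frac_bits, len(int_bits) - 1
    first_one = frac_bits.index("1")   # raises ValueError if there is no 1 at all
    # The mark moves right, just past the first 1 digit.
    return frac_bits[first_one:], -(first_one + 1)

print(normalize("0", "0" * 66 + "101"))
print(normalize("11111", "101"))
```

With 66 zeros before the first 1, as in this conversion, the exponent comes out as -67; an integer part of 1 1111 gives an exponent of 4.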


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
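
The exponent adjustment and its 11 bit encoding can be checked with a few lines of Python. This is a sketch for normal numbers only; the names EXPONENT_BITS, BIAS and biased_exponent_bits are chosen here for illustration.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1    # 1023 for double precision

def biased_exponent_bits(unadjusted):
    """Add the bias and format the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # All zeros and all ones are reserved (zero/subnormals, infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")

print(biased_exponent_bits(-67))   # 01110111100
print(biased_exponent_bits(4))     # 10000000011
```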


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111 =


0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111


Decimal number 0.000 000 000 000 000 000 008 585 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0100 0101 0101 0000 0110 0111 1001 0111 1010 1100 1011 1111
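
As a cross-check, the three fields can be read back from the bytes that a language runtime actually stores. The sketch below uses Python's standard struct module and assumes Python floats are IEEE 754 doubles, which holds on common platforms; double_fields is a hypothetical helper name.

```python
import struct

def double_fields(x):
    """Split a float into its sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # 8 bytes as one integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_fields(8.585e-21)
print(sign, exponent, mantissa)
```

For this number the stored fields should agree with the result above, because the first bit beyond the 52 kept ones is 0, so rounding and cutting off give the same mantissa.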


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, until we get a fractional part that is equal to zero, keeping track of each integer part of the results.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
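
The nine steps above can be put together in one short function. This is a sketch under stated assumptions, not a full IEEE 754 encoder: it uses exact fractions, handles only non-zero values in the normal range (no subnormals, infinities or NaN), and drops excess mantissa bits rather than rounding them. The name to_double_fields is hypothetical.

```python
from fractions import Fraction

def to_double_fields(decimal_text):
    """Sign, exponent and mantissa bit strings, truncating as in the steps above."""
    value = Fraction(decimal_text)
    if value == 0:
        raise ValueError("zero needs special handling")
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: find the exponent e such that 1 <= value / 2**e < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Drop the leading 1, then collect 52 bits by repeated doubling.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        value -= bit
        mantissa.append(str(bit))

    adjusted = exponent + 2 ** (11 - 1) - 1    # add the bias, 1023
    if not 0 < adjusted < 2047:
        raise ValueError("outside the normal double range")
    return sign, format(adjusted, "011b"), "".join(mantissa)

print(to_double_fields("-31.640215"))
```

Normalizing first and extracting bits afterwards avoids building the long run of leading zeros for very small numbers, but it yields the same digits as converting the integer and fractional parts separately.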

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

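  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (truncation, which loses precision; note that IEEE 754's default round-to-nearest mode would instead round the last kept bit up here, since the discarded bits 1010 0 are more than half a unit in the last place):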

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
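
The result above cuts off the excess bits. IEEE 754's default rounding mode is round to nearest (ties to even), so the value a language runtime stores can differ in the last mantissa bit. The check below is a sketch using Python's standard struct module, assuming Python floats are IEEE 754 doubles; mantissa_bits is a hypothetical helper name.

```python
import struct

def mantissa_bits(x):
    """The 52 stored mantissa bits of a float, as a string."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(raw, "064b")[12:]

# Mantissa obtained above by removing the excess bits from the right.
truncated = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", "")
stored = mantissa_bits(-31.640215)

print(truncated[-4:], stored[-4:])
print(int(stored, 2) - int(truncated, 2))   # difference in units in the last place
```

Here the discarded bits begin with 1010 0, which is more than half a unit in the last place, so the stored mantissa should end in 1101 rather than 1100: a difference of one unit in the last place. Sign and exponent are unaffected.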