0.000 000 000 000 000 000 009 06 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 009 06(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 009 06(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
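
The repeated division described in steps 1 and 2 can be sketched in Python. This is only an illustration; the function name `int_to_binary` is an assumption and not part of the original procedure.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of n / 2
        remainders.append(str(r))
    # The last remainder is the leftmost bit, so read the list from the bottom.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```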


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 009 06.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 009 06 × 2 = 0 + 0.000 000 000 000 000 000 018 12;
  • 2) 0.000 000 000 000 000 000 018 12 × 2 = 0 + 0.000 000 000 000 000 000 036 24;
  • 3) 0.000 000 000 000 000 000 036 24 × 2 = 0 + 0.000 000 000 000 000 000 072 48;
  • 4) 0.000 000 000 000 000 000 072 48 × 2 = 0 + 0.000 000 000 000 000 000 144 96;
  • 5) 0.000 000 000 000 000 000 144 96 × 2 = 0 + 0.000 000 000 000 000 000 289 92;
  • 6) 0.000 000 000 000 000 000 289 92 × 2 = 0 + 0.000 000 000 000 000 000 579 84;
  • 7) 0.000 000 000 000 000 000 579 84 × 2 = 0 + 0.000 000 000 000 000 001 159 68;
  • 8) 0.000 000 000 000 000 001 159 68 × 2 = 0 + 0.000 000 000 000 000 002 319 36;
  • 9) 0.000 000 000 000 000 002 319 36 × 2 = 0 + 0.000 000 000 000 000 004 638 72;
  • 10) 0.000 000 000 000 000 004 638 72 × 2 = 0 + 0.000 000 000 000 000 009 277 44;
  • 11) 0.000 000 000 000 000 009 277 44 × 2 = 0 + 0.000 000 000 000 000 018 554 88;
  • 12) 0.000 000 000 000 000 018 554 88 × 2 = 0 + 0.000 000 000 000 000 037 109 76;
  • 13) 0.000 000 000 000 000 037 109 76 × 2 = 0 + 0.000 000 000 000 000 074 219 52;
  • 14) 0.000 000 000 000 000 074 219 52 × 2 = 0 + 0.000 000 000 000 000 148 439 04;
  • 15) 0.000 000 000 000 000 148 439 04 × 2 = 0 + 0.000 000 000 000 000 296 878 08;
  • 16) 0.000 000 000 000 000 296 878 08 × 2 = 0 + 0.000 000 000 000 000 593 756 16;
  • 17) 0.000 000 000 000 000 593 756 16 × 2 = 0 + 0.000 000 000 000 001 187 512 32;
  • 18) 0.000 000 000 000 001 187 512 32 × 2 = 0 + 0.000 000 000 000 002 375 024 64;
  • 19) 0.000 000 000 000 002 375 024 64 × 2 = 0 + 0.000 000 000 000 004 750 049 28;
  • 20) 0.000 000 000 000 004 750 049 28 × 2 = 0 + 0.000 000 000 000 009 500 098 56;
  • 21) 0.000 000 000 000 009 500 098 56 × 2 = 0 + 0.000 000 000 000 019 000 197 12;
  • 22) 0.000 000 000 000 019 000 197 12 × 2 = 0 + 0.000 000 000 000 038 000 394 24;
  • 23) 0.000 000 000 000 038 000 394 24 × 2 = 0 + 0.000 000 000 000 076 000 788 48;
  • 24) 0.000 000 000 000 076 000 788 48 × 2 = 0 + 0.000 000 000 000 152 001 576 96;
  • 25) 0.000 000 000 000 152 001 576 96 × 2 = 0 + 0.000 000 000 000 304 003 153 92;
  • 26) 0.000 000 000 000 304 003 153 92 × 2 = 0 + 0.000 000 000 000 608 006 307 84;
  • 27) 0.000 000 000 000 608 006 307 84 × 2 = 0 + 0.000 000 000 001 216 012 615 68;
  • 28) 0.000 000 000 001 216 012 615 68 × 2 = 0 + 0.000 000 000 002 432 025 231 36;
  • 29) 0.000 000 000 002 432 025 231 36 × 2 = 0 + 0.000 000 000 004 864 050 462 72;
  • 30) 0.000 000 000 004 864 050 462 72 × 2 = 0 + 0.000 000 000 009 728 100 925 44;
  • 31) 0.000 000 000 009 728 100 925 44 × 2 = 0 + 0.000 000 000 019 456 201 850 88;
  • 32) 0.000 000 000 019 456 201 850 88 × 2 = 0 + 0.000 000 000 038 912 403 701 76;
  • 33) 0.000 000 000 038 912 403 701 76 × 2 = 0 + 0.000 000 000 077 824 807 403 52;
  • 34) 0.000 000 000 077 824 807 403 52 × 2 = 0 + 0.000 000 000 155 649 614 807 04;
  • 35) 0.000 000 000 155 649 614 807 04 × 2 = 0 + 0.000 000 000 311 299 229 614 08;
  • 36) 0.000 000 000 311 299 229 614 08 × 2 = 0 + 0.000 000 000 622 598 459 228 16;
  • 37) 0.000 000 000 622 598 459 228 16 × 2 = 0 + 0.000 000 001 245 196 918 456 32;
  • 38) 0.000 000 001 245 196 918 456 32 × 2 = 0 + 0.000 000 002 490 393 836 912 64;
  • 39) 0.000 000 002 490 393 836 912 64 × 2 = 0 + 0.000 000 004 980 787 673 825 28;
  • 40) 0.000 000 004 980 787 673 825 28 × 2 = 0 + 0.000 000 009 961 575 347 650 56;
  • 41) 0.000 000 009 961 575 347 650 56 × 2 = 0 + 0.000 000 019 923 150 695 301 12;
  • 42) 0.000 000 019 923 150 695 301 12 × 2 = 0 + 0.000 000 039 846 301 390 602 24;
  • 43) 0.000 000 039 846 301 390 602 24 × 2 = 0 + 0.000 000 079 692 602 781 204 48;
  • 44) 0.000 000 079 692 602 781 204 48 × 2 = 0 + 0.000 000 159 385 205 562 408 96;
  • 45) 0.000 000 159 385 205 562 408 96 × 2 = 0 + 0.000 000 318 770 411 124 817 92;
  • 46) 0.000 000 318 770 411 124 817 92 × 2 = 0 + 0.000 000 637 540 822 249 635 84;
  • 47) 0.000 000 637 540 822 249 635 84 × 2 = 0 + 0.000 001 275 081 644 499 271 68;
  • 48) 0.000 001 275 081 644 499 271 68 × 2 = 0 + 0.000 002 550 163 288 998 543 36;
  • 49) 0.000 002 550 163 288 998 543 36 × 2 = 0 + 0.000 005 100 326 577 997 086 72;
  • 50) 0.000 005 100 326 577 997 086 72 × 2 = 0 + 0.000 010 200 653 155 994 173 44;
  • 51) 0.000 010 200 653 155 994 173 44 × 2 = 0 + 0.000 020 401 306 311 988 346 88;
  • 52) 0.000 020 401 306 311 988 346 88 × 2 = 0 + 0.000 040 802 612 623 976 693 76;
  • 53) 0.000 040 802 612 623 976 693 76 × 2 = 0 + 0.000 081 605 225 247 953 387 52;
  • 54) 0.000 081 605 225 247 953 387 52 × 2 = 0 + 0.000 163 210 450 495 906 775 04;
  • 55) 0.000 163 210 450 495 906 775 04 × 2 = 0 + 0.000 326 420 900 991 813 550 08;
  • 56) 0.000 326 420 900 991 813 550 08 × 2 = 0 + 0.000 652 841 801 983 627 100 16;
  • 57) 0.000 652 841 801 983 627 100 16 × 2 = 0 + 0.001 305 683 603 967 254 200 32;
  • 58) 0.001 305 683 603 967 254 200 32 × 2 = 0 + 0.002 611 367 207 934 508 400 64;
  • 59) 0.002 611 367 207 934 508 400 64 × 2 = 0 + 0.005 222 734 415 869 016 801 28;
  • 60) 0.005 222 734 415 869 016 801 28 × 2 = 0 + 0.010 445 468 831 738 033 602 56;
  • 61) 0.010 445 468 831 738 033 602 56 × 2 = 0 + 0.020 890 937 663 476 067 205 12;
  • 62) 0.020 890 937 663 476 067 205 12 × 2 = 0 + 0.041 781 875 326 952 134 410 24;
  • 63) 0.041 781 875 326 952 134 410 24 × 2 = 0 + 0.083 563 750 653 904 268 820 48;
  • 64) 0.083 563 750 653 904 268 820 48 × 2 = 0 + 0.167 127 501 307 808 537 640 96;
  • 65) 0.167 127 501 307 808 537 640 96 × 2 = 0 + 0.334 255 002 615 617 075 281 92;
  • 66) 0.334 255 002 615 617 075 281 92 × 2 = 0 + 0.668 510 005 231 234 150 563 84;
  • 67) 0.668 510 005 231 234 150 563 84 × 2 = 1 + 0.337 020 010 462 468 301 127 68;
  • 68) 0.337 020 010 462 468 301 127 68 × 2 = 0 + 0.674 040 020 924 936 602 255 36;
  • 69) 0.674 040 020 924 936 602 255 36 × 2 = 1 + 0.348 080 041 849 873 204 510 72;
  • 70) 0.348 080 041 849 873 204 510 72 × 2 = 0 + 0.696 160 083 699 746 409 021 44;
  • 71) 0.696 160 083 699 746 409 021 44 × 2 = 1 + 0.392 320 167 399 492 818 042 88;
  • 72) 0.392 320 167 399 492 818 042 88 × 2 = 0 + 0.784 640 334 798 985 636 085 76;
  • 73) 0.784 640 334 798 985 636 085 76 × 2 = 1 + 0.569 280 669 597 971 272 171 52;
  • 74) 0.569 280 669 597 971 272 171 52 × 2 = 1 + 0.138 561 339 195 942 544 343 04;
  • 75) 0.138 561 339 195 942 544 343 04 × 2 = 0 + 0.277 122 678 391 885 088 686 08;
  • 76) 0.277 122 678 391 885 088 686 08 × 2 = 0 + 0.554 245 356 783 770 177 372 16;
  • 77) 0.554 245 356 783 770 177 372 16 × 2 = 1 + 0.108 490 713 567 540 354 744 32;
  • 78) 0.108 490 713 567 540 354 744 32 × 2 = 0 + 0.216 981 427 135 080 709 488 64;
  • 79) 0.216 981 427 135 080 709 488 64 × 2 = 0 + 0.433 962 854 270 161 418 977 28;
  • 80) 0.433 962 854 270 161 418 977 28 × 2 = 0 + 0.867 925 708 540 322 837 954 56;
  • 81) 0.867 925 708 540 322 837 954 56 × 2 = 1 + 0.735 851 417 080 645 675 909 12;
  • 82) 0.735 851 417 080 645 675 909 12 × 2 = 1 + 0.471 702 834 161 291 351 818 24;
  • 83) 0.471 702 834 161 291 351 818 24 × 2 = 0 + 0.943 405 668 322 582 703 636 48;
  • 84) 0.943 405 668 322 582 703 636 48 × 2 = 1 + 0.886 811 336 645 165 407 272 96;
  • 85) 0.886 811 336 645 165 407 272 96 × 2 = 1 + 0.773 622 673 290 330 814 545 92;
  • 86) 0.773 622 673 290 330 814 545 92 × 2 = 1 + 0.547 245 346 580 661 629 091 84;
  • 87) 0.547 245 346 580 661 629 091 84 × 2 = 1 + 0.094 490 693 161 323 258 183 68;
  • 88) 0.094 490 693 161 323 258 183 68 × 2 = 0 + 0.188 981 386 322 646 516 367 36;
  • 89) 0.188 981 386 322 646 516 367 36 × 2 = 0 + 0.377 962 772 645 293 032 734 72;
  • 90) 0.377 962 772 645 293 032 734 72 × 2 = 0 + 0.755 925 545 290 586 065 469 44;
  • 91) 0.755 925 545 290 586 065 469 44 × 2 = 1 + 0.511 851 090 581 172 130 938 88;
  • 92) 0.511 851 090 581 172 130 938 88 × 2 = 1 + 0.023 702 181 162 344 261 877 76;
  • 93) 0.023 702 181 162 344 261 877 76 × 2 = 0 + 0.047 404 362 324 688 523 755 52;
  • 94) 0.047 404 362 324 688 523 755 52 × 2 = 0 + 0.094 808 724 649 377 047 511 04;
  • 95) 0.094 808 724 649 377 047 511 04 × 2 = 0 + 0.189 617 449 298 754 095 022 08;
  • 96) 0.189 617 449 298 754 095 022 08 × 2 = 0 + 0.379 234 898 597 508 190 044 16;
  • 97) 0.379 234 898 597 508 190 044 16 × 2 = 0 + 0.758 469 797 195 016 380 088 32;
  • 98) 0.758 469 797 195 016 380 088 32 × 2 = 1 + 0.516 939 594 390 032 760 176 64;
  • 99) 0.516 939 594 390 032 760 176 64 × 2 = 1 + 0.033 879 188 780 065 520 353 28;
  • 100) 0.033 879 188 780 065 520 353 28 × 2 = 0 + 0.067 758 377 560 131 040 706 56;
  • 101) 0.067 758 377 560 131 040 706 56 × 2 = 0 + 0.135 516 755 120 262 081 413 12;
  • 102) 0.135 516 755 120 262 081 413 12 × 2 = 0 + 0.271 033 510 240 524 162 826 24;
  • 103) 0.271 033 510 240 524 162 826 24 × 2 = 0 + 0.542 067 020 481 048 325 652 48;
  • 104) 0.542 067 020 481 048 325 652 48 × 2 = 1 + 0.084 134 040 962 096 651 304 96;
  • 105) 0.084 134 040 962 096 651 304 96 × 2 = 0 + 0.168 268 081 924 193 302 609 92;
  • 106) 0.168 268 081 924 193 302 609 92 × 2 = 0 + 0.336 536 163 848 386 605 219 84;
  • 107) 0.336 536 163 848 386 605 219 84 × 2 = 0 + 0.673 072 327 696 773 210 439 68;
  • 108) 0.673 072 327 696 773 210 439 68 × 2 = 1 + 0.346 144 655 393 546 420 879 36;
  • 109) 0.346 144 655 393 546 420 879 36 × 2 = 0 + 0.692 289 310 787 092 841 758 72;
  • 110) 0.692 289 310 787 092 841 758 72 × 2 = 1 + 0.384 578 621 574 185 683 517 44;
  • 111) 0.384 578 621 574 185 683 517 44 × 2 = 0 + 0.769 157 243 148 371 367 034 88;
  • 112) 0.769 157 243 148 371 367 034 88 × 2 = 1 + 0.538 314 486 296 742 734 069 76;
  • 113) 0.538 314 486 296 742 734 069 76 × 2 = 1 + 0.076 628 972 593 485 468 139 52;
  • 114) 0.076 628 972 593 485 468 139 52 × 2 = 0 + 0.153 257 945 186 970 936 279 04;
  • 115) 0.153 257 945 186 970 936 279 04 × 2 = 0 + 0.306 515 890 373 941 872 558 08;
  • 116) 0.306 515 890 373 941 872 558 08 × 2 = 0 + 0.613 031 780 747 883 745 116 16;
  • 117) 0.613 031 780 747 883 745 116 16 × 2 = 1 + 0.226 063 561 495 767 490 232 32;
  • 118) 0.226 063 561 495 767 490 232 32 × 2 = 0 + 0.452 127 122 991 534 980 464 64;
  • 119) 0.452 127 122 991 534 980 464 64 × 2 = 0 + 0.904 254 245 983 069 960 929 28;

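We didn't get any fractional part that was equal to zero. But we had enough iterations (53 bits counting from the first 1: the leading bit plus the 52 mantissa bits) and at least one integer part that was different from zero => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one). Note that stopping here truncates the binary expansion. The fractional part left after step 119 is greater than one half, so a conversion that rounds to nearest (the IEEE 754 default) would round the last mantissa bit up.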


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 009 06(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1010 1100 1000 1101 1110 0011 0000 0110 0001 0001 0101 1000 100(2)
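
The repeated multiplication in steps 3 and 4 can be reproduced with exact rational arithmetic, which avoids the rounding that binary floats would introduce. A minimal sketch, assuming Python's `fractions.Fraction`; the name `fraction_bits` is hypothetical.

```python
from fractions import Fraction


def fraction_bits(x: Fraction, count: int) -> str:
    """Return the first `count` binary digits after the point of x, 0 <= x < 1."""
    bits = []
    for _ in range(count):
        x *= 2
        bit = int(x)          # integer part of the doubled value: 0 or 1
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(bits)


# 0.000 000 000 000 000 000 009 06 written as an exact fraction
x = Fraction(906, 10**23)
bits = fraction_bits(x, 119)
print(bits.index("1") + 1)    # 67: position of the first 1 bit
print(bits[66:])              # the 53 bits from that position onward
```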

5. Positive number before normalization:

0.000 000 000 000 000 000 009 06(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1010 1100 1000 1101 1110 0011 0000 0110 0001 0001 0101 1000 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 009 06(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1010 1100 1000 1101 1110 0011 0000 0110 0001 0001 0101 1000 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1010 1100 1000 1101 1110 0011 0000 0110 0001 0001 0101 1000 100(2) × 2^0 =


1.0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100(2) × 2^-67
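
Normalization amounts to finding the power of two that brings the value into the range from 1 (inclusive) to 2 (exclusive). The sketch below does this by repeated doubling or halving on an exact fraction; the name `normalize` is an assumption chosen for illustration.

```python
from fractions import Fraction


def normalize(x: Fraction):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent, for x > 0."""
    if x <= 0:
        raise ValueError("expects a positive value")
    exponent = 0
    while x >= 2:      # shift the point to the left
        x /= 2
        exponent += 1
    while x < 1:       # shift the point to the right
        x *= 2
        exponent -= 1
    return x, exponent


significand, exponent = normalize(Fraction(906, 10**23))
print(exponent)              # -67
print(float(significand))    # roughly 1.337
```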


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
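
Steps 8 to 10 can be condensed into a few lines. A hedged sketch: the constant and function names are assumptions, and the range check covers only normal (not subnormal or special) values.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and return the exponent as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the range of normal doubles")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(-67))  # 01110111100
```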


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1. 0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100 =


0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100


Decimal number 0.000 000 000 000 000 000 009 06 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0101 0110 0100 0110 1111 0001 1000 0011 0000 1000 1010 1100 0100
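
The result can be compared with the bits a machine actually stores. The sketch below uses Python's standard `struct` module; the helper name `double_bits` is an assumption. The worked steps cut the expansion off after 52 mantissa bits, while Python rounds to nearest when parsing the literal, so the last mantissa bit may differ by one from the result above.

```python
import struct


def double_bits(value: float) -> str:
    """Return the sign, exponent and mantissa bits of a Python float."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    s = format(raw, "064b")
    return f"{s[0]} - {s[1:12]} - {s[12:]}"


print(double_bits(9.06e-21))
```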


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, until we get a fractional part that is equal to zero, keeping track of each integer part of the results.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (this truncation loses precision, whereas the default IEEE 754 rounding mode rounds to nearest) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
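
The nine steps above can be put together in one short Python function. This is a sketch under stated assumptions rather than a full converter: the name `to_double_fields` is hypothetical, the input is a decimal string, and only nonzero values in the normal range are handled. Excess bits are truncated, as in the steps. Finding the exponent first and then generating 52 mantissa bits produces the same digits as converting the integer and fractional parts separately.

```python
from fractions import Fraction


def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    x = Fraction(text.replace(" ", ""))   # exact value of the decimal string
    sign = "1" if x < 0 else "0"
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no normalized form")

    # Normalize: bring x into [1, 2) and record the shift as the exponent.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1

    adjusted = exponent + 2 ** (11 - 1) - 1   # add the bias, 1023
    if not 1 <= adjusted <= 2046:
        raise ValueError("outside the range of normal doubles")

    # Drop the leading 1, then generate 52 bits by repeated multiplication.
    x -= 1
    mantissa = []
    for _ in range(52):
        x *= 2
        bit = int(x)
        mantissa.append(str(bit))
        x -= bit
    return sign, format(adjusted, "011b"), "".join(mantissa)


print(to_double_fields("-31.640 215"))
```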

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
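
As a check, the three fields can be decoded back into a number using value = (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). The sketch below does this exactly with `fractions.Fraction`; the name `decode_fields` is an assumption. Because the excess bits were removed rather than rounded, the decoded value is slightly closer to zero than -31.640 215, by less than 2^-48.

```python
from fractions import Fraction


def decode_fields(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Rebuild the exact value of a normal double from its three bit fields."""
    s = -1 if sign == "1" else 1
    e = int(exponent, 2) - 1023                      # remove the bias
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)
    return s * significand * Fraction(2) ** e


value = decode_fields(
    "1",
    "10000000011",
    "1111101000111110010100100001010101110110100010011100",
)
error = value - Fraction("-31.640215")
print(float(value))
print(0 < error < Fraction(1, 2 ** 48))   # True
```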