0.000 000 000 000 000 000 008 03 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 03(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 03(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 03.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 03 × 2 = 0 + 0.000 000 000 000 000 000 016 06;
  • 2) 0.000 000 000 000 000 000 016 06 × 2 = 0 + 0.000 000 000 000 000 000 032 12;
  • 3) 0.000 000 000 000 000 000 032 12 × 2 = 0 + 0.000 000 000 000 000 000 064 24;
  • 4) 0.000 000 000 000 000 000 064 24 × 2 = 0 + 0.000 000 000 000 000 000 128 48;
  • 5) 0.000 000 000 000 000 000 128 48 × 2 = 0 + 0.000 000 000 000 000 000 256 96;
  • 6) 0.000 000 000 000 000 000 256 96 × 2 = 0 + 0.000 000 000 000 000 000 513 92;
  • 7) 0.000 000 000 000 000 000 513 92 × 2 = 0 + 0.000 000 000 000 000 001 027 84;
  • 8) 0.000 000 000 000 000 001 027 84 × 2 = 0 + 0.000 000 000 000 000 002 055 68;
  • 9) 0.000 000 000 000 000 002 055 68 × 2 = 0 + 0.000 000 000 000 000 004 111 36;
  • 10) 0.000 000 000 000 000 004 111 36 × 2 = 0 + 0.000 000 000 000 000 008 222 72;
  • 11) 0.000 000 000 000 000 008 222 72 × 2 = 0 + 0.000 000 000 000 000 016 445 44;
  • 12) 0.000 000 000 000 000 016 445 44 × 2 = 0 + 0.000 000 000 000 000 032 890 88;
  • 13) 0.000 000 000 000 000 032 890 88 × 2 = 0 + 0.000 000 000 000 000 065 781 76;
  • 14) 0.000 000 000 000 000 065 781 76 × 2 = 0 + 0.000 000 000 000 000 131 563 52;
  • 15) 0.000 000 000 000 000 131 563 52 × 2 = 0 + 0.000 000 000 000 000 263 127 04;
  • 16) 0.000 000 000 000 000 263 127 04 × 2 = 0 + 0.000 000 000 000 000 526 254 08;
  • 17) 0.000 000 000 000 000 526 254 08 × 2 = 0 + 0.000 000 000 000 001 052 508 16;
  • 18) 0.000 000 000 000 001 052 508 16 × 2 = 0 + 0.000 000 000 000 002 105 016 32;
  • 19) 0.000 000 000 000 002 105 016 32 × 2 = 0 + 0.000 000 000 000 004 210 032 64;
  • 20) 0.000 000 000 000 004 210 032 64 × 2 = 0 + 0.000 000 000 000 008 420 065 28;
  • 21) 0.000 000 000 000 008 420 065 28 × 2 = 0 + 0.000 000 000 000 016 840 130 56;
  • 22) 0.000 000 000 000 016 840 130 56 × 2 = 0 + 0.000 000 000 000 033 680 261 12;
  • 23) 0.000 000 000 000 033 680 261 12 × 2 = 0 + 0.000 000 000 000 067 360 522 24;
  • 24) 0.000 000 000 000 067 360 522 24 × 2 = 0 + 0.000 000 000 000 134 721 044 48;
  • 25) 0.000 000 000 000 134 721 044 48 × 2 = 0 + 0.000 000 000 000 269 442 088 96;
  • 26) 0.000 000 000 000 269 442 088 96 × 2 = 0 + 0.000 000 000 000 538 884 177 92;
  • 27) 0.000 000 000 000 538 884 177 92 × 2 = 0 + 0.000 000 000 001 077 768 355 84;
  • 28) 0.000 000 000 001 077 768 355 84 × 2 = 0 + 0.000 000 000 002 155 536 711 68;
  • 29) 0.000 000 000 002 155 536 711 68 × 2 = 0 + 0.000 000 000 004 311 073 423 36;
  • 30) 0.000 000 000 004 311 073 423 36 × 2 = 0 + 0.000 000 000 008 622 146 846 72;
  • 31) 0.000 000 000 008 622 146 846 72 × 2 = 0 + 0.000 000 000 017 244 293 693 44;
  • 32) 0.000 000 000 017 244 293 693 44 × 2 = 0 + 0.000 000 000 034 488 587 386 88;
  • 33) 0.000 000 000 034 488 587 386 88 × 2 = 0 + 0.000 000 000 068 977 174 773 76;
  • 34) 0.000 000 000 068 977 174 773 76 × 2 = 0 + 0.000 000 000 137 954 349 547 52;
  • 35) 0.000 000 000 137 954 349 547 52 × 2 = 0 + 0.000 000 000 275 908 699 095 04;
  • 36) 0.000 000 000 275 908 699 095 04 × 2 = 0 + 0.000 000 000 551 817 398 190 08;
  • 37) 0.000 000 000 551 817 398 190 08 × 2 = 0 + 0.000 000 001 103 634 796 380 16;
  • 38) 0.000 000 001 103 634 796 380 16 × 2 = 0 + 0.000 000 002 207 269 592 760 32;
  • 39) 0.000 000 002 207 269 592 760 32 × 2 = 0 + 0.000 000 004 414 539 185 520 64;
  • 40) 0.000 000 004 414 539 185 520 64 × 2 = 0 + 0.000 000 008 829 078 371 041 28;
  • 41) 0.000 000 008 829 078 371 041 28 × 2 = 0 + 0.000 000 017 658 156 742 082 56;
  • 42) 0.000 000 017 658 156 742 082 56 × 2 = 0 + 0.000 000 035 316 313 484 165 12;
  • 43) 0.000 000 035 316 313 484 165 12 × 2 = 0 + 0.000 000 070 632 626 968 330 24;
  • 44) 0.000 000 070 632 626 968 330 24 × 2 = 0 + 0.000 000 141 265 253 936 660 48;
  • 45) 0.000 000 141 265 253 936 660 48 × 2 = 0 + 0.000 000 282 530 507 873 320 96;
  • 46) 0.000 000 282 530 507 873 320 96 × 2 = 0 + 0.000 000 565 061 015 746 641 92;
  • 47) 0.000 000 565 061 015 746 641 92 × 2 = 0 + 0.000 001 130 122 031 493 283 84;
  • 48) 0.000 001 130 122 031 493 283 84 × 2 = 0 + 0.000 002 260 244 062 986 567 68;
  • 49) 0.000 002 260 244 062 986 567 68 × 2 = 0 + 0.000 004 520 488 125 973 135 36;
  • 50) 0.000 004 520 488 125 973 135 36 × 2 = 0 + 0.000 009 040 976 251 946 270 72;
  • 51) 0.000 009 040 976 251 946 270 72 × 2 = 0 + 0.000 018 081 952 503 892 541 44;
  • 52) 0.000 018 081 952 503 892 541 44 × 2 = 0 + 0.000 036 163 905 007 785 082 88;
  • 53) 0.000 036 163 905 007 785 082 88 × 2 = 0 + 0.000 072 327 810 015 570 165 76;
  • 54) 0.000 072 327 810 015 570 165 76 × 2 = 0 + 0.000 144 655 620 031 140 331 52;
  • 55) 0.000 144 655 620 031 140 331 52 × 2 = 0 + 0.000 289 311 240 062 280 663 04;
  • 56) 0.000 289 311 240 062 280 663 04 × 2 = 0 + 0.000 578 622 480 124 561 326 08;
  • 57) 0.000 578 622 480 124 561 326 08 × 2 = 0 + 0.001 157 244 960 249 122 652 16;
  • 58) 0.001 157 244 960 249 122 652 16 × 2 = 0 + 0.002 314 489 920 498 245 304 32;
  • 59) 0.002 314 489 920 498 245 304 32 × 2 = 0 + 0.004 628 979 840 996 490 608 64;
  • 60) 0.004 628 979 840 996 490 608 64 × 2 = 0 + 0.009 257 959 681 992 981 217 28;
  • 61) 0.009 257 959 681 992 981 217 28 × 2 = 0 + 0.018 515 919 363 985 962 434 56;
  • 62) 0.018 515 919 363 985 962 434 56 × 2 = 0 + 0.037 031 838 727 971 924 869 12;
  • 63) 0.037 031 838 727 971 924 869 12 × 2 = 0 + 0.074 063 677 455 943 849 738 24;
  • 64) 0.074 063 677 455 943 849 738 24 × 2 = 0 + 0.148 127 354 911 887 699 476 48;
  • 65) 0.148 127 354 911 887 699 476 48 × 2 = 0 + 0.296 254 709 823 775 398 952 96;
  • 66) 0.296 254 709 823 775 398 952 96 × 2 = 0 + 0.592 509 419 647 550 797 905 92;
  • 67) 0.592 509 419 647 550 797 905 92 × 2 = 1 + 0.185 018 839 295 101 595 811 84;
  • 68) 0.185 018 839 295 101 595 811 84 × 2 = 0 + 0.370 037 678 590 203 191 623 68;
  • 69) 0.370 037 678 590 203 191 623 68 × 2 = 0 + 0.740 075 357 180 406 383 247 36;
  • 70) 0.740 075 357 180 406 383 247 36 × 2 = 1 + 0.480 150 714 360 812 766 494 72;
  • 71) 0.480 150 714 360 812 766 494 72 × 2 = 0 + 0.960 301 428 721 625 532 989 44;
  • 72) 0.960 301 428 721 625 532 989 44 × 2 = 1 + 0.920 602 857 443 251 065 978 88;
  • 73) 0.920 602 857 443 251 065 978 88 × 2 = 1 + 0.841 205 714 886 502 131 957 76;
  • 74) 0.841 205 714 886 502 131 957 76 × 2 = 1 + 0.682 411 429 773 004 263 915 52;
  • 75) 0.682 411 429 773 004 263 915 52 × 2 = 1 + 0.364 822 859 546 008 527 831 04;
  • 76) 0.364 822 859 546 008 527 831 04 × 2 = 0 + 0.729 645 719 092 017 055 662 08;
  • 77) 0.729 645 719 092 017 055 662 08 × 2 = 1 + 0.459 291 438 184 034 111 324 16;
  • 78) 0.459 291 438 184 034 111 324 16 × 2 = 0 + 0.918 582 876 368 068 222 648 32;
  • 79) 0.918 582 876 368 068 222 648 32 × 2 = 1 + 0.837 165 752 736 136 445 296 64;
  • 80) 0.837 165 752 736 136 445 296 64 × 2 = 1 + 0.674 331 505 472 272 890 593 28;
  • 81) 0.674 331 505 472 272 890 593 28 × 2 = 1 + 0.348 663 010 944 545 781 186 56;
  • 82) 0.348 663 010 944 545 781 186 56 × 2 = 0 + 0.697 326 021 889 091 562 373 12;
  • 83) 0.697 326 021 889 091 562 373 12 × 2 = 1 + 0.394 652 043 778 183 124 746 24;
  • 84) 0.394 652 043 778 183 124 746 24 × 2 = 0 + 0.789 304 087 556 366 249 492 48;
  • 85) 0.789 304 087 556 366 249 492 48 × 2 = 1 + 0.578 608 175 112 732 498 984 96;
  • 86) 0.578 608 175 112 732 498 984 96 × 2 = 1 + 0.157 216 350 225 464 997 969 92;
  • 87) 0.157 216 350 225 464 997 969 92 × 2 = 0 + 0.314 432 700 450 929 995 939 84;
  • 88) 0.314 432 700 450 929 995 939 84 × 2 = 0 + 0.628 865 400 901 859 991 879 68;
  • 89) 0.628 865 400 901 859 991 879 68 × 2 = 1 + 0.257 730 801 803 719 983 759 36;
  • 90) 0.257 730 801 803 719 983 759 36 × 2 = 0 + 0.515 461 603 607 439 967 518 72;
  • 91) 0.515 461 603 607 439 967 518 72 × 2 = 1 + 0.030 923 207 214 879 935 037 44;
  • 92) 0.030 923 207 214 879 935 037 44 × 2 = 0 + 0.061 846 414 429 759 870 074 88;
  • 93) 0.061 846 414 429 759 870 074 88 × 2 = 0 + 0.123 692 828 859 519 740 149 76;
  • 94) 0.123 692 828 859 519 740 149 76 × 2 = 0 + 0.247 385 657 719 039 480 299 52;
  • 95) 0.247 385 657 719 039 480 299 52 × 2 = 0 + 0.494 771 315 438 078 960 599 04;
  • 96) 0.494 771 315 438 078 960 599 04 × 2 = 0 + 0.989 542 630 876 157 921 198 08;
  • 97) 0.989 542 630 876 157 921 198 08 × 2 = 1 + 0.979 085 261 752 315 842 396 16;
  • 98) 0.979 085 261 752 315 842 396 16 × 2 = 1 + 0.958 170 523 504 631 684 792 32;
  • 99) 0.958 170 523 504 631 684 792 32 × 2 = 1 + 0.916 341 047 009 263 369 584 64;
  • 100) 0.916 341 047 009 263 369 584 64 × 2 = 1 + 0.832 682 094 018 526 739 169 28;
  • 101) 0.832 682 094 018 526 739 169 28 × 2 = 1 + 0.665 364 188 037 053 478 338 56;
  • 102) 0.665 364 188 037 053 478 338 56 × 2 = 1 + 0.330 728 376 074 106 956 677 12;
  • 103) 0.330 728 376 074 106 956 677 12 × 2 = 0 + 0.661 456 752 148 213 913 354 24;
  • 104) 0.661 456 752 148 213 913 354 24 × 2 = 1 + 0.322 913 504 296 427 826 708 48;
  • 105) 0.322 913 504 296 427 826 708 48 × 2 = 0 + 0.645 827 008 592 855 653 416 96;
  • 106) 0.645 827 008 592 855 653 416 96 × 2 = 1 + 0.291 654 017 185 711 306 833 92;
  • 107) 0.291 654 017 185 711 306 833 92 × 2 = 0 + 0.583 308 034 371 422 613 667 84;
  • 108) 0.583 308 034 371 422 613 667 84 × 2 = 1 + 0.166 616 068 742 845 227 335 68;
  • 109) 0.166 616 068 742 845 227 335 68 × 2 = 0 + 0.333 232 137 485 690 454 671 36;
  • 110) 0.333 232 137 485 690 454 671 36 × 2 = 0 + 0.666 464 274 971 380 909 342 72;
  • 111) 0.666 464 274 971 380 909 342 72 × 2 = 1 + 0.332 928 549 942 761 818 685 44;
  • 112) 0.332 928 549 942 761 818 685 44 × 2 = 0 + 0.665 857 099 885 523 637 370 88;
  • 113) 0.665 857 099 885 523 637 370 88 × 2 = 1 + 0.331 714 199 771 047 274 741 76;
  • 114) 0.331 714 199 771 047 274 741 76 × 2 = 0 + 0.663 428 399 542 094 549 483 52;
  • 115) 0.663 428 399 542 094 549 483 52 × 2 = 1 + 0.326 856 799 084 189 098 967 04;
  • 116) 0.326 856 799 084 189 098 967 04 × 2 = 0 + 0.653 713 598 168 378 197 934 08;
  • 117) 0.653 713 598 168 378 197 934 08 × 2 = 1 + 0.307 427 196 336 756 395 868 16;
  • 118) 0.307 427 196 336 756 395 868 16 × 2 = 0 + 0.614 854 392 673 512 791 736 32;
  • 119) 0.614 854 392 673 512 791 736 32 × 2 = 1 + 0.229 708 785 347 025 583 472 64;

No fractional part became equal to zero, so the binary expansion does not terminate. However, we already have 53 significant bits (the first 1, produced at step 67, plus the 52 bits that follow it), which is all the mantissa can hold => FULL STOP (losing precision: the converted number will be a very close approximation of the initial one).
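The repeated doubling above can be sketched in Python. This is only an illustration: the function name `fraction_bits` and its stopping rule (53 bits counted from the first 1) are assumptions made for this sketch, and `fractions.Fraction` is used so that every doubling is exact, as in the hand calculation.

```python
from fractions import Fraction


def fraction_bits(value: Fraction, significant_bits: int = 53):
    """Double the fractional part repeatedly, collecting the integer parts.

    Stops when the fractional part reaches zero, or once
    `significant_bits` bits have been produced counting from the first 1.
    Returns the list of bits and the 1-based step of the first 1 bit.
    """
    bits = []
    first_one = None
    while value != 0:
        value *= 2
        bit = int(value)   # integer part of the product: 0 or 1
        value -= bit       # keep only the fractional part
        bits.append(bit)
        if bit and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one + 1 >= significant_bits:
            break
    return bits, first_one


bits, first_one = fraction_bits(Fraction("8.03e-21"))
print(first_one, len(bits))  # 67 119
```

The first 1 appears at step 67 and the loop stops after step 119, matching the list above.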


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 03(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 1110 1011 1010 1100 1010 0000 1111 1101 0101 0010 1010 101(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 03(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 1110 1011 1010 1100 1010 0000 1111 1101 0101 0010 1010 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 03(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 1110 1011 1010 1100 1010 0000 1111 1101 0101 0010 1010 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 1110 1011 1010 1100 1010 0000 1111 1101 0101 0010 1010 101(2) × 2^0 =


1.0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101(2) × 2^-67
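As a cross-check of the shift count, the normalization can be sketched with exact fractions. The helper name `normalize` is hypothetical; it simply multiplies or divides by 2 until the value lies in the range from 1 (inclusive) to 2 (exclusive), counting the shifts.

```python
from fractions import Fraction


def normalize(value: Fraction):
    """Return (significand, exponent) for a positive value, such that
    1 <= significand < 2 and value == significand * 2**exponent."""
    exponent = 0
    while value >= 2:      # shift the point to the left
        value /= 2
        exponent += 1
    while value < 1:       # shift the point to the right
        value *= 2
        exponent -= 1
    return value, exponent


significand, exponent = normalize(Fraction("8.03e-21"))
print(exponent)  # -67
```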


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
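The bias arithmetic is short enough to check directly. A minimal sketch, with the constant and function names chosen here for illustration:

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent(unadjusted: int) -> int:
    """Add the bias to the unadjusted (true) exponent."""
    return unadjusted + BIAS


print(biased_exponent(-67))  # 956
```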


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
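The repeated division by 2 used here (and for the integer part in step 1) can be sketched as follows. The function name `to_binary` and its `width` parameter are assumptions of this sketch; `width` pads the result with leading zeros to the 11 exponent bits.

```python
def to_binary(n: int, width: int = 0) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    # The last remainder is the leftmost bit, so read the list backwards.
    digits = "".join(reversed(remainders)) or "0"
    return digits.zfill(width)


print(to_binary(956, 11))  # 01110111100
```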


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101 =


0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101


Decimal number 0.000 000 000 000 000 000 008 03 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0010 1111 0101 1101 0110 0101 0000 0111 1110 1010 1001 0101 0101
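One way to double-check the result is to ask Python for the bits of the same number, since Python floats are IEEE 754 doubles on common platforms. The helper name `double_fields` is hypothetical; it relies only on the standard `struct` module.

```python
import struct


def double_fields(x: float):
    """Split a float into its sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # 64-bit pattern as an int
    bits = f"{raw:064b}"
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.03e-21)
print(sign, exponent, mantissa)
```

For this number the three fields should agree with the sign, exponent and mantissa derived by hand above.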


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have enough bits to fill the mantissa (many decimal fractions never terminate in binary).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
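The steps above can be sketched end to end in Python. This is a sketch under stated assumptions, not a full IEEE 754 implementation: the function name `to_double_bits_truncated` is hypothetical, it covers only nonzero values in the normal exponent range, and it truncates the mantissa as the steps describe instead of rounding. Rather than extracting digits one at a time (steps 2 to 5), it scales the exact fraction, which yields the same bits.

```python
from fractions import Fraction

EXPONENT_BITS = 11
MANTISSA_BITS = 52
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023


def to_double_bits_truncated(text: str) -> str:
    """Convert a decimal string to 'sign - exponent - mantissa' bit fields."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"      # step 9
    value = abs(value)                    # step 1
    if value == 0:
        raise ValueError("zero is not covered by this sketch")

    # Step 6: normalize so that 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Step 7: bias the exponent.
    biased = exponent + BIAS
    if not 0 < biased < 2 ** EXPONENT_BITS - 1:
        raise ValueError("outside the normal range")

    # Step 8: drop the leading 1 and keep 52 bits, cutting off the rest.
    mantissa = int((value - 1) * 2 ** MANTISSA_BITS)
    return f"{sign} - {biased:0{EXPONENT_BITS}b} - {mantissa:0{MANTISSA_BITS}b}"


print(to_double_bits_truncated("8.03e-21"))
print(to_double_bits_truncated("-31.640215"))
```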

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • No fractional part became equal to zero. But we already have more bits than the 52 bit mantissa can hold (5 bits from the integer part plus 53 from the fractional part) => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision: the extra bits are cut off, not rounded):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
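Note that the worked example cuts off the excess mantissa bits, whereas the default IEEE 754 rounding mode rounds to the nearest representable value. The sketch below, which assumes Python floats are IEEE 754 doubles and uses variable names chosen for illustration, measures the discarded part exactly and compares the truncated mantissa with the one Python stores for the same number.

```python
import struct
from fractions import Fraction

# Exact significand of 31.640215 after normalization (exponent 4).
significand = Fraction("31.640215") / 2 ** 4
scaled = (significand - 1) * 2 ** 52

kept = int(scaled)          # the 52 mantissa bits kept by truncation
discarded = scaled - kept   # what was cut off, as a fraction of the last bit

# Mantissa bits of the double that Python stores for the same number.
(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = raw & (2 ** 52 - 1)

print(float(discarded))               # 0.63104, more than half of the last bit
print(f"{kept:052b}"[-4:])            # 1100 (truncated, as in the example)
print(f"{stored:052b}"[-4:])          # 1101 (rounded to nearest)
```

Because the discarded part is larger than one half, rounding to nearest raises the last mantissa bit by one, so a correctly rounded double for -31.640 215 ends in 1101 rather than 1100. The sign and exponent fields are the same either way.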