0.000 000 000 000 000 000 008 584 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 584(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 584(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 584.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 584 × 2 = 0 + 0.000 000 000 000 000 000 017 168;
  • 2) 0.000 000 000 000 000 000 017 168 × 2 = 0 + 0.000 000 000 000 000 000 034 336;
  • 3) 0.000 000 000 000 000 000 034 336 × 2 = 0 + 0.000 000 000 000 000 000 068 672;
  • 4) 0.000 000 000 000 000 000 068 672 × 2 = 0 + 0.000 000 000 000 000 000 137 344;
  • 5) 0.000 000 000 000 000 000 137 344 × 2 = 0 + 0.000 000 000 000 000 000 274 688;
  • 6) 0.000 000 000 000 000 000 274 688 × 2 = 0 + 0.000 000 000 000 000 000 549 376;
  • 7) 0.000 000 000 000 000 000 549 376 × 2 = 0 + 0.000 000 000 000 000 001 098 752;
  • 8) 0.000 000 000 000 000 001 098 752 × 2 = 0 + 0.000 000 000 000 000 002 197 504;
  • 9) 0.000 000 000 000 000 002 197 504 × 2 = 0 + 0.000 000 000 000 000 004 395 008;
  • 10) 0.000 000 000 000 000 004 395 008 × 2 = 0 + 0.000 000 000 000 000 008 790 016;
  • 11) 0.000 000 000 000 000 008 790 016 × 2 = 0 + 0.000 000 000 000 000 017 580 032;
  • 12) 0.000 000 000 000 000 017 580 032 × 2 = 0 + 0.000 000 000 000 000 035 160 064;
  • 13) 0.000 000 000 000 000 035 160 064 × 2 = 0 + 0.000 000 000 000 000 070 320 128;
  • 14) 0.000 000 000 000 000 070 320 128 × 2 = 0 + 0.000 000 000 000 000 140 640 256;
  • 15) 0.000 000 000 000 000 140 640 256 × 2 = 0 + 0.000 000 000 000 000 281 280 512;
  • 16) 0.000 000 000 000 000 281 280 512 × 2 = 0 + 0.000 000 000 000 000 562 561 024;
  • 17) 0.000 000 000 000 000 562 561 024 × 2 = 0 + 0.000 000 000 000 001 125 122 048;
  • 18) 0.000 000 000 000 001 125 122 048 × 2 = 0 + 0.000 000 000 000 002 250 244 096;
  • 19) 0.000 000 000 000 002 250 244 096 × 2 = 0 + 0.000 000 000 000 004 500 488 192;
  • 20) 0.000 000 000 000 004 500 488 192 × 2 = 0 + 0.000 000 000 000 009 000 976 384;
  • 21) 0.000 000 000 000 009 000 976 384 × 2 = 0 + 0.000 000 000 000 018 001 952 768;
  • 22) 0.000 000 000 000 018 001 952 768 × 2 = 0 + 0.000 000 000 000 036 003 905 536;
  • 23) 0.000 000 000 000 036 003 905 536 × 2 = 0 + 0.000 000 000 000 072 007 811 072;
  • 24) 0.000 000 000 000 072 007 811 072 × 2 = 0 + 0.000 000 000 000 144 015 622 144;
  • 25) 0.000 000 000 000 144 015 622 144 × 2 = 0 + 0.000 000 000 000 288 031 244 288;
  • 26) 0.000 000 000 000 288 031 244 288 × 2 = 0 + 0.000 000 000 000 576 062 488 576;
  • 27) 0.000 000 000 000 576 062 488 576 × 2 = 0 + 0.000 000 000 001 152 124 977 152;
  • 28) 0.000 000 000 001 152 124 977 152 × 2 = 0 + 0.000 000 000 002 304 249 954 304;
  • 29) 0.000 000 000 002 304 249 954 304 × 2 = 0 + 0.000 000 000 004 608 499 908 608;
  • 30) 0.000 000 000 004 608 499 908 608 × 2 = 0 + 0.000 000 000 009 216 999 817 216;
  • 31) 0.000 000 000 009 216 999 817 216 × 2 = 0 + 0.000 000 000 018 433 999 634 432;
  • 32) 0.000 000 000 018 433 999 634 432 × 2 = 0 + 0.000 000 000 036 867 999 268 864;
  • 33) 0.000 000 000 036 867 999 268 864 × 2 = 0 + 0.000 000 000 073 735 998 537 728;
  • 34) 0.000 000 000 073 735 998 537 728 × 2 = 0 + 0.000 000 000 147 471 997 075 456;
  • 35) 0.000 000 000 147 471 997 075 456 × 2 = 0 + 0.000 000 000 294 943 994 150 912;
  • 36) 0.000 000 000 294 943 994 150 912 × 2 = 0 + 0.000 000 000 589 887 988 301 824;
  • 37) 0.000 000 000 589 887 988 301 824 × 2 = 0 + 0.000 000 001 179 775 976 603 648;
  • 38) 0.000 000 001 179 775 976 603 648 × 2 = 0 + 0.000 000 002 359 551 953 207 296;
  • 39) 0.000 000 002 359 551 953 207 296 × 2 = 0 + 0.000 000 004 719 103 906 414 592;
  • 40) 0.000 000 004 719 103 906 414 592 × 2 = 0 + 0.000 000 009 438 207 812 829 184;
  • 41) 0.000 000 009 438 207 812 829 184 × 2 = 0 + 0.000 000 018 876 415 625 658 368;
  • 42) 0.000 000 018 876 415 625 658 368 × 2 = 0 + 0.000 000 037 752 831 251 316 736;
  • 43) 0.000 000 037 752 831 251 316 736 × 2 = 0 + 0.000 000 075 505 662 502 633 472;
  • 44) 0.000 000 075 505 662 502 633 472 × 2 = 0 + 0.000 000 151 011 325 005 266 944;
  • 45) 0.000 000 151 011 325 005 266 944 × 2 = 0 + 0.000 000 302 022 650 010 533 888;
  • 46) 0.000 000 302 022 650 010 533 888 × 2 = 0 + 0.000 000 604 045 300 021 067 776;
  • 47) 0.000 000 604 045 300 021 067 776 × 2 = 0 + 0.000 001 208 090 600 042 135 552;
  • 48) 0.000 001 208 090 600 042 135 552 × 2 = 0 + 0.000 002 416 181 200 084 271 104;
  • 49) 0.000 002 416 181 200 084 271 104 × 2 = 0 + 0.000 004 832 362 400 168 542 208;
  • 50) 0.000 004 832 362 400 168 542 208 × 2 = 0 + 0.000 009 664 724 800 337 084 416;
  • 51) 0.000 009 664 724 800 337 084 416 × 2 = 0 + 0.000 019 329 449 600 674 168 832;
  • 52) 0.000 019 329 449 600 674 168 832 × 2 = 0 + 0.000 038 658 899 201 348 337 664;
  • 53) 0.000 038 658 899 201 348 337 664 × 2 = 0 + 0.000 077 317 798 402 696 675 328;
  • 54) 0.000 077 317 798 402 696 675 328 × 2 = 0 + 0.000 154 635 596 805 393 350 656;
  • 55) 0.000 154 635 596 805 393 350 656 × 2 = 0 + 0.000 309 271 193 610 786 701 312;
  • 56) 0.000 309 271 193 610 786 701 312 × 2 = 0 + 0.000 618 542 387 221 573 402 624;
  • 57) 0.000 618 542 387 221 573 402 624 × 2 = 0 + 0.001 237 084 774 443 146 805 248;
  • 58) 0.001 237 084 774 443 146 805 248 × 2 = 0 + 0.002 474 169 548 886 293 610 496;
  • 59) 0.002 474 169 548 886 293 610 496 × 2 = 0 + 0.004 948 339 097 772 587 220 992;
  • 60) 0.004 948 339 097 772 587 220 992 × 2 = 0 + 0.009 896 678 195 545 174 441 984;
  • 61) 0.009 896 678 195 545 174 441 984 × 2 = 0 + 0.019 793 356 391 090 348 883 968;
  • 62) 0.019 793 356 391 090 348 883 968 × 2 = 0 + 0.039 586 712 782 180 697 767 936;
  • 63) 0.039 586 712 782 180 697 767 936 × 2 = 0 + 0.079 173 425 564 361 395 535 872;
  • 64) 0.079 173 425 564 361 395 535 872 × 2 = 0 + 0.158 346 851 128 722 791 071 744;
  • 65) 0.158 346 851 128 722 791 071 744 × 2 = 0 + 0.316 693 702 257 445 582 143 488;
  • 66) 0.316 693 702 257 445 582 143 488 × 2 = 0 + 0.633 387 404 514 891 164 286 976;
  • 67) 0.633 387 404 514 891 164 286 976 × 2 = 1 + 0.266 774 809 029 782 328 573 952;
  • 68) 0.266 774 809 029 782 328 573 952 × 2 = 0 + 0.533 549 618 059 564 657 147 904;
  • 69) 0.533 549 618 059 564 657 147 904 × 2 = 1 + 0.067 099 236 119 129 314 295 808;
  • 70) 0.067 099 236 119 129 314 295 808 × 2 = 0 + 0.134 198 472 238 258 628 591 616;
  • 71) 0.134 198 472 238 258 628 591 616 × 2 = 0 + 0.268 396 944 476 517 257 183 232;
  • 72) 0.268 396 944 476 517 257 183 232 × 2 = 0 + 0.536 793 888 953 034 514 366 464;
  • 73) 0.536 793 888 953 034 514 366 464 × 2 = 1 + 0.073 587 777 906 069 028 732 928;
  • 74) 0.073 587 777 906 069 028 732 928 × 2 = 0 + 0.147 175 555 812 138 057 465 856;
  • 75) 0.147 175 555 812 138 057 465 856 × 2 = 0 + 0.294 351 111 624 276 114 931 712;
  • 76) 0.294 351 111 624 276 114 931 712 × 2 = 0 + 0.588 702 223 248 552 229 863 424;
  • 77) 0.588 702 223 248 552 229 863 424 × 2 = 1 + 0.177 404 446 497 104 459 726 848;
  • 78) 0.177 404 446 497 104 459 726 848 × 2 = 0 + 0.354 808 892 994 208 919 453 696;
  • 79) 0.354 808 892 994 208 919 453 696 × 2 = 0 + 0.709 617 785 988 417 838 907 392;
  • 80) 0.709 617 785 988 417 838 907 392 × 2 = 1 + 0.419 235 571 976 835 677 814 784;
  • 81) 0.419 235 571 976 835 677 814 784 × 2 = 0 + 0.838 471 143 953 671 355 629 568;
  • 82) 0.838 471 143 953 671 355 629 568 × 2 = 1 + 0.676 942 287 907 342 711 259 136;
  • 83) 0.676 942 287 907 342 711 259 136 × 2 = 1 + 0.353 884 575 814 685 422 518 272;
  • 84) 0.353 884 575 814 685 422 518 272 × 2 = 0 + 0.707 769 151 629 370 845 036 544;
  • 85) 0.707 769 151 629 370 845 036 544 × 2 = 1 + 0.415 538 303 258 741 690 073 088;
  • 86) 0.415 538 303 258 741 690 073 088 × 2 = 0 + 0.831 076 606 517 483 380 146 176;
  • 87) 0.831 076 606 517 483 380 146 176 × 2 = 1 + 0.662 153 213 034 966 760 292 352;
  • 88) 0.662 153 213 034 966 760 292 352 × 2 = 1 + 0.324 306 426 069 933 520 584 704;
  • 89) 0.324 306 426 069 933 520 584 704 × 2 = 0 + 0.648 612 852 139 867 041 169 408;
  • 90) 0.648 612 852 139 867 041 169 408 × 2 = 1 + 0.297 225 704 279 734 082 338 816;
  • 91) 0.297 225 704 279 734 082 338 816 × 2 = 0 + 0.594 451 408 559 468 164 677 632;
  • 92) 0.594 451 408 559 468 164 677 632 × 2 = 1 + 0.188 902 817 118 936 329 355 264;
  • 93) 0.188 902 817 118 936 329 355 264 × 2 = 0 + 0.377 805 634 237 872 658 710 528;
  • 94) 0.377 805 634 237 872 658 710 528 × 2 = 0 + 0.755 611 268 475 745 317 421 056;
  • 95) 0.755 611 268 475 745 317 421 056 × 2 = 1 + 0.511 222 536 951 490 634 842 112;
  • 96) 0.511 222 536 951 490 634 842 112 × 2 = 1 + 0.022 445 073 902 981 269 684 224;
  • 97) 0.022 445 073 902 981 269 684 224 × 2 = 0 + 0.044 890 147 805 962 539 368 448;
  • 98) 0.044 890 147 805 962 539 368 448 × 2 = 0 + 0.089 780 295 611 925 078 736 896;
  • 99) 0.089 780 295 611 925 078 736 896 × 2 = 0 + 0.179 560 591 223 850 157 473 792;
  • 100) 0.179 560 591 223 850 157 473 792 × 2 = 0 + 0.359 121 182 447 700 314 947 584;
  • 101) 0.359 121 182 447 700 314 947 584 × 2 = 0 + 0.718 242 364 895 400 629 895 168;
  • 102) 0.718 242 364 895 400 629 895 168 × 2 = 1 + 0.436 484 729 790 801 259 790 336;
  • 103) 0.436 484 729 790 801 259 790 336 × 2 = 0 + 0.872 969 459 581 602 519 580 672;
  • 104) 0.872 969 459 581 602 519 580 672 × 2 = 1 + 0.745 938 919 163 205 039 161 344;
  • 105) 0.745 938 919 163 205 039 161 344 × 2 = 1 + 0.491 877 838 326 410 078 322 688;
  • 106) 0.491 877 838 326 410 078 322 688 × 2 = 0 + 0.983 755 676 652 820 156 645 376;
  • 107) 0.983 755 676 652 820 156 645 376 × 2 = 1 + 0.967 511 353 305 640 313 290 752;
  • 108) 0.967 511 353 305 640 313 290 752 × 2 = 1 + 0.935 022 706 611 280 626 581 504;
  • 109) 0.935 022 706 611 280 626 581 504 × 2 = 1 + 0.870 045 413 222 561 253 163 008;
  • 110) 0.870 045 413 222 561 253 163 008 × 2 = 1 + 0.740 090 826 445 122 506 326 016;
  • 111) 0.740 090 826 445 122 506 326 016 × 2 = 1 + 0.480 181 652 890 245 012 652 032;
  • 112) 0.480 181 652 890 245 012 652 032 × 2 = 0 + 0.960 363 305 780 490 025 304 064;
  • 113) 0.960 363 305 780 490 025 304 064 × 2 = 1 + 0.920 726 611 560 980 050 608 128;
  • 114) 0.920 726 611 560 980 050 608 128 × 2 = 1 + 0.841 453 223 121 960 101 216 256;
  • 115) 0.841 453 223 121 960 101 216 256 × 2 = 1 + 0.682 906 446 243 920 202 432 512;
  • 116) 0.682 906 446 243 920 202 432 512 × 2 = 1 + 0.365 812 892 487 840 404 865 024;
  • 117) 0.365 812 892 487 840 404 865 024 × 2 = 0 + 0.731 625 784 975 680 809 730 048;
  • 118) 0.731 625 784 975 680 809 730 048 × 2 = 1 + 0.463 251 569 951 361 619 460 096;
  • 119) 0.463 251 569 951 361 619 460 096 × 2 = 0 + 0.926 503 139 902 723 238 920 192;

None of the fractional parts reached zero. However, we have enough iterations: the first non-zero integer part appears at step 67, and steps 67 through 119 supply 53 significant bits (the leading 1 plus the 52 bits of the mantissa) => FULL STOP. The bits that would follow are discarded, so precision is lost: the converted number is a very close approximation of the initial one, not an exact copy.
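The repeated multiplication above can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the list: the function name `fraction_bits` is hypothetical, and it relies on `fractions.Fraction` so that each doubling is exact. The input `8.584e-21` is the same value as 0.000 000 000 000 000 000 008 584.

```python
from fractions import Fraction

def fraction_bits(value: str, count: int) -> str:
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    frac = Fraction(value)      # exact rational value, no floating point error
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)         # integer part of the product: 0 or 1
        bits.append(str(bit))
        frac -= bit             # keep only the fractional part
    return "".join(bits)

bits = fraction_bits("8.584e-21", 119)
print(bits.index("1") + 1)      # position of the first non-zero bit → 67
print(bits[66:])                # the 53 significant bits (steps 67 to 119)
```

Using exact rationals avoids the circularity of converting a value that has already been rounded to a double.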


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 584(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1001 0110 1011 0101 0011 0000 0101 1011 1110 1111 010(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 584(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1001 0110 1011 0101 0011 0000 0101 1011 1110 1111 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 584(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1001 0110 1011 0101 0011 0000 0101 1011 1110 1111 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1000 1001 0110 1011 0101 0011 0000 0101 1011 1110 1111 010(2) × 2^0 =


1.0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010(2) × 2^-67


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
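As a quick check of the arithmetic above, the bias can be computed directly from the exponent width. The variable names are illustrative only.

```python
EXPONENT_BITS = 11                      # width of the exponent field in a double

bias = 2 ** (EXPONENT_BITS - 1) - 1     # excess/bias value for an 11 bit exponent
unadjusted = -67                        # exponent found during normalization
adjusted = unadjusted + bias

print(bias, adjusted)                   # → 1023 956
```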


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
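The repeated division used in steps 9 and 10 can be written as a short Python sketch. The helper name `to_binary` and its `width` parameter are hypothetical, chosen for this illustration.

```python
def to_binary(n: int, width: int = 0) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)         # quotient and remainder of n ÷ 2
        remainders.append(str(r))
    # The last remainder is the leftmost digit, so read the list bottom-up.
    digits = "".join(reversed(remainders)) or "0"
    return digits.zfill(width)      # left-pad with zeros to the field width

print(to_binary(956, 11))           # → 01110111100
```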


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, together with the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1. 0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010 =


0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010


Decimal number 0.000 000 000 000 000 000 008 584 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0100 0100 1011 0101 1010 1001 1000 0010 1101 1111 0111 1010
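To compare this result with what a machine actually stores, the three fields of a real double can be read back with Python's standard `struct` module. The helper name `double_fields` is hypothetical. One caveat: hardware conversion normally rounds to the nearest representable value, whereas the steps above cut off the excess bits, so the last mantissa bit printed here may differ from the hand-computed one.

```python
import struct

def double_fields(x: float):
    """Split a Python float (an IEEE 754 double) into sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # reinterpret the 8 bytes as an integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]                # 1 + 11 + 52 bits

sign, exponent, mantissa = double_fields(8.584e-21)
print(sign, exponent, mantissa)
```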


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, until we get a fractional part that is equal to zero, keeping track of each integer part of the results.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
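The steps above can be condensed into one Python sketch. It is an illustration under stated assumptions, not a full IEEE 754 implementation: the function name `to_double_truncated` is hypothetical, the input must be a non-zero number in the normal range (no zero, subnormal, infinity or NaN handling), and excess bits are cut off as in step 8 instead of being rounded. For brevity it normalizes the value first and then extracts the mantissa bits, which yields the same digits as converting the integer and fractional parts separately.

```python
from fractions import Fraction

def to_double_truncated(decimal: str) -> str:
    """Return 'sign - exponent - mantissa' for a decimal string, truncating excess bits."""
    value = Fraction(decimal)               # exact value of the decimal text
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: find the exponent such that 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Drop the leading 1, then take 52 bits by repeated multiplication by 2.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit

    biased = exponent + 2 ** (11 - 1) - 1   # 11 bit excess/bias notation
    return f"{sign} - {biased:011b} - {''.join(mantissa)}"

print(to_double_truncated("-31.640215"))
```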

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
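To see how close this encoding is to the original number, the three fields can be turned back into a value with the formula (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). The sketch below is illustrative: the function name `fields_to_value` is hypothetical and it handles normal numbers only.

```python
from fractions import Fraction

def fields_to_value(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Rebuild the exact value encoded by the three bit fields (normal numbers only)."""
    exponent = exponent.replace(" ", "")
    mantissa = mantissa.replace(" ", "")
    # Restore the implicit leading 1 in front of the stored mantissa bits.
    significand = 1 + Fraction(int(mantissa, 2), 2 ** len(mantissa))
    scale = Fraction(2) ** (int(exponent, 2) - 1023)     # remove the bias
    return (-1) ** int(sign) * significand * scale

value = fields_to_value(
    "1",
    "100 0000 0011",
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100",
)
error = abs(value - Fraction("-31.640215"))
print(float(value))
print(float(error))     # size of the truncation error
```

Because the excess bits were removed instead of rounded, the error stays below one unit in the last place, which is 2^-48 for a number with exponent 4.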