0.000 000 000 000 000 000 008 511 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 511(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 511(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
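The repeated-division technique from steps 1 and 2 can be sketched in a few lines of Python. The function name `int_to_binary` is made up for this illustration; it is a minimal sketch for non-negative integers, not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expects a non-negative integer")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The last remainder is the leftmost bit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```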


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 511.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 511 × 2 = 0 + 0.000 000 000 000 000 000 017 022;
  • 2) 0.000 000 000 000 000 000 017 022 × 2 = 0 + 0.000 000 000 000 000 000 034 044;
  • 3) 0.000 000 000 000 000 000 034 044 × 2 = 0 + 0.000 000 000 000 000 000 068 088;
  • 4) 0.000 000 000 000 000 000 068 088 × 2 = 0 + 0.000 000 000 000 000 000 136 176;
  • 5) 0.000 000 000 000 000 000 136 176 × 2 = 0 + 0.000 000 000 000 000 000 272 352;
  • 6) 0.000 000 000 000 000 000 272 352 × 2 = 0 + 0.000 000 000 000 000 000 544 704;
  • 7) 0.000 000 000 000 000 000 544 704 × 2 = 0 + 0.000 000 000 000 000 001 089 408;
  • 8) 0.000 000 000 000 000 001 089 408 × 2 = 0 + 0.000 000 000 000 000 002 178 816;
  • 9) 0.000 000 000 000 000 002 178 816 × 2 = 0 + 0.000 000 000 000 000 004 357 632;
  • 10) 0.000 000 000 000 000 004 357 632 × 2 = 0 + 0.000 000 000 000 000 008 715 264;
  • 11) 0.000 000 000 000 000 008 715 264 × 2 = 0 + 0.000 000 000 000 000 017 430 528;
  • 12) 0.000 000 000 000 000 017 430 528 × 2 = 0 + 0.000 000 000 000 000 034 861 056;
  • 13) 0.000 000 000 000 000 034 861 056 × 2 = 0 + 0.000 000 000 000 000 069 722 112;
  • 14) 0.000 000 000 000 000 069 722 112 × 2 = 0 + 0.000 000 000 000 000 139 444 224;
  • 15) 0.000 000 000 000 000 139 444 224 × 2 = 0 + 0.000 000 000 000 000 278 888 448;
  • 16) 0.000 000 000 000 000 278 888 448 × 2 = 0 + 0.000 000 000 000 000 557 776 896;
  • 17) 0.000 000 000 000 000 557 776 896 × 2 = 0 + 0.000 000 000 000 001 115 553 792;
  • 18) 0.000 000 000 000 001 115 553 792 × 2 = 0 + 0.000 000 000 000 002 231 107 584;
  • 19) 0.000 000 000 000 002 231 107 584 × 2 = 0 + 0.000 000 000 000 004 462 215 168;
  • 20) 0.000 000 000 000 004 462 215 168 × 2 = 0 + 0.000 000 000 000 008 924 430 336;
  • 21) 0.000 000 000 000 008 924 430 336 × 2 = 0 + 0.000 000 000 000 017 848 860 672;
  • 22) 0.000 000 000 000 017 848 860 672 × 2 = 0 + 0.000 000 000 000 035 697 721 344;
  • 23) 0.000 000 000 000 035 697 721 344 × 2 = 0 + 0.000 000 000 000 071 395 442 688;
  • 24) 0.000 000 000 000 071 395 442 688 × 2 = 0 + 0.000 000 000 000 142 790 885 376;
  • 25) 0.000 000 000 000 142 790 885 376 × 2 = 0 + 0.000 000 000 000 285 581 770 752;
  • 26) 0.000 000 000 000 285 581 770 752 × 2 = 0 + 0.000 000 000 000 571 163 541 504;
  • 27) 0.000 000 000 000 571 163 541 504 × 2 = 0 + 0.000 000 000 001 142 327 083 008;
  • 28) 0.000 000 000 001 142 327 083 008 × 2 = 0 + 0.000 000 000 002 284 654 166 016;
  • 29) 0.000 000 000 002 284 654 166 016 × 2 = 0 + 0.000 000 000 004 569 308 332 032;
  • 30) 0.000 000 000 004 569 308 332 032 × 2 = 0 + 0.000 000 000 009 138 616 664 064;
  • 31) 0.000 000 000 009 138 616 664 064 × 2 = 0 + 0.000 000 000 018 277 233 328 128;
  • 32) 0.000 000 000 018 277 233 328 128 × 2 = 0 + 0.000 000 000 036 554 466 656 256;
  • 33) 0.000 000 000 036 554 466 656 256 × 2 = 0 + 0.000 000 000 073 108 933 312 512;
  • 34) 0.000 000 000 073 108 933 312 512 × 2 = 0 + 0.000 000 000 146 217 866 625 024;
  • 35) 0.000 000 000 146 217 866 625 024 × 2 = 0 + 0.000 000 000 292 435 733 250 048;
  • 36) 0.000 000 000 292 435 733 250 048 × 2 = 0 + 0.000 000 000 584 871 466 500 096;
  • 37) 0.000 000 000 584 871 466 500 096 × 2 = 0 + 0.000 000 001 169 742 933 000 192;
  • 38) 0.000 000 001 169 742 933 000 192 × 2 = 0 + 0.000 000 002 339 485 866 000 384;
  • 39) 0.000 000 002 339 485 866 000 384 × 2 = 0 + 0.000 000 004 678 971 732 000 768;
  • 40) 0.000 000 004 678 971 732 000 768 × 2 = 0 + 0.000 000 009 357 943 464 001 536;
  • 41) 0.000 000 009 357 943 464 001 536 × 2 = 0 + 0.000 000 018 715 886 928 003 072;
  • 42) 0.000 000 018 715 886 928 003 072 × 2 = 0 + 0.000 000 037 431 773 856 006 144;
  • 43) 0.000 000 037 431 773 856 006 144 × 2 = 0 + 0.000 000 074 863 547 712 012 288;
  • 44) 0.000 000 074 863 547 712 012 288 × 2 = 0 + 0.000 000 149 727 095 424 024 576;
  • 45) 0.000 000 149 727 095 424 024 576 × 2 = 0 + 0.000 000 299 454 190 848 049 152;
  • 46) 0.000 000 299 454 190 848 049 152 × 2 = 0 + 0.000 000 598 908 381 696 098 304;
  • 47) 0.000 000 598 908 381 696 098 304 × 2 = 0 + 0.000 001 197 816 763 392 196 608;
  • 48) 0.000 001 197 816 763 392 196 608 × 2 = 0 + 0.000 002 395 633 526 784 393 216;
  • 49) 0.000 002 395 633 526 784 393 216 × 2 = 0 + 0.000 004 791 267 053 568 786 432;
  • 50) 0.000 004 791 267 053 568 786 432 × 2 = 0 + 0.000 009 582 534 107 137 572 864;
  • 51) 0.000 009 582 534 107 137 572 864 × 2 = 0 + 0.000 019 165 068 214 275 145 728;
  • 52) 0.000 019 165 068 214 275 145 728 × 2 = 0 + 0.000 038 330 136 428 550 291 456;
  • 53) 0.000 038 330 136 428 550 291 456 × 2 = 0 + 0.000 076 660 272 857 100 582 912;
  • 54) 0.000 076 660 272 857 100 582 912 × 2 = 0 + 0.000 153 320 545 714 201 165 824;
  • 55) 0.000 153 320 545 714 201 165 824 × 2 = 0 + 0.000 306 641 091 428 402 331 648;
  • 56) 0.000 306 641 091 428 402 331 648 × 2 = 0 + 0.000 613 282 182 856 804 663 296;
  • 57) 0.000 613 282 182 856 804 663 296 × 2 = 0 + 0.001 226 564 365 713 609 326 592;
  • 58) 0.001 226 564 365 713 609 326 592 × 2 = 0 + 0.002 453 128 731 427 218 653 184;
  • 59) 0.002 453 128 731 427 218 653 184 × 2 = 0 + 0.004 906 257 462 854 437 306 368;
  • 60) 0.004 906 257 462 854 437 306 368 × 2 = 0 + 0.009 812 514 925 708 874 612 736;
  • 61) 0.009 812 514 925 708 874 612 736 × 2 = 0 + 0.019 625 029 851 417 749 225 472;
  • 62) 0.019 625 029 851 417 749 225 472 × 2 = 0 + 0.039 250 059 702 835 498 450 944;
  • 63) 0.039 250 059 702 835 498 450 944 × 2 = 0 + 0.078 500 119 405 670 996 901 888;
  • 64) 0.078 500 119 405 670 996 901 888 × 2 = 0 + 0.157 000 238 811 341 993 803 776;
  • 65) 0.157 000 238 811 341 993 803 776 × 2 = 0 + 0.314 000 477 622 683 987 607 552;
  • 66) 0.314 000 477 622 683 987 607 552 × 2 = 0 + 0.628 000 955 245 367 975 215 104;
  • 67) 0.628 000 955 245 367 975 215 104 × 2 = 1 + 0.256 001 910 490 735 950 430 208;
  • 68) 0.256 001 910 490 735 950 430 208 × 2 = 0 + 0.512 003 820 981 471 900 860 416;
  • 69) 0.512 003 820 981 471 900 860 416 × 2 = 1 + 0.024 007 641 962 943 801 720 832;
  • 70) 0.024 007 641 962 943 801 720 832 × 2 = 0 + 0.048 015 283 925 887 603 441 664;
  • 71) 0.048 015 283 925 887 603 441 664 × 2 = 0 + 0.096 030 567 851 775 206 883 328;
  • 72) 0.096 030 567 851 775 206 883 328 × 2 = 0 + 0.192 061 135 703 550 413 766 656;
  • 73) 0.192 061 135 703 550 413 766 656 × 2 = 0 + 0.384 122 271 407 100 827 533 312;
  • 74) 0.384 122 271 407 100 827 533 312 × 2 = 0 + 0.768 244 542 814 201 655 066 624;
  • 75) 0.768 244 542 814 201 655 066 624 × 2 = 1 + 0.536 489 085 628 403 310 133 248;
  • 76) 0.536 489 085 628 403 310 133 248 × 2 = 1 + 0.072 978 171 256 806 620 266 496;
  • 77) 0.072 978 171 256 806 620 266 496 × 2 = 0 + 0.145 956 342 513 613 240 532 992;
  • 78) 0.145 956 342 513 613 240 532 992 × 2 = 0 + 0.291 912 685 027 226 481 065 984;
  • 79) 0.291 912 685 027 226 481 065 984 × 2 = 0 + 0.583 825 370 054 452 962 131 968;
  • 80) 0.583 825 370 054 452 962 131 968 × 2 = 1 + 0.167 650 740 108 905 924 263 936;
  • 81) 0.167 650 740 108 905 924 263 936 × 2 = 0 + 0.335 301 480 217 811 848 527 872;
  • 82) 0.335 301 480 217 811 848 527 872 × 2 = 0 + 0.670 602 960 435 623 697 055 744;
  • 83) 0.670 602 960 435 623 697 055 744 × 2 = 1 + 0.341 205 920 871 247 394 111 488;
  • 84) 0.341 205 920 871 247 394 111 488 × 2 = 0 + 0.682 411 841 742 494 788 222 976;
  • 85) 0.682 411 841 742 494 788 222 976 × 2 = 1 + 0.364 823 683 484 989 576 445 952;
  • 86) 0.364 823 683 484 989 576 445 952 × 2 = 0 + 0.729 647 366 969 979 152 891 904;
  • 87) 0.729 647 366 969 979 152 891 904 × 2 = 1 + 0.459 294 733 939 958 305 783 808;
  • 88) 0.459 294 733 939 958 305 783 808 × 2 = 0 + 0.918 589 467 879 916 611 567 616;
  • 89) 0.918 589 467 879 916 611 567 616 × 2 = 1 + 0.837 178 935 759 833 223 135 232;
  • 90) 0.837 178 935 759 833 223 135 232 × 2 = 1 + 0.674 357 871 519 666 446 270 464;
  • 91) 0.674 357 871 519 666 446 270 464 × 2 = 1 + 0.348 715 743 039 332 892 540 928;
  • 92) 0.348 715 743 039 332 892 540 928 × 2 = 0 + 0.697 431 486 078 665 785 081 856;
  • 93) 0.697 431 486 078 665 785 081 856 × 2 = 1 + 0.394 862 972 157 331 570 163 712;
  • 94) 0.394 862 972 157 331 570 163 712 × 2 = 0 + 0.789 725 944 314 663 140 327 424;
  • 95) 0.789 725 944 314 663 140 327 424 × 2 = 1 + 0.579 451 888 629 326 280 654 848;
  • 96) 0.579 451 888 629 326 280 654 848 × 2 = 1 + 0.158 903 777 258 652 561 309 696;
  • 97) 0.158 903 777 258 652 561 309 696 × 2 = 0 + 0.317 807 554 517 305 122 619 392;
  • 98) 0.317 807 554 517 305 122 619 392 × 2 = 0 + 0.635 615 109 034 610 245 238 784;
  • 99) 0.635 615 109 034 610 245 238 784 × 2 = 1 + 0.271 230 218 069 220 490 477 568;
  • 100) 0.271 230 218 069 220 490 477 568 × 2 = 0 + 0.542 460 436 138 440 980 955 136;
  • 101) 0.542 460 436 138 440 980 955 136 × 2 = 1 + 0.084 920 872 276 881 961 910 272;
  • 102) 0.084 920 872 276 881 961 910 272 × 2 = 0 + 0.169 841 744 553 763 923 820 544;
  • 103) 0.169 841 744 553 763 923 820 544 × 2 = 0 + 0.339 683 489 107 527 847 641 088;
  • 104) 0.339 683 489 107 527 847 641 088 × 2 = 0 + 0.679 366 978 215 055 695 282 176;
  • 105) 0.679 366 978 215 055 695 282 176 × 2 = 1 + 0.358 733 956 430 111 390 564 352;
  • 106) 0.358 733 956 430 111 390 564 352 × 2 = 0 + 0.717 467 912 860 222 781 128 704;
  • 107) 0.717 467 912 860 222 781 128 704 × 2 = 1 + 0.434 935 825 720 445 562 257 408;
  • 108) 0.434 935 825 720 445 562 257 408 × 2 = 0 + 0.869 871 651 440 891 124 514 816;
  • 109) 0.869 871 651 440 891 124 514 816 × 2 = 1 + 0.739 743 302 881 782 249 029 632;
  • 110) 0.739 743 302 881 782 249 029 632 × 2 = 1 + 0.479 486 605 763 564 498 059 264;
  • 111) 0.479 486 605 763 564 498 059 264 × 2 = 0 + 0.958 973 211 527 128 996 118 528;
  • 112) 0.958 973 211 527 128 996 118 528 × 2 = 1 + 0.917 946 423 054 257 992 237 056;
  • 113) 0.917 946 423 054 257 992 237 056 × 2 = 1 + 0.835 892 846 108 515 984 474 112;
  • 114) 0.835 892 846 108 515 984 474 112 × 2 = 1 + 0.671 785 692 217 031 968 948 224;
  • 115) 0.671 785 692 217 031 968 948 224 × 2 = 1 + 0.343 571 384 434 063 937 896 448;
  • 116) 0.343 571 384 434 063 937 896 448 × 2 = 0 + 0.687 142 768 868 127 875 792 896;
  • 117) 0.687 142 768 868 127 875 792 896 × 2 = 1 + 0.374 285 537 736 255 751 585 792;
  • 118) 0.374 285 537 736 255 751 585 792 × 2 = 0 + 0.748 571 075 472 511 503 171 584;
  • 119) 0.748 571 075 472 511 503 171 584 × 2 = 1 + 0.497 142 150 945 023 006 343 168;

We did not reach a fractional part equal to zero. However, starting from the first non-zero bit (step 67) we have now produced 53 significant bits, which is enough to fill the hidden leading bit plus the 52 bit mantissa, so we stop here. Precision is lost: the converted number will be a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 511(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0011 0001 0010 1010 1110 1011 0010 1000 1010 1101 1110 101(2)
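The repeated doubling in step 3 can be reproduced exactly with rational arithmetic, which avoids the rounding that ordinary floats would introduce. The sketch below uses Python's standard `fractions.Fraction`; the helper name `fraction_to_bits` is an assumption made for this example.

```python
from fractions import Fraction


def fraction_to_bits(frac: Fraction, count: int) -> str:
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)        # integer part of the doubled value: 0 or 1
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part
    return "".join(bits)


x = Fraction("0.000000000000000000008511")
bits = fraction_to_bits(x, 119)
print(bits.index("1") + 1)  # step at which the first 1 appears: 67
print(bits[66:])            # the 53 significant bits, from step 67 on
```

Because `Fraction` stores the decimal input exactly, each step corresponds one-to-one with a line of the multiplication list above.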

5. Positive number before normalization:

0.000 000 000 000 000 000 008 511(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0011 0001 0010 1010 1110 1011 0010 1000 1010 1101 1110 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 511(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0011 0001 0010 1010 1110 1011 0010 1000 1010 1101 1110 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0011 0001 0010 1010 1110 1011 0010 1000 1010 1101 1110 101(2) × 2^0 =


1.0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101(2) × 2^(-67)
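Normalization amounts to finding the power of two that brings the value into the range from 1 (inclusive) to 2 (exclusive). A minimal sketch with exact rationals follows; `normalize` is a hypothetical helper name, and it assumes a strictly positive input.

```python
from fractions import Fraction


def normalize(x: Fraction):
    """Return (exponent, significand) with x == significand * 2**exponent
    and 1 <= significand < 2. Assumes x > 0."""
    if x <= 0:
        raise ValueError("expects a positive value")
    exponent = 0
    while x >= 2:      # shift the point to the left
        x /= 2
        exponent += 1
    while x < 1:       # shift the point to the right
        x *= 2
        exponent -= 1
    return exponent, x


exponent, significand = normalize(Fraction("0.000000000000000000008511"))
print(exponent)  # -67
```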


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
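Steps 8 to 10 can be checked with a short sketch: add the bias 2^(11-1) - 1 = 1023 and print the result as an 11 bit binary string. The names `EXPONENT_BITS`, `BIAS` and `biased_exponent_bits` are chosen for this example only, and the range check assumes a normal (not subnormal, infinite or NaN) number.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Apply the excess/bias notation and format as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # 0 and 2047 are reserved for zero/subnormals and infinity/NaN.
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(-67))  # 01110111100
```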


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101 =


0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101


Decimal number 0.000 000 000 000 000 000 008 511 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0001 1000 1001 0101 0111 0101 1001 0100 0101 0110 1111 0101
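One way to cross-check a hand-derived pattern is to ask a language runtime how it stores the same value. The sketch below uses Python's standard `struct` module and assumes the platform's `float` is an IEEE 754 double, which is the case on common platforms. `double_fields` is a helper name introduced here for illustration.

```python
import struct


def double_fields(value: float):
    """Split a float into its sign, exponent and mantissa bit strings."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.511e-21)
print(sign, exponent, mantissa, sep=" - ")
print(int(exponent, 2) - 1023)  # unadjusted exponent: -67
```

The printed fields can be compared group by group with the sign, exponent and mantissa listed above.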


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by appending bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
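The nine steps above can be combined into one short routine. The sketch below is an illustration under stated assumptions, not a full IEEE 754 encoder: it handles only normal, non-zero numbers, it truncates the excess mantissa bits exactly as the steps describe (rather than rounding), and the name `to_double_bits_truncated` is invented for this example. Exact rational arithmetic replaces the separate integer and fractional lists.

```python
from fractions import Fraction


def to_double_bits_truncated(decimal_text: str) -> str:
    """Encode a decimal string as 'sign - exponent - mantissa' bit groups,
    truncating (not rounding) the mantissa to 52 bits."""
    value = Fraction(decimal_text)           # exact decimal value
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero has a special encoding, not handled here")

    # Normalize so that 1 <= value < 2, tracking the power of two.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    adjusted = exponent + 2 ** (11 - 1) - 1  # add the bias, 1023
    if not 0 < adjusted < 2047:
        raise ValueError("outside the normal double precision range")

    # Drop the leading 1, then keep the first 52 fractional bits.
    mantissa = int((value - 1) * 2 ** 52)    # int() discards the excess bits
    return f"{sign} - {adjusted:011b} - {mantissa:052b}"


print(to_double_bits_truncated("0.000000000000000000008511"))
print(to_double_bits_truncated("-31.640215"))
```

Because of the truncation, the output can differ in the last mantissa bit from what a runtime stores under round-to-nearest.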

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

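  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), then adjust its length to 52 bits by removing the excess bits from the right (losing precision). Note that this is truncation: the discarded bits (1010 0) amount to more than half a unit in the last place, so under IEEE 754's default round-to-nearest mode the last kept bit would be rounded up instead. The truncated result is: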

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
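The pattern above was obtained by cutting the mantissa off after 52 bits. A runtime that follows IEEE 754's default round-to-nearest mode may store a neighbouring pattern instead. The sketch below compares the two using Python's standard `struct` and `fractions` modules; it assumes the platform's `float` is an IEEE 754 double, and the helper names are made up for this example.

```python
import struct
from fractions import Fraction


def float_to_raw(value: float) -> int:
    """Return the 64 bit pattern of a float as an integer."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    return raw


def raw_to_float(raw: int) -> float:
    """Return the float whose 64 bit pattern is `raw`."""
    (value,) = struct.unpack(">d", struct.pack(">Q", raw))
    return value


# The truncated pattern from the worked example, with separators removed.
text = ("1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 "
        "0101 0111 0110 1000 1001 1100")
truncated = int(text.replace(" ", "").replace("-", ""), 2)
stored = float_to_raw(-31.640215)

print(f"{truncated:016X}")  # truncated pattern in hexadecimal
print(f"{stored:016X}")     # pattern the runtime actually stores
print(stored - truncated)   # difference in units of the last mantissa bit

# Compare how far each candidate is from the exact decimal value.
exact = Fraction("-31.640215")
error_truncated = abs(Fraction(raw_to_float(truncated)) - exact)
error_stored = abs(Fraction(raw_to_float(stored)) - exact)
print(error_stored < error_truncated)
```

Both patterns are valid doubles that approximate -31.640 215; the comparison only shows which of the two lies closer to the exact decimal value.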