0.000 000 000 000 000 000 008 526 5 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 526 5(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number
0.000 000 000 000 000 000 008 526 5(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 526 5.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 526 5 × 2 = 0 + 0.000 000 000 000 000 000 017 053;
  • 2) 0.000 000 000 000 000 000 017 053 × 2 = 0 + 0.000 000 000 000 000 000 034 106;
  • 3) 0.000 000 000 000 000 000 034 106 × 2 = 0 + 0.000 000 000 000 000 000 068 212;
  • 4) 0.000 000 000 000 000 000 068 212 × 2 = 0 + 0.000 000 000 000 000 000 136 424;
  • 5) 0.000 000 000 000 000 000 136 424 × 2 = 0 + 0.000 000 000 000 000 000 272 848;
  • 6) 0.000 000 000 000 000 000 272 848 × 2 = 0 + 0.000 000 000 000 000 000 545 696;
  • 7) 0.000 000 000 000 000 000 545 696 × 2 = 0 + 0.000 000 000 000 000 001 091 392;
  • 8) 0.000 000 000 000 000 001 091 392 × 2 = 0 + 0.000 000 000 000 000 002 182 784;
  • 9) 0.000 000 000 000 000 002 182 784 × 2 = 0 + 0.000 000 000 000 000 004 365 568;
  • 10) 0.000 000 000 000 000 004 365 568 × 2 = 0 + 0.000 000 000 000 000 008 731 136;
  • 11) 0.000 000 000 000 000 008 731 136 × 2 = 0 + 0.000 000 000 000 000 017 462 272;
  • 12) 0.000 000 000 000 000 017 462 272 × 2 = 0 + 0.000 000 000 000 000 034 924 544;
  • 13) 0.000 000 000 000 000 034 924 544 × 2 = 0 + 0.000 000 000 000 000 069 849 088;
  • 14) 0.000 000 000 000 000 069 849 088 × 2 = 0 + 0.000 000 000 000 000 139 698 176;
  • 15) 0.000 000 000 000 000 139 698 176 × 2 = 0 + 0.000 000 000 000 000 279 396 352;
  • 16) 0.000 000 000 000 000 279 396 352 × 2 = 0 + 0.000 000 000 000 000 558 792 704;
  • 17) 0.000 000 000 000 000 558 792 704 × 2 = 0 + 0.000 000 000 000 001 117 585 408;
  • 18) 0.000 000 000 000 001 117 585 408 × 2 = 0 + 0.000 000 000 000 002 235 170 816;
  • 19) 0.000 000 000 000 002 235 170 816 × 2 = 0 + 0.000 000 000 000 004 470 341 632;
  • 20) 0.000 000 000 000 004 470 341 632 × 2 = 0 + 0.000 000 000 000 008 940 683 264;
  • 21) 0.000 000 000 000 008 940 683 264 × 2 = 0 + 0.000 000 000 000 017 881 366 528;
  • 22) 0.000 000 000 000 017 881 366 528 × 2 = 0 + 0.000 000 000 000 035 762 733 056;
  • 23) 0.000 000 000 000 035 762 733 056 × 2 = 0 + 0.000 000 000 000 071 525 466 112;
  • 24) 0.000 000 000 000 071 525 466 112 × 2 = 0 + 0.000 000 000 000 143 050 932 224;
  • 25) 0.000 000 000 000 143 050 932 224 × 2 = 0 + 0.000 000 000 000 286 101 864 448;
  • 26) 0.000 000 000 000 286 101 864 448 × 2 = 0 + 0.000 000 000 000 572 203 728 896;
  • 27) 0.000 000 000 000 572 203 728 896 × 2 = 0 + 0.000 000 000 001 144 407 457 792;
  • 28) 0.000 000 000 001 144 407 457 792 × 2 = 0 + 0.000 000 000 002 288 814 915 584;
  • 29) 0.000 000 000 002 288 814 915 584 × 2 = 0 + 0.000 000 000 004 577 629 831 168;
  • 30) 0.000 000 000 004 577 629 831 168 × 2 = 0 + 0.000 000 000 009 155 259 662 336;
  • 31) 0.000 000 000 009 155 259 662 336 × 2 = 0 + 0.000 000 000 018 310 519 324 672;
  • 32) 0.000 000 000 018 310 519 324 672 × 2 = 0 + 0.000 000 000 036 621 038 649 344;
  • 33) 0.000 000 000 036 621 038 649 344 × 2 = 0 + 0.000 000 000 073 242 077 298 688;
  • 34) 0.000 000 000 073 242 077 298 688 × 2 = 0 + 0.000 000 000 146 484 154 597 376;
  • 35) 0.000 000 000 146 484 154 597 376 × 2 = 0 + 0.000 000 000 292 968 309 194 752;
  • 36) 0.000 000 000 292 968 309 194 752 × 2 = 0 + 0.000 000 000 585 936 618 389 504;
  • 37) 0.000 000 000 585 936 618 389 504 × 2 = 0 + 0.000 000 001 171 873 236 779 008;
  • 38) 0.000 000 001 171 873 236 779 008 × 2 = 0 + 0.000 000 002 343 746 473 558 016;
  • 39) 0.000 000 002 343 746 473 558 016 × 2 = 0 + 0.000 000 004 687 492 947 116 032;
  • 40) 0.000 000 004 687 492 947 116 032 × 2 = 0 + 0.000 000 009 374 985 894 232 064;
  • 41) 0.000 000 009 374 985 894 232 064 × 2 = 0 + 0.000 000 018 749 971 788 464 128;
  • 42) 0.000 000 018 749 971 788 464 128 × 2 = 0 + 0.000 000 037 499 943 576 928 256;
  • 43) 0.000 000 037 499 943 576 928 256 × 2 = 0 + 0.000 000 074 999 887 153 856 512;
  • 44) 0.000 000 074 999 887 153 856 512 × 2 = 0 + 0.000 000 149 999 774 307 713 024;
  • 45) 0.000 000 149 999 774 307 713 024 × 2 = 0 + 0.000 000 299 999 548 615 426 048;
  • 46) 0.000 000 299 999 548 615 426 048 × 2 = 0 + 0.000 000 599 999 097 230 852 096;
  • 47) 0.000 000 599 999 097 230 852 096 × 2 = 0 + 0.000 001 199 998 194 461 704 192;
  • 48) 0.000 001 199 998 194 461 704 192 × 2 = 0 + 0.000 002 399 996 388 923 408 384;
  • 49) 0.000 002 399 996 388 923 408 384 × 2 = 0 + 0.000 004 799 992 777 846 816 768;
  • 50) 0.000 004 799 992 777 846 816 768 × 2 = 0 + 0.000 009 599 985 555 693 633 536;
  • 51) 0.000 009 599 985 555 693 633 536 × 2 = 0 + 0.000 019 199 971 111 387 267 072;
  • 52) 0.000 019 199 971 111 387 267 072 × 2 = 0 + 0.000 038 399 942 222 774 534 144;
  • 53) 0.000 038 399 942 222 774 534 144 × 2 = 0 + 0.000 076 799 884 445 549 068 288;
  • 54) 0.000 076 799 884 445 549 068 288 × 2 = 0 + 0.000 153 599 768 891 098 136 576;
  • 55) 0.000 153 599 768 891 098 136 576 × 2 = 0 + 0.000 307 199 537 782 196 273 152;
  • 56) 0.000 307 199 537 782 196 273 152 × 2 = 0 + 0.000 614 399 075 564 392 546 304;
  • 57) 0.000 614 399 075 564 392 546 304 × 2 = 0 + 0.001 228 798 151 128 785 092 608;
  • 58) 0.001 228 798 151 128 785 092 608 × 2 = 0 + 0.002 457 596 302 257 570 185 216;
  • 59) 0.002 457 596 302 257 570 185 216 × 2 = 0 + 0.004 915 192 604 515 140 370 432;
  • 60) 0.004 915 192 604 515 140 370 432 × 2 = 0 + 0.009 830 385 209 030 280 740 864;
  • 61) 0.009 830 385 209 030 280 740 864 × 2 = 0 + 0.019 660 770 418 060 561 481 728;
  • 62) 0.019 660 770 418 060 561 481 728 × 2 = 0 + 0.039 321 540 836 121 122 963 456;
  • 63) 0.039 321 540 836 121 122 963 456 × 2 = 0 + 0.078 643 081 672 242 245 926 912;
  • 64) 0.078 643 081 672 242 245 926 912 × 2 = 0 + 0.157 286 163 344 484 491 853 824;
  • 65) 0.157 286 163 344 484 491 853 824 × 2 = 0 + 0.314 572 326 688 968 983 707 648;
  • 66) 0.314 572 326 688 968 983 707 648 × 2 = 0 + 0.629 144 653 377 937 967 415 296;
  • 67) 0.629 144 653 377 937 967 415 296 × 2 = 1 + 0.258 289 306 755 875 934 830 592;
  • 68) 0.258 289 306 755 875 934 830 592 × 2 = 0 + 0.516 578 613 511 751 869 661 184;
  • 69) 0.516 578 613 511 751 869 661 184 × 2 = 1 + 0.033 157 227 023 503 739 322 368;
  • 70) 0.033 157 227 023 503 739 322 368 × 2 = 0 + 0.066 314 454 047 007 478 644 736;
  • 71) 0.066 314 454 047 007 478 644 736 × 2 = 0 + 0.132 628 908 094 014 957 289 472;
  • 72) 0.132 628 908 094 014 957 289 472 × 2 = 0 + 0.265 257 816 188 029 914 578 944;
  • 73) 0.265 257 816 188 029 914 578 944 × 2 = 0 + 0.530 515 632 376 059 829 157 888;
  • 74) 0.530 515 632 376 059 829 157 888 × 2 = 1 + 0.061 031 264 752 119 658 315 776;
  • 75) 0.061 031 264 752 119 658 315 776 × 2 = 0 + 0.122 062 529 504 239 316 631 552;
  • 76) 0.122 062 529 504 239 316 631 552 × 2 = 0 + 0.244 125 059 008 478 633 263 104;
  • 77) 0.244 125 059 008 478 633 263 104 × 2 = 0 + 0.488 250 118 016 957 266 526 208;
  • 78) 0.488 250 118 016 957 266 526 208 × 2 = 0 + 0.976 500 236 033 914 533 052 416;
  • 79) 0.976 500 236 033 914 533 052 416 × 2 = 1 + 0.953 000 472 067 829 066 104 832;
  • 80) 0.953 000 472 067 829 066 104 832 × 2 = 1 + 0.906 000 944 135 658 132 209 664;
  • 81) 0.906 000 944 135 658 132 209 664 × 2 = 1 + 0.812 001 888 271 316 264 419 328;
  • 82) 0.812 001 888 271 316 264 419 328 × 2 = 1 + 0.624 003 776 542 632 528 838 656;
  • 83) 0.624 003 776 542 632 528 838 656 × 2 = 1 + 0.248 007 553 085 265 057 677 312;
  • 84) 0.248 007 553 085 265 057 677 312 × 2 = 0 + 0.496 015 106 170 530 115 354 624;
  • 85) 0.496 015 106 170 530 115 354 624 × 2 = 0 + 0.992 030 212 341 060 230 709 248;
  • 86) 0.992 030 212 341 060 230 709 248 × 2 = 1 + 0.984 060 424 682 120 461 418 496;
  • 87) 0.984 060 424 682 120 461 418 496 × 2 = 1 + 0.968 120 849 364 240 922 836 992;
  • 88) 0.968 120 849 364 240 922 836 992 × 2 = 1 + 0.936 241 698 728 481 845 673 984;
  • 89) 0.936 241 698 728 481 845 673 984 × 2 = 1 + 0.872 483 397 456 963 691 347 968;
  • 90) 0.872 483 397 456 963 691 347 968 × 2 = 1 + 0.744 966 794 913 927 382 695 936;
  • 91) 0.744 966 794 913 927 382 695 936 × 2 = 1 + 0.489 933 589 827 854 765 391 872;
  • 92) 0.489 933 589 827 854 765 391 872 × 2 = 0 + 0.979 867 179 655 709 530 783 744;
  • 93) 0.979 867 179 655 709 530 783 744 × 2 = 1 + 0.959 734 359 311 419 061 567 488;
  • 94) 0.959 734 359 311 419 061 567 488 × 2 = 1 + 0.919 468 718 622 838 123 134 976;
  • 95) 0.919 468 718 622 838 123 134 976 × 2 = 1 + 0.838 937 437 245 676 246 269 952;
  • 96) 0.838 937 437 245 676 246 269 952 × 2 = 1 + 0.677 874 874 491 352 492 539 904;
  • 97) 0.677 874 874 491 352 492 539 904 × 2 = 1 + 0.355 749 748 982 704 985 079 808;
  • 98) 0.355 749 748 982 704 985 079 808 × 2 = 0 + 0.711 499 497 965 409 970 159 616;
  • 99) 0.711 499 497 965 409 970 159 616 × 2 = 1 + 0.422 998 995 930 819 940 319 232;
  • 100) 0.422 998 995 930 819 940 319 232 × 2 = 0 + 0.845 997 991 861 639 880 638 464;
  • 101) 0.845 997 991 861 639 880 638 464 × 2 = 1 + 0.691 995 983 723 279 761 276 928;
  • 102) 0.691 995 983 723 279 761 276 928 × 2 = 1 + 0.383 991 967 446 559 522 553 856;
  • 103) 0.383 991 967 446 559 522 553 856 × 2 = 0 + 0.767 983 934 893 119 045 107 712;
  • 104) 0.767 983 934 893 119 045 107 712 × 2 = 1 + 0.535 967 869 786 238 090 215 424;
  • 105) 0.535 967 869 786 238 090 215 424 × 2 = 1 + 0.071 935 739 572 476 180 430 848;
  • 106) 0.071 935 739 572 476 180 430 848 × 2 = 0 + 0.143 871 479 144 952 360 861 696;
  • 107) 0.143 871 479 144 952 360 861 696 × 2 = 0 + 0.287 742 958 289 904 721 723 392;
  • 108) 0.287 742 958 289 904 721 723 392 × 2 = 0 + 0.575 485 916 579 809 443 446 784;
  • 109) 0.575 485 916 579 809 443 446 784 × 2 = 1 + 0.150 971 833 159 618 886 893 568;
  • 110) 0.150 971 833 159 618 886 893 568 × 2 = 0 + 0.301 943 666 319 237 773 787 136;
  • 111) 0.301 943 666 319 237 773 787 136 × 2 = 0 + 0.603 887 332 638 475 547 574 272;
  • 112) 0.603 887 332 638 475 547 574 272 × 2 = 1 + 0.207 774 665 276 951 095 148 544;
  • 113) 0.207 774 665 276 951 095 148 544 × 2 = 0 + 0.415 549 330 553 902 190 297 088;
  • 114) 0.415 549 330 553 902 190 297 088 × 2 = 0 + 0.831 098 661 107 804 380 594 176;
  • 115) 0.831 098 661 107 804 380 594 176 × 2 = 1 + 0.662 197 322 215 608 761 188 352;
  • 116) 0.662 197 322 215 608 761 188 352 × 2 = 1 + 0.324 394 644 431 217 522 376 704;
  • 117) 0.324 394 644 431 217 522 376 704 × 2 = 0 + 0.648 789 288 862 435 044 753 408;
  • 118) 0.648 789 288 862 435 044 753 408 × 2 = 1 + 0.297 578 577 724 870 089 506 816;
  • 119) 0.297 578 577 724 870 089 506 816 × 2 = 0 + 0.595 157 155 449 740 179 013 632;

No fractional part came out equal to zero. However, the first 1 appeared at step 67, and the 52 steps after it (68 to 119) supply all 52 mantissa bits, so we stop here. The remaining digits are dropped, which loses precision: the converted number is a very close approximation of the original, not an exact match.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 526 5(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0011 1110 0111 1110 1111 1010 1101 1000 1001 0011 010(2)
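
The repeated doubling in steps 3 and 4 can be reproduced with exact rational arithmetic. This is a minimal sketch, not the method used to generate this page: the function name `fraction_bits` and its `max_bits` parameter are assumptions made for illustration, and Python's `fractions.Fraction` is used so that no binary rounding creeps into the intermediate results.

```python
from fractions import Fraction

def fraction_bits(decimal_text, max_bits):
    """Return up to max_bits binary digits after the point for a value in [0, 1)."""
    value = Fraction(decimal_text)   # exact rational, no float rounding
    bits = []
    for _ in range(max_bits):
        value *= 2
        bit = int(value)             # the integer part is the next binary digit
        bits.append(str(bit))
        value -= bit                 # keep only the fractional part
        if value == 0:               # exact representation reached
            break
    return "".join(bits)

# 8.5265e-21 is the same value as 0.000 000 000 000 000 000 008 526 5
bits = fraction_bits("8.5265e-21", 119)
print(bits.index("1") + 1)   # position of the first 1 bit: 67
print(bits[66:])             # digits 67 to 119
```

The loop stops early only when the fractional part becomes exactly zero; otherwise the caller decides how many digits are enough (119 here, matching the list above).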

5. Positive number before normalization:

0.000 000 000 000 000 000 008 526 5(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0011 1110 0111 1110 1111 1010 1101 1000 1001 0011 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 526 5(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0011 1110 0111 1110 1111 1010 1101 1000 1001 0011 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0011 1110 0111 1110 1111 1010 1101 1000 1001 0011 010(2) × 2^0 =


1.0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010(2) × 2^-67
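
For a number smaller than 1, the normalization step amounts to locating the first 1 in the string of fractional digits. The sketch below assumes a hypothetical helper `normalize` that takes the digits after the point as a string; it is only an illustration of the shift described above.

```python
def normalize(fraction_bits):
    """Normalize 0.<fraction_bits> to 1.<rest> x 2^exponent.

    Raises ValueError if the string contains no 1 (the value is zero).
    """
    first_one = fraction_bits.index("1")   # zero-based position of the leading 1
    exponent = -(first_one + 1)            # the point moves right by first_one + 1 places
    return exponent, fraction_bits[first_one + 1:]

# Shortened input: 66 zeros, then the digits 101
exponent, rest = normalize("0" * 66 + "101")
print(exponent, rest)   # -67 01
```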


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
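
Steps 8 to 10 can be checked with a few lines of code. This sketch assumes a hypothetical helper `to_binary` that mirrors the repeated division by 2 and pads the result on the left to the field width.

```python
def to_binary(n, width):
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))
    # The last remainder is the leftmost digit, so reverse the list.
    digits = "".join(reversed(remainders)) or "0"
    return digits.zfill(width)   # pad with zeros on the left

bias = 2 ** (11 - 1) - 1         # 11 bit exponent field
biased = -67 + bias
print(bias, biased, to_binary(biased, 11))   # 1023 956 01110111100
```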


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1. 0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010 =


0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010


Decimal number 0.000 000 000 000 000 000 008 526 5 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0001 1111 0011 1111 0111 1101 0110 1100 0100 1001 1010
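
To compare this result with what a program actually stores, the three fields can be read back from a Python float. The sketch assumes the platform's `float` is an IEEE 754 double (true for standard CPython builds); `double_fields` is a hypothetical helper name. Note that Python rounds a decimal literal to the nearest double, whereas the steps above drop the excess bits, so the last mantissa bit may differ from the one shown here.

```python
import struct

def double_fields(x):
    """Split a Python float into sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # 64 bits as an integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]                # 1 + 11 + 52 bits

sign, exponent, mantissa = double_fields(8.5265e-21)
print(sign, exponent, mantissa)
```

The sign and exponent fields should match the values above exactly; only the final mantissa bit depends on whether the excess bits are truncated or rounded.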


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
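
The steps above can be condensed into one short function. This is only a sketch under stated assumptions: the name `decimal_to_double_fields` is hypothetical, the input must be a nonzero decimal string whose result is a normal double (no subnormals, infinities or NaN), and excess bits are truncated as in the steps above rather than rounded. For brevity it normalizes first by scaling with powers of 2, instead of converting the integer and fractional parts separately; the resulting fields are the same.

```python
from fractions import Fraction

def decimal_to_double_fields(text):
    """Return (sign, exponent, mantissa) bit strings, truncating the mantissa."""
    value = Fraction(text)               # exact rational arithmetic
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Drop the leading 1, then take 52 fraction bits by repeated doubling.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit

    biased = exponent + 2 ** (11 - 1) - 1   # add the bias, 1023
    return sign, format(biased, "011b"), "".join(mantissa)

print(decimal_to_double_fields("-31.640215"))
```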

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
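
As a sanity check, the three fields can be decoded back into a number. The sketch below assumes a hypothetical helper `fields_to_value` for normal doubles only; it restores the hidden leading 1, removes the bias of 1023, and applies the sign.

```python
from fractions import Fraction

def fields_to_value(sign, exponent, mantissa):
    """Decode sign, biased exponent and mantissa bit strings of a normal double."""
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)   # restore the hidden 1
    scale = Fraction(2) ** (int(exponent, 2) - 1023)        # remove the bias
    return (-1) ** int(sign) * significand * scale

value = fields_to_value(
    "1",
    "100 0000 0011".replace(" ", ""),
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", ""),
)
print(float(value))                           # very close to -31.640215
print(float(abs(value + Fraction("31.640215"))))   # size of the truncation error
```

Because the excess bits were dropped rather than rounded, the decoded value is slightly smaller in magnitude than 31.640215; the difference stays below 2^-48, the weight of the last mantissa bit at this exponent.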