0.000 000 000 000 000 000 008 532 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 532(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number
0.000 000 000 000 000 000 008 532(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
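
Steps 1 and 2 can be sketched in Python as follows. The helper name `int_to_binary` is hypothetical, chosen only for illustration; it mirrors the repeated division described above and is not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to a base 2 string by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # quotient and remainder of n ÷ 2
        remainders.append(str(r))
    # The last remainder is the leftmost binary digit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0, the integer part of the number converted here
print(int_to_binary(31))  # 11111
```

For the number on this page the integer part is 0, so the loop never runs and the result is simply 0.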


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 532.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 532 × 2 = 0 + 0.000 000 000 000 000 000 017 064;
  • 2) 0.000 000 000 000 000 000 017 064 × 2 = 0 + 0.000 000 000 000 000 000 034 128;
  • 3) 0.000 000 000 000 000 000 034 128 × 2 = 0 + 0.000 000 000 000 000 000 068 256;
  • 4) 0.000 000 000 000 000 000 068 256 × 2 = 0 + 0.000 000 000 000 000 000 136 512;
  • 5) 0.000 000 000 000 000 000 136 512 × 2 = 0 + 0.000 000 000 000 000 000 273 024;
  • 6) 0.000 000 000 000 000 000 273 024 × 2 = 0 + 0.000 000 000 000 000 000 546 048;
  • 7) 0.000 000 000 000 000 000 546 048 × 2 = 0 + 0.000 000 000 000 000 001 092 096;
  • 8) 0.000 000 000 000 000 001 092 096 × 2 = 0 + 0.000 000 000 000 000 002 184 192;
  • 9) 0.000 000 000 000 000 002 184 192 × 2 = 0 + 0.000 000 000 000 000 004 368 384;
  • 10) 0.000 000 000 000 000 004 368 384 × 2 = 0 + 0.000 000 000 000 000 008 736 768;
  • 11) 0.000 000 000 000 000 008 736 768 × 2 = 0 + 0.000 000 000 000 000 017 473 536;
  • 12) 0.000 000 000 000 000 017 473 536 × 2 = 0 + 0.000 000 000 000 000 034 947 072;
  • 13) 0.000 000 000 000 000 034 947 072 × 2 = 0 + 0.000 000 000 000 000 069 894 144;
  • 14) 0.000 000 000 000 000 069 894 144 × 2 = 0 + 0.000 000 000 000 000 139 788 288;
  • 15) 0.000 000 000 000 000 139 788 288 × 2 = 0 + 0.000 000 000 000 000 279 576 576;
  • 16) 0.000 000 000 000 000 279 576 576 × 2 = 0 + 0.000 000 000 000 000 559 153 152;
  • 17) 0.000 000 000 000 000 559 153 152 × 2 = 0 + 0.000 000 000 000 001 118 306 304;
  • 18) 0.000 000 000 000 001 118 306 304 × 2 = 0 + 0.000 000 000 000 002 236 612 608;
  • 19) 0.000 000 000 000 002 236 612 608 × 2 = 0 + 0.000 000 000 000 004 473 225 216;
  • 20) 0.000 000 000 000 004 473 225 216 × 2 = 0 + 0.000 000 000 000 008 946 450 432;
  • 21) 0.000 000 000 000 008 946 450 432 × 2 = 0 + 0.000 000 000 000 017 892 900 864;
  • 22) 0.000 000 000 000 017 892 900 864 × 2 = 0 + 0.000 000 000 000 035 785 801 728;
  • 23) 0.000 000 000 000 035 785 801 728 × 2 = 0 + 0.000 000 000 000 071 571 603 456;
  • 24) 0.000 000 000 000 071 571 603 456 × 2 = 0 + 0.000 000 000 000 143 143 206 912;
  • 25) 0.000 000 000 000 143 143 206 912 × 2 = 0 + 0.000 000 000 000 286 286 413 824;
  • 26) 0.000 000 000 000 286 286 413 824 × 2 = 0 + 0.000 000 000 000 572 572 827 648;
  • 27) 0.000 000 000 000 572 572 827 648 × 2 = 0 + 0.000 000 000 001 145 145 655 296;
  • 28) 0.000 000 000 001 145 145 655 296 × 2 = 0 + 0.000 000 000 002 290 291 310 592;
  • 29) 0.000 000 000 002 290 291 310 592 × 2 = 0 + 0.000 000 000 004 580 582 621 184;
  • 30) 0.000 000 000 004 580 582 621 184 × 2 = 0 + 0.000 000 000 009 161 165 242 368;
  • 31) 0.000 000 000 009 161 165 242 368 × 2 = 0 + 0.000 000 000 018 322 330 484 736;
  • 32) 0.000 000 000 018 322 330 484 736 × 2 = 0 + 0.000 000 000 036 644 660 969 472;
  • 33) 0.000 000 000 036 644 660 969 472 × 2 = 0 + 0.000 000 000 073 289 321 938 944;
  • 34) 0.000 000 000 073 289 321 938 944 × 2 = 0 + 0.000 000 000 146 578 643 877 888;
  • 35) 0.000 000 000 146 578 643 877 888 × 2 = 0 + 0.000 000 000 293 157 287 755 776;
  • 36) 0.000 000 000 293 157 287 755 776 × 2 = 0 + 0.000 000 000 586 314 575 511 552;
  • 37) 0.000 000 000 586 314 575 511 552 × 2 = 0 + 0.000 000 001 172 629 151 023 104;
  • 38) 0.000 000 001 172 629 151 023 104 × 2 = 0 + 0.000 000 002 345 258 302 046 208;
  • 39) 0.000 000 002 345 258 302 046 208 × 2 = 0 + 0.000 000 004 690 516 604 092 416;
  • 40) 0.000 000 004 690 516 604 092 416 × 2 = 0 + 0.000 000 009 381 033 208 184 832;
  • 41) 0.000 000 009 381 033 208 184 832 × 2 = 0 + 0.000 000 018 762 066 416 369 664;
  • 42) 0.000 000 018 762 066 416 369 664 × 2 = 0 + 0.000 000 037 524 132 832 739 328;
  • 43) 0.000 000 037 524 132 832 739 328 × 2 = 0 + 0.000 000 075 048 265 665 478 656;
  • 44) 0.000 000 075 048 265 665 478 656 × 2 = 0 + 0.000 000 150 096 531 330 957 312;
  • 45) 0.000 000 150 096 531 330 957 312 × 2 = 0 + 0.000 000 300 193 062 661 914 624;
  • 46) 0.000 000 300 193 062 661 914 624 × 2 = 0 + 0.000 000 600 386 125 323 829 248;
  • 47) 0.000 000 600 386 125 323 829 248 × 2 = 0 + 0.000 001 200 772 250 647 658 496;
  • 48) 0.000 001 200 772 250 647 658 496 × 2 = 0 + 0.000 002 401 544 501 295 316 992;
  • 49) 0.000 002 401 544 501 295 316 992 × 2 = 0 + 0.000 004 803 089 002 590 633 984;
  • 50) 0.000 004 803 089 002 590 633 984 × 2 = 0 + 0.000 009 606 178 005 181 267 968;
  • 51) 0.000 009 606 178 005 181 267 968 × 2 = 0 + 0.000 019 212 356 010 362 535 936;
  • 52) 0.000 019 212 356 010 362 535 936 × 2 = 0 + 0.000 038 424 712 020 725 071 872;
  • 53) 0.000 038 424 712 020 725 071 872 × 2 = 0 + 0.000 076 849 424 041 450 143 744;
  • 54) 0.000 076 849 424 041 450 143 744 × 2 = 0 + 0.000 153 698 848 082 900 287 488;
  • 55) 0.000 153 698 848 082 900 287 488 × 2 = 0 + 0.000 307 397 696 165 800 574 976;
  • 56) 0.000 307 397 696 165 800 574 976 × 2 = 0 + 0.000 614 795 392 331 601 149 952;
  • 57) 0.000 614 795 392 331 601 149 952 × 2 = 0 + 0.001 229 590 784 663 202 299 904;
  • 58) 0.001 229 590 784 663 202 299 904 × 2 = 0 + 0.002 459 181 569 326 404 599 808;
  • 59) 0.002 459 181 569 326 404 599 808 × 2 = 0 + 0.004 918 363 138 652 809 199 616;
  • 60) 0.004 918 363 138 652 809 199 616 × 2 = 0 + 0.009 836 726 277 305 618 399 232;
  • 61) 0.009 836 726 277 305 618 399 232 × 2 = 0 + 0.019 673 452 554 611 236 798 464;
  • 62) 0.019 673 452 554 611 236 798 464 × 2 = 0 + 0.039 346 905 109 222 473 596 928;
  • 63) 0.039 346 905 109 222 473 596 928 × 2 = 0 + 0.078 693 810 218 444 947 193 856;
  • 64) 0.078 693 810 218 444 947 193 856 × 2 = 0 + 0.157 387 620 436 889 894 387 712;
  • 65) 0.157 387 620 436 889 894 387 712 × 2 = 0 + 0.314 775 240 873 779 788 775 424;
  • 66) 0.314 775 240 873 779 788 775 424 × 2 = 0 + 0.629 550 481 747 559 577 550 848;
  • 67) 0.629 550 481 747 559 577 550 848 × 2 = 1 + 0.259 100 963 495 119 155 101 696;
  • 68) 0.259 100 963 495 119 155 101 696 × 2 = 0 + 0.518 201 926 990 238 310 203 392;
  • 69) 0.518 201 926 990 238 310 203 392 × 2 = 1 + 0.036 403 853 980 476 620 406 784;
  • 70) 0.036 403 853 980 476 620 406 784 × 2 = 0 + 0.072 807 707 960 953 240 813 568;
  • 71) 0.072 807 707 960 953 240 813 568 × 2 = 0 + 0.145 615 415 921 906 481 627 136;
  • 72) 0.145 615 415 921 906 481 627 136 × 2 = 0 + 0.291 230 831 843 812 963 254 272;
  • 73) 0.291 230 831 843 812 963 254 272 × 2 = 0 + 0.582 461 663 687 625 926 508 544;
  • 74) 0.582 461 663 687 625 926 508 544 × 2 = 1 + 0.164 923 327 375 251 853 017 088;
  • 75) 0.164 923 327 375 251 853 017 088 × 2 = 0 + 0.329 846 654 750 503 706 034 176;
  • 76) 0.329 846 654 750 503 706 034 176 × 2 = 0 + 0.659 693 309 501 007 412 068 352;
  • 77) 0.659 693 309 501 007 412 068 352 × 2 = 1 + 0.319 386 619 002 014 824 136 704;
  • 78) 0.319 386 619 002 014 824 136 704 × 2 = 0 + 0.638 773 238 004 029 648 273 408;
  • 79) 0.638 773 238 004 029 648 273 408 × 2 = 1 + 0.277 546 476 008 059 296 546 816;
  • 80) 0.277 546 476 008 059 296 546 816 × 2 = 0 + 0.555 092 952 016 118 593 093 632;
  • 81) 0.555 092 952 016 118 593 093 632 × 2 = 1 + 0.110 185 904 032 237 186 187 264;
  • 82) 0.110 185 904 032 237 186 187 264 × 2 = 0 + 0.220 371 808 064 474 372 374 528;
  • 83) 0.220 371 808 064 474 372 374 528 × 2 = 0 + 0.440 743 616 128 948 744 749 056;
  • 84) 0.440 743 616 128 948 744 749 056 × 2 = 0 + 0.881 487 232 257 897 489 498 112;
  • 85) 0.881 487 232 257 897 489 498 112 × 2 = 1 + 0.762 974 464 515 794 978 996 224;
  • 86) 0.762 974 464 515 794 978 996 224 × 2 = 1 + 0.525 948 929 031 589 957 992 448;
  • 87) 0.525 948 929 031 589 957 992 448 × 2 = 1 + 0.051 897 858 063 179 915 984 896;
  • 88) 0.051 897 858 063 179 915 984 896 × 2 = 0 + 0.103 795 716 126 359 831 969 792;
  • 89) 0.103 795 716 126 359 831 969 792 × 2 = 0 + 0.207 591 432 252 719 663 939 584;
  • 90) 0.207 591 432 252 719 663 939 584 × 2 = 0 + 0.415 182 864 505 439 327 879 168;
  • 91) 0.415 182 864 505 439 327 879 168 × 2 = 0 + 0.830 365 729 010 878 655 758 336;
  • 92) 0.830 365 729 010 878 655 758 336 × 2 = 1 + 0.660 731 458 021 757 311 516 672;
  • 93) 0.660 731 458 021 757 311 516 672 × 2 = 1 + 0.321 462 916 043 514 623 033 344;
  • 94) 0.321 462 916 043 514 623 033 344 × 2 = 0 + 0.642 925 832 087 029 246 066 688;
  • 95) 0.642 925 832 087 029 246 066 688 × 2 = 1 + 0.285 851 664 174 058 492 133 376;
  • 96) 0.285 851 664 174 058 492 133 376 × 2 = 0 + 0.571 703 328 348 116 984 266 752;
  • 97) 0.571 703 328 348 116 984 266 752 × 2 = 1 + 0.143 406 656 696 233 968 533 504;
  • 98) 0.143 406 656 696 233 968 533 504 × 2 = 0 + 0.286 813 313 392 467 937 067 008;
  • 99) 0.286 813 313 392 467 937 067 008 × 2 = 0 + 0.573 626 626 784 935 874 134 016;
  • 100) 0.573 626 626 784 935 874 134 016 × 2 = 1 + 0.147 253 253 569 871 748 268 032;
  • 101) 0.147 253 253 569 871 748 268 032 × 2 = 0 + 0.294 506 507 139 743 496 536 064;
  • 102) 0.294 506 507 139 743 496 536 064 × 2 = 0 + 0.589 013 014 279 486 993 072 128;
  • 103) 0.589 013 014 279 486 993 072 128 × 2 = 1 + 0.178 026 028 558 973 986 144 256;
  • 104) 0.178 026 028 558 973 986 144 256 × 2 = 0 + 0.356 052 057 117 947 972 288 512;
  • 105) 0.356 052 057 117 947 972 288 512 × 2 = 0 + 0.712 104 114 235 895 944 577 024;
  • 106) 0.712 104 114 235 895 944 577 024 × 2 = 1 + 0.424 208 228 471 791 889 154 048;
  • 107) 0.424 208 228 471 791 889 154 048 × 2 = 0 + 0.848 416 456 943 583 778 308 096;
  • 108) 0.848 416 456 943 583 778 308 096 × 2 = 1 + 0.696 832 913 887 167 556 616 192;
  • 109) 0.696 832 913 887 167 556 616 192 × 2 = 1 + 0.393 665 827 774 335 113 232 384;
  • 110) 0.393 665 827 774 335 113 232 384 × 2 = 0 + 0.787 331 655 548 670 226 464 768;
  • 111) 0.787 331 655 548 670 226 464 768 × 2 = 1 + 0.574 663 311 097 340 452 929 536;
  • 112) 0.574 663 311 097 340 452 929 536 × 2 = 1 + 0.149 326 622 194 680 905 859 072;
  • 113) 0.149 326 622 194 680 905 859 072 × 2 = 0 + 0.298 653 244 389 361 811 718 144;
  • 114) 0.298 653 244 389 361 811 718 144 × 2 = 0 + 0.597 306 488 778 723 623 436 288;
  • 115) 0.597 306 488 778 723 623 436 288 × 2 = 1 + 0.194 612 977 557 447 246 872 576;
  • 116) 0.194 612 977 557 447 246 872 576 × 2 = 0 + 0.389 225 955 114 894 493 745 152;
  • 117) 0.389 225 955 114 894 493 745 152 × 2 = 0 + 0.778 451 910 229 788 987 490 304;
  • 118) 0.778 451 910 229 788 987 490 304 × 2 = 1 + 0.556 903 820 459 577 974 980 608;
  • 119) 0.556 903 820 459 577 974 980 608 × 2 = 1 + 0.113 807 640 919 155 949 961 216;

No fractional part came out equal to zero. But the first non-zero integer part appeared at step 67 and we have computed 52 more bits after it, enough to fill the mantissa => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 532(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1000 1110 0001 1010 1001 0010 0101 1011 0010 011(2)
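
The repeated multiplication of steps 3 and 4 can be reproduced with exact rational arithmetic. This is a minimal sketch: `fraction_to_bits` is a hypothetical helper name, and `fractions.Fraction` is used so that no rounding happens while doubling.

```python
from fractions import Fraction


def fraction_to_bits(frac: Fraction, max_steps: int) -> str:
    """Return up to max_steps binary digits of a fraction in [0, 1) by repeated doubling."""
    bits = []
    for _ in range(max_steps):
        if frac == 0:
            break  # exact representation reached
        frac *= 2
        bit = int(frac)   # integer part of the product: 0 or 1
        bits.append(str(bit))
        frac -= bit       # keep only the fractional part
    return "".join(bits)


# 0.000 000 000 000 000 000 008 532 written as an exact ratio
x = Fraction(8532, 10**24)
bits = fraction_to_bits(x, 119)
print(bits.index("1") + 1)  # 67, the step at which the first 1 appears
print(bits[66:])            # the 53 significant bits, starting with that 1
```

Using a binary `float` for `x` would not work here, since the value would already be rounded before the first multiplication.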

5. Positive number before normalization:

0.000 000 000 000 000 000 008 532(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1000 1110 0001 1010 1001 0010 0101 1011 0010 011(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 532(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1000 1110 0001 1010 1001 0010 0101 1011 0010 011(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1000 1110 0001 1010 1001 0010 0101 1011 0010 011(2) × 2^0 =


1.0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011(2) × 2^(-67)
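
The normalization can also be expressed numerically: find the power of 2 that brings the value into the range from 1 (inclusive) to 2 (exclusive). The sketch below assumes a positive input; `normalize` is a hypothetical helper name.

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple:
    """Return (m, e) with x == m * 2**e and 1 <= m < 2, for x > 0."""
    if x <= 0:
        raise ValueError("expected a positive value")
    e = 0
    while x < 1:    # shift the point to the right
        x *= 2
        e -= 1
    while x >= 2:   # shift the point to the left
        x /= 2
        e += 1
    return x, e


m, e = normalize(Fraction(8532, 10**24))
print(e)         # -67, the unadjusted exponent
print(float(m))  # the significand, a value between 1 and 2
```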


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
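
Steps 8 to 10 amount to adding the bias and printing the result in 11 binary digits. A minimal sketch, where `encode_exponent` and the two constants are hypothetical names introduced for illustration:

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def encode_exponent(unadjusted: int) -> str:
    """Add the bias and format the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # Adjusted values 0 and 2047 are reserved (zero/subnormals, infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(encode_exponent(-67))  # 01110111100
```

The range check is an addition of this sketch: the steps above only cover normal numbers, so exponents that would need the reserved patterns are rejected rather than encoded.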


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1. 0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011 =


0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011
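
Step 11 is a small string operation: drop the leading 1, then cut or pad what is left to 52 bits. The sketch below follows the text and truncates extra bits instead of rounding them; `stored_mantissa` is a hypothetical helper name.

```python
MANTISSA_BITS = 52


def stored_mantissa(significand_bits: str) -> str:
    """Drop the leading 1 and fit the remaining bits into 52 positions.

    significand_bits is the normalized significand written without the point:
    a "1" followed by the bits that came after the point.
    """
    if not significand_bits.startswith("1"):
        raise ValueError("a normalized significand starts with 1")
    tail = significand_bits[1:]
    # Truncate extra bits, or pad with zeros on the right if too short.
    return tail[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")


print(stored_mantissa("11"))  # "1" followed by 51 zeros, the pattern for 1.1(2)
```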


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011


Decimal number 0.000 000 000 000 000 000 008 532 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0101 0100 0111 0000 1101 0100 1001 0010 1101 1001 0011
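
The result can be cross-checked against the bit pattern that Python itself stores. This sketch assumes a platform where Python's `float` is an IEEE 754 double (true for the common CPython builds); `double_fields` is a hypothetical helper name.

```python
import struct


def double_fields(x: float) -> tuple:
    """Split a float into its (sign, exponent, mantissa) bit strings."""
    # Reinterpret the 8 bytes of the double as one unsigned 64 bit integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.532e-21)
print(sign, exponent, mantissa, sep=" - ")
```

The printed fields should line up with the sign, exponent and mantissa derived in step 12.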


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
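
The nine steps can be combined into one short routine. This is a sketch under stated assumptions, not a full IEEE 754 encoder: it handles normal numbers only (no zero, subnormals, infinity or NaN), it truncates the mantissa as described in step 8 instead of rounding, and it normalizes first and then extracts the bits, which produces the same digits as converting the integer and fractional parts separately. The function name `to_double_bits_truncating` is hypothetical.

```python
from fractions import Fraction


def to_double_bits_truncating(text: str) -> str:
    """Return 'sign - exponent - mantissa' for a nonzero decimal string."""
    value = Fraction(text.replace(" ", ""))  # exact value of the decimal string
    sign = "1" if value < 0 else "0"         # step 9
    value = abs(value)                       # step 1
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Step 6: normalize so that 1 <= value < 2, tracking the exponent.
    exponent = 0
    while value < 1:
        value *= 2
        exponent -= 1
    while value >= 2:
        value /= 2
        exponent += 1

    # Steps 4, 5 and 8: the leading 1 is implicit; emit 52 more bits by doubling.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit

    adjusted = exponent + 2 ** (11 - 1) - 1  # step 7
    if not 0 < adjusted < 2047:
        raise ValueError("outside the normal double range")
    return f"{sign} - {adjusted:011b} - {''.join(mantissa)}"


print(to_double_bits_truncating("-31.640 215"))
```

Spaces inside the input are removed before parsing, so the digit grouping used on this page can be passed in unchanged.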

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

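  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal sign), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). Note that this plain truncation is not the default IEEE 754 rounding rule (round to nearest): the bits discarded here, 1010 0..., are above the halfway point, so a correctly rounded conversion would end in 1101 instead of 1100.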

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
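
The sketch below compares the pattern obtained above with the one Python stores for the same literal, assuming a platform where `float` is an IEEE 754 double and decimal literals are rounded to nearest (the case for CPython). The two differ by one unit in the last mantissa bit, because the bits cut off during truncation (1010 0...) are above the halfway point and round-to-nearest rounds them up.

```python
import struct

# Bit pattern obtained above by truncating the mantissa to 52 bits.
truncated = "1" + "10000000011" + "1111101000111110010100100001010101110110100010011100"

# Bit pattern stored by the platform for the same decimal literal.
(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
native = format(raw, "064b")

print(truncated == native)                  # False
print(int(native, 2) - int(truncated, 2))   # 1, one unit in the last mantissa bit
print(native[-4:])                          # 1101 instead of 1100
```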