0.000 000 000 000 000 000 008 456 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 456(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 456(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
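
The repeated division of steps 1 and 2 can be sketched in Python as follows. This is only an illustration; the function name int_to_binary is an assumption introduced here, not something the original page defines.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: the only remainder is 0
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
    # The last remainder is the leftmost bit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```

The loop stops as soon as the quotient reaches zero, which for the integer part 0 happens immediately.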


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 456.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 456 × 2 = 0 + 0.000 000 000 000 000 000 016 912;
  • 2) 0.000 000 000 000 000 000 016 912 × 2 = 0 + 0.000 000 000 000 000 000 033 824;
  • 3) 0.000 000 000 000 000 000 033 824 × 2 = 0 + 0.000 000 000 000 000 000 067 648;
  • 4) 0.000 000 000 000 000 000 067 648 × 2 = 0 + 0.000 000 000 000 000 000 135 296;
  • 5) 0.000 000 000 000 000 000 135 296 × 2 = 0 + 0.000 000 000 000 000 000 270 592;
  • 6) 0.000 000 000 000 000 000 270 592 × 2 = 0 + 0.000 000 000 000 000 000 541 184;
  • 7) 0.000 000 000 000 000 000 541 184 × 2 = 0 + 0.000 000 000 000 000 001 082 368;
  • 8) 0.000 000 000 000 000 001 082 368 × 2 = 0 + 0.000 000 000 000 000 002 164 736;
  • 9) 0.000 000 000 000 000 002 164 736 × 2 = 0 + 0.000 000 000 000 000 004 329 472;
  • 10) 0.000 000 000 000 000 004 329 472 × 2 = 0 + 0.000 000 000 000 000 008 658 944;
  • 11) 0.000 000 000 000 000 008 658 944 × 2 = 0 + 0.000 000 000 000 000 017 317 888;
  • 12) 0.000 000 000 000 000 017 317 888 × 2 = 0 + 0.000 000 000 000 000 034 635 776;
  • 13) 0.000 000 000 000 000 034 635 776 × 2 = 0 + 0.000 000 000 000 000 069 271 552;
  • 14) 0.000 000 000 000 000 069 271 552 × 2 = 0 + 0.000 000 000 000 000 138 543 104;
  • 15) 0.000 000 000 000 000 138 543 104 × 2 = 0 + 0.000 000 000 000 000 277 086 208;
  • 16) 0.000 000 000 000 000 277 086 208 × 2 = 0 + 0.000 000 000 000 000 554 172 416;
  • 17) 0.000 000 000 000 000 554 172 416 × 2 = 0 + 0.000 000 000 000 001 108 344 832;
  • 18) 0.000 000 000 000 001 108 344 832 × 2 = 0 + 0.000 000 000 000 002 216 689 664;
  • 19) 0.000 000 000 000 002 216 689 664 × 2 = 0 + 0.000 000 000 000 004 433 379 328;
  • 20) 0.000 000 000 000 004 433 379 328 × 2 = 0 + 0.000 000 000 000 008 866 758 656;
  • 21) 0.000 000 000 000 008 866 758 656 × 2 = 0 + 0.000 000 000 000 017 733 517 312;
  • 22) 0.000 000 000 000 017 733 517 312 × 2 = 0 + 0.000 000 000 000 035 467 034 624;
  • 23) 0.000 000 000 000 035 467 034 624 × 2 = 0 + 0.000 000 000 000 070 934 069 248;
  • 24) 0.000 000 000 000 070 934 069 248 × 2 = 0 + 0.000 000 000 000 141 868 138 496;
  • 25) 0.000 000 000 000 141 868 138 496 × 2 = 0 + 0.000 000 000 000 283 736 276 992;
  • 26) 0.000 000 000 000 283 736 276 992 × 2 = 0 + 0.000 000 000 000 567 472 553 984;
  • 27) 0.000 000 000 000 567 472 553 984 × 2 = 0 + 0.000 000 000 001 134 945 107 968;
  • 28) 0.000 000 000 001 134 945 107 968 × 2 = 0 + 0.000 000 000 002 269 890 215 936;
  • 29) 0.000 000 000 002 269 890 215 936 × 2 = 0 + 0.000 000 000 004 539 780 431 872;
  • 30) 0.000 000 000 004 539 780 431 872 × 2 = 0 + 0.000 000 000 009 079 560 863 744;
  • 31) 0.000 000 000 009 079 560 863 744 × 2 = 0 + 0.000 000 000 018 159 121 727 488;
  • 32) 0.000 000 000 018 159 121 727 488 × 2 = 0 + 0.000 000 000 036 318 243 454 976;
  • 33) 0.000 000 000 036 318 243 454 976 × 2 = 0 + 0.000 000 000 072 636 486 909 952;
  • 34) 0.000 000 000 072 636 486 909 952 × 2 = 0 + 0.000 000 000 145 272 973 819 904;
  • 35) 0.000 000 000 145 272 973 819 904 × 2 = 0 + 0.000 000 000 290 545 947 639 808;
  • 36) 0.000 000 000 290 545 947 639 808 × 2 = 0 + 0.000 000 000 581 091 895 279 616;
  • 37) 0.000 000 000 581 091 895 279 616 × 2 = 0 + 0.000 000 001 162 183 790 559 232;
  • 38) 0.000 000 001 162 183 790 559 232 × 2 = 0 + 0.000 000 002 324 367 581 118 464;
  • 39) 0.000 000 002 324 367 581 118 464 × 2 = 0 + 0.000 000 004 648 735 162 236 928;
  • 40) 0.000 000 004 648 735 162 236 928 × 2 = 0 + 0.000 000 009 297 470 324 473 856;
  • 41) 0.000 000 009 297 470 324 473 856 × 2 = 0 + 0.000 000 018 594 940 648 947 712;
  • 42) 0.000 000 018 594 940 648 947 712 × 2 = 0 + 0.000 000 037 189 881 297 895 424;
  • 43) 0.000 000 037 189 881 297 895 424 × 2 = 0 + 0.000 000 074 379 762 595 790 848;
  • 44) 0.000 000 074 379 762 595 790 848 × 2 = 0 + 0.000 000 148 759 525 191 581 696;
  • 45) 0.000 000 148 759 525 191 581 696 × 2 = 0 + 0.000 000 297 519 050 383 163 392;
  • 46) 0.000 000 297 519 050 383 163 392 × 2 = 0 + 0.000 000 595 038 100 766 326 784;
  • 47) 0.000 000 595 038 100 766 326 784 × 2 = 0 + 0.000 001 190 076 201 532 653 568;
  • 48) 0.000 001 190 076 201 532 653 568 × 2 = 0 + 0.000 002 380 152 403 065 307 136;
  • 49) 0.000 002 380 152 403 065 307 136 × 2 = 0 + 0.000 004 760 304 806 130 614 272;
  • 50) 0.000 004 760 304 806 130 614 272 × 2 = 0 + 0.000 009 520 609 612 261 228 544;
  • 51) 0.000 009 520 609 612 261 228 544 × 2 = 0 + 0.000 019 041 219 224 522 457 088;
  • 52) 0.000 019 041 219 224 522 457 088 × 2 = 0 + 0.000 038 082 438 449 044 914 176;
  • 53) 0.000 038 082 438 449 044 914 176 × 2 = 0 + 0.000 076 164 876 898 089 828 352;
  • 54) 0.000 076 164 876 898 089 828 352 × 2 = 0 + 0.000 152 329 753 796 179 656 704;
  • 55) 0.000 152 329 753 796 179 656 704 × 2 = 0 + 0.000 304 659 507 592 359 313 408;
  • 56) 0.000 304 659 507 592 359 313 408 × 2 = 0 + 0.000 609 319 015 184 718 626 816;
  • 57) 0.000 609 319 015 184 718 626 816 × 2 = 0 + 0.001 218 638 030 369 437 253 632;
  • 58) 0.001 218 638 030 369 437 253 632 × 2 = 0 + 0.002 437 276 060 738 874 507 264;
  • 59) 0.002 437 276 060 738 874 507 264 × 2 = 0 + 0.004 874 552 121 477 749 014 528;
  • 60) 0.004 874 552 121 477 749 014 528 × 2 = 0 + 0.009 749 104 242 955 498 029 056;
  • 61) 0.009 749 104 242 955 498 029 056 × 2 = 0 + 0.019 498 208 485 910 996 058 112;
  • 62) 0.019 498 208 485 910 996 058 112 × 2 = 0 + 0.038 996 416 971 821 992 116 224;
  • 63) 0.038 996 416 971 821 992 116 224 × 2 = 0 + 0.077 992 833 943 643 984 232 448;
  • 64) 0.077 992 833 943 643 984 232 448 × 2 = 0 + 0.155 985 667 887 287 968 464 896;
  • 65) 0.155 985 667 887 287 968 464 896 × 2 = 0 + 0.311 971 335 774 575 936 929 792;
  • 66) 0.311 971 335 774 575 936 929 792 × 2 = 0 + 0.623 942 671 549 151 873 859 584;
  • 67) 0.623 942 671 549 151 873 859 584 × 2 = 1 + 0.247 885 343 098 303 747 719 168;
  • 68) 0.247 885 343 098 303 747 719 168 × 2 = 0 + 0.495 770 686 196 607 495 438 336;
  • 69) 0.495 770 686 196 607 495 438 336 × 2 = 0 + 0.991 541 372 393 214 990 876 672;
  • 70) 0.991 541 372 393 214 990 876 672 × 2 = 1 + 0.983 082 744 786 429 981 753 344;
  • 71) 0.983 082 744 786 429 981 753 344 × 2 = 1 + 0.966 165 489 572 859 963 506 688;
  • 72) 0.966 165 489 572 859 963 506 688 × 2 = 1 + 0.932 330 979 145 719 927 013 376;
  • 73) 0.932 330 979 145 719 927 013 376 × 2 = 1 + 0.864 661 958 291 439 854 026 752;
  • 74) 0.864 661 958 291 439 854 026 752 × 2 = 1 + 0.729 323 916 582 879 708 053 504;
  • 75) 0.729 323 916 582 879 708 053 504 × 2 = 1 + 0.458 647 833 165 759 416 107 008;
  • 76) 0.458 647 833 165 759 416 107 008 × 2 = 0 + 0.917 295 666 331 518 832 214 016;
  • 77) 0.917 295 666 331 518 832 214 016 × 2 = 1 + 0.834 591 332 663 037 664 428 032;
  • 78) 0.834 591 332 663 037 664 428 032 × 2 = 1 + 0.669 182 665 326 075 328 856 064;
  • 79) 0.669 182 665 326 075 328 856 064 × 2 = 1 + 0.338 365 330 652 150 657 712 128;
  • 80) 0.338 365 330 652 150 657 712 128 × 2 = 0 + 0.676 730 661 304 301 315 424 256;
  • 81) 0.676 730 661 304 301 315 424 256 × 2 = 1 + 0.353 461 322 608 602 630 848 512;
  • 82) 0.353 461 322 608 602 630 848 512 × 2 = 0 + 0.706 922 645 217 205 261 697 024;
  • 83) 0.706 922 645 217 205 261 697 024 × 2 = 1 + 0.413 845 290 434 410 523 394 048;
  • 84) 0.413 845 290 434 410 523 394 048 × 2 = 0 + 0.827 690 580 868 821 046 788 096;
  • 85) 0.827 690 580 868 821 046 788 096 × 2 = 1 + 0.655 381 161 737 642 093 576 192;
  • 86) 0.655 381 161 737 642 093 576 192 × 2 = 1 + 0.310 762 323 475 284 187 152 384;
  • 87) 0.310 762 323 475 284 187 152 384 × 2 = 0 + 0.621 524 646 950 568 374 304 768;
  • 88) 0.621 524 646 950 568 374 304 768 × 2 = 1 + 0.243 049 293 901 136 748 609 536;
  • 89) 0.243 049 293 901 136 748 609 536 × 2 = 0 + 0.486 098 587 802 273 497 219 072;
  • 90) 0.486 098 587 802 273 497 219 072 × 2 = 0 + 0.972 197 175 604 546 994 438 144;
  • 91) 0.972 197 175 604 546 994 438 144 × 2 = 1 + 0.944 394 351 209 093 988 876 288;
  • 92) 0.944 394 351 209 093 988 876 288 × 2 = 1 + 0.888 788 702 418 187 977 752 576;
  • 93) 0.888 788 702 418 187 977 752 576 × 2 = 1 + 0.777 577 404 836 375 955 505 152;
  • 94) 0.777 577 404 836 375 955 505 152 × 2 = 1 + 0.555 154 809 672 751 911 010 304;
  • 95) 0.555 154 809 672 751 911 010 304 × 2 = 1 + 0.110 309 619 345 503 822 020 608;
  • 96) 0.110 309 619 345 503 822 020 608 × 2 = 0 + 0.220 619 238 691 007 644 041 216;
  • 97) 0.220 619 238 691 007 644 041 216 × 2 = 0 + 0.441 238 477 382 015 288 082 432;
  • 98) 0.441 238 477 382 015 288 082 432 × 2 = 0 + 0.882 476 954 764 030 576 164 864;
  • 99) 0.882 476 954 764 030 576 164 864 × 2 = 1 + 0.764 953 909 528 061 152 329 728;
  • 100) 0.764 953 909 528 061 152 329 728 × 2 = 1 + 0.529 907 819 056 122 304 659 456;
  • 101) 0.529 907 819 056 122 304 659 456 × 2 = 1 + 0.059 815 638 112 244 609 318 912;
  • 102) 0.059 815 638 112 244 609 318 912 × 2 = 0 + 0.119 631 276 224 489 218 637 824;
  • 103) 0.119 631 276 224 489 218 637 824 × 2 = 0 + 0.239 262 552 448 978 437 275 648;
  • 104) 0.239 262 552 448 978 437 275 648 × 2 = 0 + 0.478 525 104 897 956 874 551 296;
  • 105) 0.478 525 104 897 956 874 551 296 × 2 = 0 + 0.957 050 209 795 913 749 102 592;
  • 106) 0.957 050 209 795 913 749 102 592 × 2 = 1 + 0.914 100 419 591 827 498 205 184;
  • 107) 0.914 100 419 591 827 498 205 184 × 2 = 1 + 0.828 200 839 183 654 996 410 368;
  • 108) 0.828 200 839 183 654 996 410 368 × 2 = 1 + 0.656 401 678 367 309 992 820 736;
  • 109) 0.656 401 678 367 309 992 820 736 × 2 = 1 + 0.312 803 356 734 619 985 641 472;
  • 110) 0.312 803 356 734 619 985 641 472 × 2 = 0 + 0.625 606 713 469 239 971 282 944;
  • 111) 0.625 606 713 469 239 971 282 944 × 2 = 1 + 0.251 213 426 938 479 942 565 888;
  • 112) 0.251 213 426 938 479 942 565 888 × 2 = 0 + 0.502 426 853 876 959 885 131 776;
  • 113) 0.502 426 853 876 959 885 131 776 × 2 = 1 + 0.004 853 707 753 919 770 263 552;
  • 114) 0.004 853 707 753 919 770 263 552 × 2 = 0 + 0.009 707 415 507 839 540 527 104;
  • 115) 0.009 707 415 507 839 540 527 104 × 2 = 0 + 0.019 414 831 015 679 081 054 208;
  • 116) 0.019 414 831 015 679 081 054 208 × 2 = 0 + 0.038 829 662 031 358 162 108 416;
  • 117) 0.038 829 662 031 358 162 108 416 × 2 = 0 + 0.077 659 324 062 716 324 216 832;
  • 118) 0.077 659 324 062 716 324 216 832 × 2 = 0 + 0.155 318 648 125 432 648 433 664;
  • 119) 0.155 318 648 125 432 648 433 664 × 2 = 0 + 0.310 637 296 250 865 296 867 328;

We didn't get any fractional part that was equal to zero. But we already have enough iterations: 52 mantissa bits (steps 68 to 119) after the first integer part that was different from zero (step 67) => FULL STOP (losing precision: the converted number we get in the end is only a very good approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 456(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0111 1110 1110 1010 1101 0011 1110 0011 1000 0111 1010 1000 000(2)
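
The repeated multiplication of steps 3 and 4 can be reproduced with exact rational arithmetic, so that no rounding creeps into the intermediate results. The sketch below uses Python's standard fractions module; the helper name fraction_bits and its count parameter are assumptions made for this illustration.

```python
from fractions import Fraction


def fraction_bits(decimal_text: str, count: int) -> str:
    """Return the first `count` binary digits after the point of a decimal string."""
    frac = Fraction(decimal_text)  # exact value of the decimal string
    frac -= int(frac)              # keep only the fractional part
    bits = []
    for _ in range(count):
        frac *= 2
        integer_part = int(frac)   # 0 or 1
        bits.append(str(integer_part))
        frac -= integer_part       # carry the fractional part to the next step
    return "".join(bits)


bits = fraction_bits("0.000000000000000000008456", 119)
print(bits.index("1") + 1)  # 67: the step that yields the first non-zero bit
print(bits[66:])            # the leading 1 followed by the 52 bits kept for the mantissa
```

The decimal string is written without the digit-grouping spaces, because Fraction does not accept them.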

5. Positive number before normalization:

0.000 000 000 000 000 000 008 456(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0111 1110 1110 1010 1101 0011 1110 0011 1000 0111 1010 1000 000(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 456(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0111 1110 1110 1010 1101 0011 1110 0011 1000 0111 1010 1000 000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0111 1110 1110 1010 1101 0011 1110 0011 1000 0111 1010 1000 000(2) × 2^0 =


1.0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000(2) × 2^-67
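
As a rough sketch of the normalization step, the function below moves the point in a positive binary string and reports the resulting power of 2. The name normalize_binary and the string-based approach are assumptions chosen for illustration only.

```python
def normalize_binary(text: str) -> tuple:
    """Return (significand, exponent) for a positive binary string such as '0.00101'."""
    integer, _, fraction = text.replace(" ", "").partition(".")
    integer = integer.lstrip("0")
    if integer:
        # Point moves left: one position per integer digit after the first.
        exponent = len(integer) - 1
        digits = integer + fraction
    else:
        first_one = fraction.find("1")
        if first_one < 0:
            raise ValueError("zero cannot be normalized")
        # Point moves right, just past the first 1 of the fractional part.
        exponent = -(first_one + 1)
        digits = fraction[first_one:]
    return digits[0] + "." + (digits[1:] or "0"), exponent


print(normalize_binary("0.00101"))    # ('1.01', -3)
print(normalize_binary("11111.101"))  # ('1.1111101', 4)
```

Applied to the binary string of this example, whose first 1 sits at position 67 after the point, the exponent comes out as -67.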


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
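
Steps 8 to 10 amount to adding the bias and zero-padding the result to 11 bits. A minimal sketch, assuming the number is in the normal range (biased exponents 1 to 2046); the names BIAS and exponent_field are introduced here for illustration.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023


def exponent_field(unadjusted: int) -> str:
    """Return the 11 bit exponent field for an unadjusted (unbiased) exponent."""
    adjusted = unadjusted + BIAS
    # All zeros and all ones are reserved for subnormals, zero, infinity and NaN.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(exponent_field(-67))  # 01110111100
print(exponent_field(4))    # 10000000011
```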


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000 =


0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000
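
The mantissa step can be sketched as a small string operation: drop the implicit leading 1 and the point, then cut or pad to 52 bits. The helper name mantissa_field is an assumption, and cutting the excess bits here means plain truncation, as in the steps above.

```python
MANTISSA_BITS = 52


def mantissa_field(significand: str) -> str:
    """Return the 52 bit mantissa field for a normalized string such as '1.0011'."""
    lead, _, fraction = significand.replace(" ", "").partition(".")
    if lead != "1":
        raise ValueError("expected a normalized significand starting with '1.'")
    # Remove excess bits from the right, or pad with zeros on the right.
    return fraction[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")


field = mantissa_field(
    "1.0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000"
)
print(len(field))  # 52
print(field)
```

In this example the fractional part already has exactly 52 bits, so nothing is cut or padded.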


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000


Decimal number 0.000 000 000 000 000 000 008 456 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0011 1111 0111 0101 0110 1001 1111 0001 1100 0011 1101 0100 0000
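
One way to cross-check this result is to ask the machine for the bits it actually stores. The sketch below uses Python's standard struct module; the helper name double_fields is an assumption. It relies on Python floats being IEEE 754 doubles, which holds on all common platforms.

```python
import struct


def double_fields(x: float) -> tuple:
    """Split the stored 64 bit pattern of x into (sign, exponent, mantissa) strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # big-endian double as integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.456e-21)
print(sign)      # 0
print(exponent)  # 01110111100
print(mantissa)  # 0011111101110101011010011111000111000011110101000000
```

The stored pattern agrees with the hand conversion because the first discarded bit (step 120) is 0, so truncating and rounding to nearest give the same mantissa for this number.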


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have collected enough bits to fill the 52 bit mantissa after the first non-zero bit.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
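
The nine steps above can be combined into one short routine. The following is a sketch under stated assumptions, not a full IEEE 754 encoder: it handles only non-zero numbers in the normal range, it truncates the mantissa as the steps describe instead of rounding, and the function name to_double_fields is invented for this illustration.

```python
from fractions import Fraction


def to_double_fields(decimal_text: str) -> str:
    """Return 'sign - exponent - mantissa' for a decimal string, truncating excess bits."""
    value = Fraction(decimal_text)        # exact value, no binary rounding yet
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero has a special encoding not covered by these steps")
    exponent = 0
    while value >= 2:                     # point moves left
        value /= 2
        exponent += 1
    while value < 1:                      # point moves right
        value *= 2
        exponent -= 1
    adjusted = exponent + 2 ** (11 - 1) - 1
    if not 1 <= adjusted <= 2046:
        raise ValueError("outside the normal range of a 64 bit double")
    # value is now in [1, 2): drop the leading 1 and keep 52 bits, truncating the rest.
    mantissa = int((value - 1) * 2 ** 52)
    return f"{sign} - {adjusted:011b} - {mantissa:052b}"


print(to_double_fields("0.000000000000000000008456"))
print(to_double_fields("-31.640215"))
```

Working on the exact fraction avoids building the long binary string first; the three fields come out the same as when the steps are carried out by hand.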

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

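  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). Note that this truncates; the default IEEE 754 round-to-nearest mode would instead round the last kept bit up here, because the removed bits (1010 0...) amount to more than half a unit in the last place: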

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
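
The example above removes the excess mantissa bits by truncation. Hardware and most language runtimes use round-to-nearest by default, so the stored pattern can differ in the last bit. The sketch below compares the two for -31.640215 using Python's standard struct module; the helper name stored_bits is an assumption made here.

```python
import struct


def stored_bits(x: float) -> str:
    """Return the 64 bit pattern that the machine stores for x."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(raw, "064b")


# Sign, exponent and mantissa from the worked example, with truncation.
truncated = (
    "1"
    "10000000011"
    "1111101000111110010100100001010101110110100010011100"
)
stored = stored_bits(-31.640215)

print(stored[:12] == truncated[:12])       # True: same sign and exponent
print(int(stored, 2) - int(truncated, 2))  # 1: one unit in the last place apart
print(stored[-4:], truncated[-4:])         # 1101 1100
```

The removed bits begin with 101, which is more than half of the last kept position, so round-to-nearest raises the final 1100 to 1101.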