0.000 000 000 000 000 000 008 509 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 509(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 509(10) to 64 bit double precision IEEE 754 binary floating point representation?

1. First, convert the integer part, 0, to binary (base 2).
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
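Steps 1 and 2 can be sketched in Python as below. This is a minimal illustration, not part of any library; the function name `int_to_binary` is hypothetical. It performs the repeated division by 2 and then reads the remainders from the bottom of the list up.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)       # division = quotient + remainder
        remainders.append(str(r))
        if n == 0:                # stop when the quotient is zero
            break
    # The last remainder is the leftmost bit, so read the list from the bottom up.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(956))  # 1110111100
```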


3. Convert the fractional part, 0.000 000 000 000 000 000 008 509, to binary (base 2).

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 509 × 2 = 0 + 0.000 000 000 000 000 000 017 018;
  • 2) 0.000 000 000 000 000 000 017 018 × 2 = 0 + 0.000 000 000 000 000 000 034 036;
  • 3) 0.000 000 000 000 000 000 034 036 × 2 = 0 + 0.000 000 000 000 000 000 068 072;
  • 4) 0.000 000 000 000 000 000 068 072 × 2 = 0 + 0.000 000 000 000 000 000 136 144;
  • 5) 0.000 000 000 000 000 000 136 144 × 2 = 0 + 0.000 000 000 000 000 000 272 288;
  • 6) 0.000 000 000 000 000 000 272 288 × 2 = 0 + 0.000 000 000 000 000 000 544 576;
  • 7) 0.000 000 000 000 000 000 544 576 × 2 = 0 + 0.000 000 000 000 000 001 089 152;
  • 8) 0.000 000 000 000 000 001 089 152 × 2 = 0 + 0.000 000 000 000 000 002 178 304;
  • 9) 0.000 000 000 000 000 002 178 304 × 2 = 0 + 0.000 000 000 000 000 004 356 608;
  • 10) 0.000 000 000 000 000 004 356 608 × 2 = 0 + 0.000 000 000 000 000 008 713 216;
  • 11) 0.000 000 000 000 000 008 713 216 × 2 = 0 + 0.000 000 000 000 000 017 426 432;
  • 12) 0.000 000 000 000 000 017 426 432 × 2 = 0 + 0.000 000 000 000 000 034 852 864;
  • 13) 0.000 000 000 000 000 034 852 864 × 2 = 0 + 0.000 000 000 000 000 069 705 728;
  • 14) 0.000 000 000 000 000 069 705 728 × 2 = 0 + 0.000 000 000 000 000 139 411 456;
  • 15) 0.000 000 000 000 000 139 411 456 × 2 = 0 + 0.000 000 000 000 000 278 822 912;
  • 16) 0.000 000 000 000 000 278 822 912 × 2 = 0 + 0.000 000 000 000 000 557 645 824;
  • 17) 0.000 000 000 000 000 557 645 824 × 2 = 0 + 0.000 000 000 000 001 115 291 648;
  • 18) 0.000 000 000 000 001 115 291 648 × 2 = 0 + 0.000 000 000 000 002 230 583 296;
  • 19) 0.000 000 000 000 002 230 583 296 × 2 = 0 + 0.000 000 000 000 004 461 166 592;
  • 20) 0.000 000 000 000 004 461 166 592 × 2 = 0 + 0.000 000 000 000 008 922 333 184;
  • 21) 0.000 000 000 000 008 922 333 184 × 2 = 0 + 0.000 000 000 000 017 844 666 368;
  • 22) 0.000 000 000 000 017 844 666 368 × 2 = 0 + 0.000 000 000 000 035 689 332 736;
  • 23) 0.000 000 000 000 035 689 332 736 × 2 = 0 + 0.000 000 000 000 071 378 665 472;
  • 24) 0.000 000 000 000 071 378 665 472 × 2 = 0 + 0.000 000 000 000 142 757 330 944;
  • 25) 0.000 000 000 000 142 757 330 944 × 2 = 0 + 0.000 000 000 000 285 514 661 888;
  • 26) 0.000 000 000 000 285 514 661 888 × 2 = 0 + 0.000 000 000 000 571 029 323 776;
  • 27) 0.000 000 000 000 571 029 323 776 × 2 = 0 + 0.000 000 000 001 142 058 647 552;
  • 28) 0.000 000 000 001 142 058 647 552 × 2 = 0 + 0.000 000 000 002 284 117 295 104;
  • 29) 0.000 000 000 002 284 117 295 104 × 2 = 0 + 0.000 000 000 004 568 234 590 208;
  • 30) 0.000 000 000 004 568 234 590 208 × 2 = 0 + 0.000 000 000 009 136 469 180 416;
  • 31) 0.000 000 000 009 136 469 180 416 × 2 = 0 + 0.000 000 000 018 272 938 360 832;
  • 32) 0.000 000 000 018 272 938 360 832 × 2 = 0 + 0.000 000 000 036 545 876 721 664;
  • 33) 0.000 000 000 036 545 876 721 664 × 2 = 0 + 0.000 000 000 073 091 753 443 328;
  • 34) 0.000 000 000 073 091 753 443 328 × 2 = 0 + 0.000 000 000 146 183 506 886 656;
  • 35) 0.000 000 000 146 183 506 886 656 × 2 = 0 + 0.000 000 000 292 367 013 773 312;
  • 36) 0.000 000 000 292 367 013 773 312 × 2 = 0 + 0.000 000 000 584 734 027 546 624;
  • 37) 0.000 000 000 584 734 027 546 624 × 2 = 0 + 0.000 000 001 169 468 055 093 248;
  • 38) 0.000 000 001 169 468 055 093 248 × 2 = 0 + 0.000 000 002 338 936 110 186 496;
  • 39) 0.000 000 002 338 936 110 186 496 × 2 = 0 + 0.000 000 004 677 872 220 372 992;
  • 40) 0.000 000 004 677 872 220 372 992 × 2 = 0 + 0.000 000 009 355 744 440 745 984;
  • 41) 0.000 000 009 355 744 440 745 984 × 2 = 0 + 0.000 000 018 711 488 881 491 968;
  • 42) 0.000 000 018 711 488 881 491 968 × 2 = 0 + 0.000 000 037 422 977 762 983 936;
  • 43) 0.000 000 037 422 977 762 983 936 × 2 = 0 + 0.000 000 074 845 955 525 967 872;
  • 44) 0.000 000 074 845 955 525 967 872 × 2 = 0 + 0.000 000 149 691 911 051 935 744;
  • 45) 0.000 000 149 691 911 051 935 744 × 2 = 0 + 0.000 000 299 383 822 103 871 488;
  • 46) 0.000 000 299 383 822 103 871 488 × 2 = 0 + 0.000 000 598 767 644 207 742 976;
  • 47) 0.000 000 598 767 644 207 742 976 × 2 = 0 + 0.000 001 197 535 288 415 485 952;
  • 48) 0.000 001 197 535 288 415 485 952 × 2 = 0 + 0.000 002 395 070 576 830 971 904;
  • 49) 0.000 002 395 070 576 830 971 904 × 2 = 0 + 0.000 004 790 141 153 661 943 808;
  • 50) 0.000 004 790 141 153 661 943 808 × 2 = 0 + 0.000 009 580 282 307 323 887 616;
  • 51) 0.000 009 580 282 307 323 887 616 × 2 = 0 + 0.000 019 160 564 614 647 775 232;
  • 52) 0.000 019 160 564 614 647 775 232 × 2 = 0 + 0.000 038 321 129 229 295 550 464;
  • 53) 0.000 038 321 129 229 295 550 464 × 2 = 0 + 0.000 076 642 258 458 591 100 928;
  • 54) 0.000 076 642 258 458 591 100 928 × 2 = 0 + 0.000 153 284 516 917 182 201 856;
  • 55) 0.000 153 284 516 917 182 201 856 × 2 = 0 + 0.000 306 569 033 834 364 403 712;
  • 56) 0.000 306 569 033 834 364 403 712 × 2 = 0 + 0.000 613 138 067 668 728 807 424;
  • 57) 0.000 613 138 067 668 728 807 424 × 2 = 0 + 0.001 226 276 135 337 457 614 848;
  • 58) 0.001 226 276 135 337 457 614 848 × 2 = 0 + 0.002 452 552 270 674 915 229 696;
  • 59) 0.002 452 552 270 674 915 229 696 × 2 = 0 + 0.004 905 104 541 349 830 459 392;
  • 60) 0.004 905 104 541 349 830 459 392 × 2 = 0 + 0.009 810 209 082 699 660 918 784;
  • 61) 0.009 810 209 082 699 660 918 784 × 2 = 0 + 0.019 620 418 165 399 321 837 568;
  • 62) 0.019 620 418 165 399 321 837 568 × 2 = 0 + 0.039 240 836 330 798 643 675 136;
  • 63) 0.039 240 836 330 798 643 675 136 × 2 = 0 + 0.078 481 672 661 597 287 350 272;
  • 64) 0.078 481 672 661 597 287 350 272 × 2 = 0 + 0.156 963 345 323 194 574 700 544;
  • 65) 0.156 963 345 323 194 574 700 544 × 2 = 0 + 0.313 926 690 646 389 149 401 088;
  • 66) 0.313 926 690 646 389 149 401 088 × 2 = 0 + 0.627 853 381 292 778 298 802 176;
  • 67) 0.627 853 381 292 778 298 802 176 × 2 = 1 + 0.255 706 762 585 556 597 604 352;
  • 68) 0.255 706 762 585 556 597 604 352 × 2 = 0 + 0.511 413 525 171 113 195 208 704;
  • 69) 0.511 413 525 171 113 195 208 704 × 2 = 1 + 0.022 827 050 342 226 390 417 408;
  • 70) 0.022 827 050 342 226 390 417 408 × 2 = 0 + 0.045 654 100 684 452 780 834 816;
  • 71) 0.045 654 100 684 452 780 834 816 × 2 = 0 + 0.091 308 201 368 905 561 669 632;
  • 72) 0.091 308 201 368 905 561 669 632 × 2 = 0 + 0.182 616 402 737 811 123 339 264;
  • 73) 0.182 616 402 737 811 123 339 264 × 2 = 0 + 0.365 232 805 475 622 246 678 528;
  • 74) 0.365 232 805 475 622 246 678 528 × 2 = 0 + 0.730 465 610 951 244 493 357 056;
  • 75) 0.730 465 610 951 244 493 357 056 × 2 = 1 + 0.460 931 221 902 488 986 714 112;
  • 76) 0.460 931 221 902 488 986 714 112 × 2 = 0 + 0.921 862 443 804 977 973 428 224;
  • 77) 0.921 862 443 804 977 973 428 224 × 2 = 1 + 0.843 724 887 609 955 946 856 448;
  • 78) 0.843 724 887 609 955 946 856 448 × 2 = 1 + 0.687 449 775 219 911 893 712 896;
  • 79) 0.687 449 775 219 911 893 712 896 × 2 = 1 + 0.374 899 550 439 823 787 425 792;
  • 80) 0.374 899 550 439 823 787 425 792 × 2 = 0 + 0.749 799 100 879 647 574 851 584;
  • 81) 0.749 799 100 879 647 574 851 584 × 2 = 1 + 0.499 598 201 759 295 149 703 168;
  • 82) 0.499 598 201 759 295 149 703 168 × 2 = 0 + 0.999 196 403 518 590 299 406 336;
  • 83) 0.999 196 403 518 590 299 406 336 × 2 = 1 + 0.998 392 807 037 180 598 812 672;
  • 84) 0.998 392 807 037 180 598 812 672 × 2 = 1 + 0.996 785 614 074 361 197 625 344;
  • 85) 0.996 785 614 074 361 197 625 344 × 2 = 1 + 0.993 571 228 148 722 395 250 688;
  • 86) 0.993 571 228 148 722 395 250 688 × 2 = 1 + 0.987 142 456 297 444 790 501 376;
  • 87) 0.987 142 456 297 444 790 501 376 × 2 = 1 + 0.974 284 912 594 889 581 002 752;
  • 88) 0.974 284 912 594 889 581 002 752 × 2 = 1 + 0.948 569 825 189 779 162 005 504;
  • 89) 0.948 569 825 189 779 162 005 504 × 2 = 1 + 0.897 139 650 379 558 324 011 008;
  • 90) 0.897 139 650 379 558 324 011 008 × 2 = 1 + 0.794 279 300 759 116 648 022 016;
  • 91) 0.794 279 300 759 116 648 022 016 × 2 = 1 + 0.588 558 601 518 233 296 044 032;
  • 92) 0.588 558 601 518 233 296 044 032 × 2 = 1 + 0.177 117 203 036 466 592 088 064;
  • 93) 0.177 117 203 036 466 592 088 064 × 2 = 0 + 0.354 234 406 072 933 184 176 128;
  • 94) 0.354 234 406 072 933 184 176 128 × 2 = 0 + 0.708 468 812 145 866 368 352 256;
  • 95) 0.708 468 812 145 866 368 352 256 × 2 = 1 + 0.416 937 624 291 732 736 704 512;
  • 96) 0.416 937 624 291 732 736 704 512 × 2 = 0 + 0.833 875 248 583 465 473 409 024;
  • 97) 0.833 875 248 583 465 473 409 024 × 2 = 1 + 0.667 750 497 166 930 946 818 048;
  • 98) 0.667 750 497 166 930 946 818 048 × 2 = 1 + 0.335 500 994 333 861 893 636 096;
  • 99) 0.335 500 994 333 861 893 636 096 × 2 = 0 + 0.671 001 988 667 723 787 272 192;
  • 100) 0.671 001 988 667 723 787 272 192 × 2 = 1 + 0.342 003 977 335 447 574 544 384;
  • 101) 0.342 003 977 335 447 574 544 384 × 2 = 0 + 0.684 007 954 670 895 149 088 768;
  • 102) 0.684 007 954 670 895 149 088 768 × 2 = 1 + 0.368 015 909 341 790 298 177 536;
  • 103) 0.368 015 909 341 790 298 177 536 × 2 = 0 + 0.736 031 818 683 580 596 355 072;
  • 104) 0.736 031 818 683 580 596 355 072 × 2 = 1 + 0.472 063 637 367 161 192 710 144;
  • 105) 0.472 063 637 367 161 192 710 144 × 2 = 0 + 0.944 127 274 734 322 385 420 288;
  • 106) 0.944 127 274 734 322 385 420 288 × 2 = 1 + 0.888 254 549 468 644 770 840 576;
  • 107) 0.888 254 549 468 644 770 840 576 × 2 = 1 + 0.776 509 098 937 289 541 681 152;
  • 108) 0.776 509 098 937 289 541 681 152 × 2 = 1 + 0.553 018 197 874 579 083 362 304;
  • 109) 0.553 018 197 874 579 083 362 304 × 2 = 1 + 0.106 036 395 749 158 166 724 608;
  • 110) 0.106 036 395 749 158 166 724 608 × 2 = 0 + 0.212 072 791 498 316 333 449 216;
  • 111) 0.212 072 791 498 316 333 449 216 × 2 = 0 + 0.424 145 582 996 632 666 898 432;
  • 112) 0.424 145 582 996 632 666 898 432 × 2 = 0 + 0.848 291 165 993 265 333 796 864;
  • 113) 0.848 291 165 993 265 333 796 864 × 2 = 1 + 0.696 582 331 986 530 667 593 728;
  • 114) 0.696 582 331 986 530 667 593 728 × 2 = 1 + 0.393 164 663 973 061 335 187 456;
  • 115) 0.393 164 663 973 061 335 187 456 × 2 = 0 + 0.786 329 327 946 122 670 374 912;
  • 116) 0.786 329 327 946 122 670 374 912 × 2 = 1 + 0.572 658 655 892 245 340 749 824;
  • 117) 0.572 658 655 892 245 340 749 824 × 2 = 1 + 0.145 317 311 784 490 681 499 648;
  • 118) 0.145 317 311 784 490 681 499 648 × 2 = 0 + 0.290 634 623 568 981 362 999 296;
  • 119) 0.290 634 623 568 981 362 999 296 × 2 = 0 + 0.581 269 247 137 962 725 998 592;

No fractional part became zero, so the binary expansion never terminates. We stop once we have the first integer part equal to 1 (step 67) plus the 52 bits that follow it (steps 68 to 119), which is all the mantissa can hold => FULL STOP (precision is lost: the converted number will be only a very close approximation of the original one).
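The repeated multiplication only gives the right bits if the decimal fraction is handled exactly; a binary float would already be rounded before the first step. The sketch below assumes Python's standard `fractions.Fraction` for exact arithmetic and applies the same stopping rule as above. The function name `fraction_bits` is hypothetical.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, mantissa_bits: int = 52) -> list[int]:
    """Integer parts produced by repeatedly multiplying a fraction in [0, 1) by 2.

    Stops when the fractional part becomes zero, or once the first 1 and the
    `mantissa_bits` bits after it have been produced (later bits would not fit).
    """
    if not 0 <= frac < 1:
        raise ValueError("expected a fractional part in [0, 1)")
    bits: list[int] = []
    first_one = None                  # position of the first integer part equal to 1
    while frac != 0:
        frac *= 2                     # multiplying = integer + fractional part
        bit = 1 if frac >= 1 else 0   # the integer part is either 0 or 1
        frac -= bit
        bits.append(bit)
        if bit == 1 and first_one is None:
            first_one = len(bits) - 1
        if first_one is not None and len(bits) - first_one > mantissa_bits:
            break
    return bits


bits = fraction_bits(Fraction("8.509e-21"))
print(len(bits))          # 119 multiplications
print(bits.index(1) + 1)  # the first 1 comes from multiplication 67
```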


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 509(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1110 1011 1111 1111 0010 1101 0101 0111 1000 1101 100(2)
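A minimal Python sketch of this step, assuming the integer parts are already in a list in the order they were computed. `fraction_string` is a hypothetical helper that only joins and formats them the way this article does; unlike the integer part, nothing is reversed.

```python
def fraction_string(bits: list[int]) -> str:
    """Write integer parts after '0.' in the order they were computed (top to bottom).

    Digits are grouped in fours, as in the listings of this article.
    """
    digits = "".join(str(b) for b in bits)
    groups = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    return "0." + " ".join(groups)


# Integer parts of multiplications 1 to 72 above: 66 zeros, then 1, 0, 1, 0, 0, 0.
print(fraction_string([0] * 66 + [1, 0, 1, 0, 0, 0]))
```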

5. Positive number before normalization:

0.000 000 000 000 000 000 008 509(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1110 1011 1111 1111 0010 1101 0101 0111 1000 1101 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 509(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1110 1011 1111 1111 0010 1101 0101 0111 1000 1101 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0010 1110 1011 1111 1111 0010 1101 0101 0111 1000 1101 100(2) × 2^0 =


1.0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100(2) × 2^(-67)
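The normalization can also be done directly on the exact value, as sketched below. This assumes exact arithmetic via Python's `fractions.Fraction`; `normalize` is a hypothetical helper that repeatedly halves or doubles the number until exactly one non-zero digit (a 1) is left before the point, counting the shifts.

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) with x = m * 2**e and 1 <= m < 2, for x > 0."""
    if x <= 0:
        raise ValueError("expected a positive number")
    m, e = x, 0
    while m >= 2:      # shift the point to the left
        m /= 2
        e += 1
    while m < 1:       # shift the point to the right
        m *= 2
        e -= 1
    return m, e


m, e = normalize(Fraction("8.509e-21"))
print(e)  # -67
```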


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
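The bias step in Python, as a minimal sketch. The range check reflects two reserved stored values in the double format: 0 (zero and subnormal numbers) and 2047 (infinity and NaN), so a normal number needs an adjusted exponent from 1 to 2046. The names `BIAS` and `biased_exponent` are hypothetical.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent(e: int) -> int:
    """Add the 11-bit excess/bias to an unadjusted exponent."""
    adjusted = e + BIAS
    # Stored values 0 and 2047 are reserved (zero/subnormals and infinity/NaN),
    # so normal numbers need 1 <= adjusted <= 2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError(f"exponent {e} is outside the normal range of a double")
    return adjusted


print(biased_exponent(-67))  # 956
```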


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above, then add a leading 0 so that the exponent fills all 11 bits.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
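In Python, the same fixed-width result can be obtained with the built-in `format` and a zero-padded binary format spec, which produces the leading 0 automatically. `exponent_field` is a hypothetical helper name used only for this sketch.

```python
def exponent_field(adjusted: int, width: int = 11) -> str:
    """Write the adjusted exponent as a fixed-width, zero-padded binary string."""
    if not 0 <= adjusted < 2 ** width:
        raise ValueError("value does not fit in the exponent field")
    return format(adjusted, f"0{width}b")


print(exponent_field(956))   # 01110111100
print(exponent_field(1027))  # 10000000011
```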


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


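b) Adjust its length to 52 bits, only if necessary. It is not necessary here: the multiplications stopped at step 119, leaving exactly 52 bits after the leading 1. Keeping only these bits truncates the number. Under the IEEE 754 default rounding mode (round to nearest, ties to even) the result would be rounded up instead, because the next integer part (0.581 269 ... × 2 = 1.162 ...) is 1 and the bits after it are not all zero; the last group of the mantissa would then be 1101 instead of 1100.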


Mantissa (normalized) =


1.0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100 =


0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100
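A minimal Python sketch of the mantissa normalization, assuming the normalized significand is given as a string like the one above. `mantissa_field` is a hypothetical helper; cutting off the extra bits on the right is truncation (rounding toward zero), matching the procedure in this article rather than the IEEE 754 default rounding.

```python
def mantissa_field(significand: str, width: int = 52) -> str:
    """Turn a normalized significand like "1.0100 0001 ..." into the mantissa field.

    The leading '1' and the point are removed (the 1 is implicit in the format),
    extra bits on the right are cut off (truncation) and missing bits are
    filled with '0'.
    """
    digits = significand.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized significand of the form '1.xxx'")
    fraction = digits[2:]
    return fraction[:width].ljust(width, "0")


print(mantissa_field("1.0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100"))
print(mantissa_field("1.1"))  # '1' followed by 51 zeros
```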


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100


Decimal number 0.000 000 000 000 000 000 008 509 converted to 64 bit double precision IEEE 754 binary floating point representation (mantissa truncated to 52 bits):

0 - 011 1011 1100 - 0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100
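To check the fields, the sketch below concatenates them into one 64-bit integer and prints it in hexadecimal; `assemble` is a hypothetical helper. It then reads the bit pattern of the native double for 8.509e-21 through Python's standard `struct` module. CPython converts decimal literals to the nearest double, so the two patterns are expected to differ by one in the last place, since the steps above truncate.

```python
import struct


def assemble(sign: str, exponent: str, mantissa: str) -> int:
    """Concatenate the sign, exponent and mantissa fields into one 64-bit integer."""
    bits = (sign + exponent + mantissa).replace(" ", "")
    if len(bits) != 64:
        raise ValueError(f"expected 1 + 11 + 52 = 64 bits, got {len(bits)}")
    return int(bits, 2)


pattern = assemble("0", "011 1011 1100",
                   "0100 0001 0111 0101 1111 1111 1001 0110 1010 1011 1100 0110 1100")
print(f"{pattern:016X}")  # 3BC4175FF96ABC6C

# The bit pattern of the native double that Python builds from the literal 8.509e-21.
native = struct.unpack(">Q", struct.pack(">d", 8.509e-21))[0]
print(f"{native:016X}")   # 3BC4175FF96ABC6D
```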


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2 until the quotient is zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until the fractional part is zero or until there are enough bits to fill the mantissa (the first 1 bit of the number plus the 52 bits after it).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non-zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above, padding with leading zeros to 11 bits:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right. Removing the excess bits truncates the number; the IEEE 754 default rounding (round to nearest, ties to even) instead adds 1 to the last kept bit when the discarded bits exceed half a unit in that position, or equal exactly half and the last kept bit is 1.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
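The procedure above can be sketched end to end in Python. This is an illustration under stated assumptions, not a reference implementation: it uses `fractions.Fraction` for exact arithmetic, handles only normal non-zero numbers, and, as a shortcut in place of converting the integer and fractional parts separately, finds the exponent first and then generates the mantissa bits by repeated doubling (same result). The function names are hypothetical; `round_to_nearest=True` switches from truncation to round to nearest, ties to even.

```python
import struct
from fractions import Fraction


def decimal_to_double_fields(text: str, round_to_nearest: bool = False) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings for a decimal number.

    Only normal, non-zero values are handled. With round_to_nearest=False the
    mantissa is truncated, as in the worked examples; with True it is rounded
    to nearest, ties to even (the IEEE 754 default).
    """
    x = Fraction(text)                         # exact value of the decimal text
    if x == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = "1" if x < 0 else "0"               # step 9: 1 for negative, 0 for positive
    x = abs(x)                                 # step 1: use the positive version

    # Normalization (step 6): find e with 2**e <= x < 2**(e + 1).
    e = 0
    while x >= Fraction(2) ** (e + 1):
        e += 1
    while x < Fraction(2) ** e:
        e -= 1
    frac = x / Fraction(2) ** e - 1            # drop the leading 1 (step 8)

    # Repeated multiplication by 2 (step 4) yields the 52 mantissa bits.
    mantissa = 0
    for _ in range(52):
        frac *= 2
        bit = 1 if frac >= 1 else 0
        frac -= bit
        mantissa = (mantissa << 1) | bit
    # `frac` now holds the discarded tail, in units of the last kept bit.

    if round_to_nearest and (frac > Fraction(1, 2) or (frac == Fraction(1, 2) and mantissa & 1)):
        mantissa += 1
        if mantissa == 1 << 52:                # the carry spilled into the exponent
            mantissa = 0
            e += 1

    biased = e + 1023                          # step 7: 11-bit excess/bias notation
    if not 1 <= biased <= 2046:
        raise ValueError("subnormal or out-of-range values are not handled in this sketch")
    return sign, format(biased, "011b"), format(mantissa, "052b")


def fields_to_float(fields: tuple[str, str, str]) -> float:
    """Reinterpret the 64 concatenated bits as a Python float (IEEE 754 double)."""
    return struct.unpack(">d", struct.pack(">Q", int("".join(fields), 2)))[0]


print(decimal_to_double_fields("-31.640215"))
print(fields_to_float(decimal_to_double_fields("8.509e-21", round_to_nearest=True)) == 8.509e-21)
```

With truncation the sketch reproduces the fields of the worked examples; with rounding it should agree with Python's own float conversion.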

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

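  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). This is truncation: the discarded bits start with 1010, which is more than half a unit in the last place, so the IEEE 754 default rounding (round to nearest, ties to even) would round up and end the mantissa in 1101 instead of 1100: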

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from the decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point (mantissa truncated to 52 bits) =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
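As a cross-check, Python's standard `struct` module exposes the bit pattern of the native double for -31.640215. CPython rounds decimal literals to the nearest double, so the sign and exponent are expected to match the result above, while the mantissa is expected to end in 1101 rather than the truncated 1100.

```python
import struct

# Bit pattern of the double that Python produces for the literal -31.640215.
pattern = struct.unpack(">Q", struct.pack(">d", -31.640215))[0]
text = f"{pattern:064b}"
sign, exponent, mantissa = text[0], text[1:12], text[12:]

print(sign)            # 1
print(exponent)        # 10000000011
print(mantissa[:48] == "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001".replace(" ", ""))  # True
print(mantissa[-4:])   # 1101
```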