0.000 000 000 000 000 000 008 529 6 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 529 6(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 529 6(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
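
The repeated division in steps 1 and 2 can be sketched in Python as follows. This is only an illustration: the function name `integer_to_binary` is an assumption made for this example and does not come from the page. It records each remainder and then reads the remainders from the last one to the first.

```python
def integer_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)   # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
        if n == 0:                    # stop when the quotient reaches zero
            break
    # The last remainder becomes the leftmost binary digit.
    return "".join(reversed(remainders))


print(integer_to_binary(0))    # the integer part of this example -> 0
print(integer_to_binary(31))   # -> 11111
```

The same helper also works for the adjusted exponent later on, since that step uses the same division technique.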


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 529 6.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero, or once enough bits have been collected to fill the 52 bit mantissa.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 529 6 × 2 = 0 + 0.000 000 000 000 000 000 017 059 2;
  • 2) 0.000 000 000 000 000 000 017 059 2 × 2 = 0 + 0.000 000 000 000 000 000 034 118 4;
  • 3) 0.000 000 000 000 000 000 034 118 4 × 2 = 0 + 0.000 000 000 000 000 000 068 236 8;
  • 4) 0.000 000 000 000 000 000 068 236 8 × 2 = 0 + 0.000 000 000 000 000 000 136 473 6;
  • 5) 0.000 000 000 000 000 000 136 473 6 × 2 = 0 + 0.000 000 000 000 000 000 272 947 2;
  • 6) 0.000 000 000 000 000 000 272 947 2 × 2 = 0 + 0.000 000 000 000 000 000 545 894 4;
  • 7) 0.000 000 000 000 000 000 545 894 4 × 2 = 0 + 0.000 000 000 000 000 001 091 788 8;
  • 8) 0.000 000 000 000 000 001 091 788 8 × 2 = 0 + 0.000 000 000 000 000 002 183 577 6;
  • 9) 0.000 000 000 000 000 002 183 577 6 × 2 = 0 + 0.000 000 000 000 000 004 367 155 2;
  • 10) 0.000 000 000 000 000 004 367 155 2 × 2 = 0 + 0.000 000 000 000 000 008 734 310 4;
  • 11) 0.000 000 000 000 000 008 734 310 4 × 2 = 0 + 0.000 000 000 000 000 017 468 620 8;
  • 12) 0.000 000 000 000 000 017 468 620 8 × 2 = 0 + 0.000 000 000 000 000 034 937 241 6;
  • 13) 0.000 000 000 000 000 034 937 241 6 × 2 = 0 + 0.000 000 000 000 000 069 874 483 2;
  • 14) 0.000 000 000 000 000 069 874 483 2 × 2 = 0 + 0.000 000 000 000 000 139 748 966 4;
  • 15) 0.000 000 000 000 000 139 748 966 4 × 2 = 0 + 0.000 000 000 000 000 279 497 932 8;
  • 16) 0.000 000 000 000 000 279 497 932 8 × 2 = 0 + 0.000 000 000 000 000 558 995 865 6;
  • 17) 0.000 000 000 000 000 558 995 865 6 × 2 = 0 + 0.000 000 000 000 001 117 991 731 2;
  • 18) 0.000 000 000 000 001 117 991 731 2 × 2 = 0 + 0.000 000 000 000 002 235 983 462 4;
  • 19) 0.000 000 000 000 002 235 983 462 4 × 2 = 0 + 0.000 000 000 000 004 471 966 924 8;
  • 20) 0.000 000 000 000 004 471 966 924 8 × 2 = 0 + 0.000 000 000 000 008 943 933 849 6;
  • 21) 0.000 000 000 000 008 943 933 849 6 × 2 = 0 + 0.000 000 000 000 017 887 867 699 2;
  • 22) 0.000 000 000 000 017 887 867 699 2 × 2 = 0 + 0.000 000 000 000 035 775 735 398 4;
  • 23) 0.000 000 000 000 035 775 735 398 4 × 2 = 0 + 0.000 000 000 000 071 551 470 796 8;
  • 24) 0.000 000 000 000 071 551 470 796 8 × 2 = 0 + 0.000 000 000 000 143 102 941 593 6;
  • 25) 0.000 000 000 000 143 102 941 593 6 × 2 = 0 + 0.000 000 000 000 286 205 883 187 2;
  • 26) 0.000 000 000 000 286 205 883 187 2 × 2 = 0 + 0.000 000 000 000 572 411 766 374 4;
  • 27) 0.000 000 000 000 572 411 766 374 4 × 2 = 0 + 0.000 000 000 001 144 823 532 748 8;
  • 28) 0.000 000 000 001 144 823 532 748 8 × 2 = 0 + 0.000 000 000 002 289 647 065 497 6;
  • 29) 0.000 000 000 002 289 647 065 497 6 × 2 = 0 + 0.000 000 000 004 579 294 130 995 2;
  • 30) 0.000 000 000 004 579 294 130 995 2 × 2 = 0 + 0.000 000 000 009 158 588 261 990 4;
  • 31) 0.000 000 000 009 158 588 261 990 4 × 2 = 0 + 0.000 000 000 018 317 176 523 980 8;
  • 32) 0.000 000 000 018 317 176 523 980 8 × 2 = 0 + 0.000 000 000 036 634 353 047 961 6;
  • 33) 0.000 000 000 036 634 353 047 961 6 × 2 = 0 + 0.000 000 000 073 268 706 095 923 2;
  • 34) 0.000 000 000 073 268 706 095 923 2 × 2 = 0 + 0.000 000 000 146 537 412 191 846 4;
  • 35) 0.000 000 000 146 537 412 191 846 4 × 2 = 0 + 0.000 000 000 293 074 824 383 692 8;
  • 36) 0.000 000 000 293 074 824 383 692 8 × 2 = 0 + 0.000 000 000 586 149 648 767 385 6;
  • 37) 0.000 000 000 586 149 648 767 385 6 × 2 = 0 + 0.000 000 001 172 299 297 534 771 2;
  • 38) 0.000 000 001 172 299 297 534 771 2 × 2 = 0 + 0.000 000 002 344 598 595 069 542 4;
  • 39) 0.000 000 002 344 598 595 069 542 4 × 2 = 0 + 0.000 000 004 689 197 190 139 084 8;
  • 40) 0.000 000 004 689 197 190 139 084 8 × 2 = 0 + 0.000 000 009 378 394 380 278 169 6;
  • 41) 0.000 000 009 378 394 380 278 169 6 × 2 = 0 + 0.000 000 018 756 788 760 556 339 2;
  • 42) 0.000 000 018 756 788 760 556 339 2 × 2 = 0 + 0.000 000 037 513 577 521 112 678 4;
  • 43) 0.000 000 037 513 577 521 112 678 4 × 2 = 0 + 0.000 000 075 027 155 042 225 356 8;
  • 44) 0.000 000 075 027 155 042 225 356 8 × 2 = 0 + 0.000 000 150 054 310 084 450 713 6;
  • 45) 0.000 000 150 054 310 084 450 713 6 × 2 = 0 + 0.000 000 300 108 620 168 901 427 2;
  • 46) 0.000 000 300 108 620 168 901 427 2 × 2 = 0 + 0.000 000 600 217 240 337 802 854 4;
  • 47) 0.000 000 600 217 240 337 802 854 4 × 2 = 0 + 0.000 001 200 434 480 675 605 708 8;
  • 48) 0.000 001 200 434 480 675 605 708 8 × 2 = 0 + 0.000 002 400 868 961 351 211 417 6;
  • 49) 0.000 002 400 868 961 351 211 417 6 × 2 = 0 + 0.000 004 801 737 922 702 422 835 2;
  • 50) 0.000 004 801 737 922 702 422 835 2 × 2 = 0 + 0.000 009 603 475 845 404 845 670 4;
  • 51) 0.000 009 603 475 845 404 845 670 4 × 2 = 0 + 0.000 019 206 951 690 809 691 340 8;
  • 52) 0.000 019 206 951 690 809 691 340 8 × 2 = 0 + 0.000 038 413 903 381 619 382 681 6;
  • 53) 0.000 038 413 903 381 619 382 681 6 × 2 = 0 + 0.000 076 827 806 763 238 765 363 2;
  • 54) 0.000 076 827 806 763 238 765 363 2 × 2 = 0 + 0.000 153 655 613 526 477 530 726 4;
  • 55) 0.000 153 655 613 526 477 530 726 4 × 2 = 0 + 0.000 307 311 227 052 955 061 452 8;
  • 56) 0.000 307 311 227 052 955 061 452 8 × 2 = 0 + 0.000 614 622 454 105 910 122 905 6;
  • 57) 0.000 614 622 454 105 910 122 905 6 × 2 = 0 + 0.001 229 244 908 211 820 245 811 2;
  • 58) 0.001 229 244 908 211 820 245 811 2 × 2 = 0 + 0.002 458 489 816 423 640 491 622 4;
  • 59) 0.002 458 489 816 423 640 491 622 4 × 2 = 0 + 0.004 916 979 632 847 280 983 244 8;
  • 60) 0.004 916 979 632 847 280 983 244 8 × 2 = 0 + 0.009 833 959 265 694 561 966 489 6;
  • 61) 0.009 833 959 265 694 561 966 489 6 × 2 = 0 + 0.019 667 918 531 389 123 932 979 2;
  • 62) 0.019 667 918 531 389 123 932 979 2 × 2 = 0 + 0.039 335 837 062 778 247 865 958 4;
  • 63) 0.039 335 837 062 778 247 865 958 4 × 2 = 0 + 0.078 671 674 125 556 495 731 916 8;
  • 64) 0.078 671 674 125 556 495 731 916 8 × 2 = 0 + 0.157 343 348 251 112 991 463 833 6;
  • 65) 0.157 343 348 251 112 991 463 833 6 × 2 = 0 + 0.314 686 696 502 225 982 927 667 2;
  • 66) 0.314 686 696 502 225 982 927 667 2 × 2 = 0 + 0.629 373 393 004 451 965 855 334 4;
  • 67) 0.629 373 393 004 451 965 855 334 4 × 2 = 1 + 0.258 746 786 008 903 931 710 668 8;
  • 68) 0.258 746 786 008 903 931 710 668 8 × 2 = 0 + 0.517 493 572 017 807 863 421 337 6;
  • 69) 0.517 493 572 017 807 863 421 337 6 × 2 = 1 + 0.034 987 144 035 615 726 842 675 2;
  • 70) 0.034 987 144 035 615 726 842 675 2 × 2 = 0 + 0.069 974 288 071 231 453 685 350 4;
  • 71) 0.069 974 288 071 231 453 685 350 4 × 2 = 0 + 0.139 948 576 142 462 907 370 700 8;
  • 72) 0.139 948 576 142 462 907 370 700 8 × 2 = 0 + 0.279 897 152 284 925 814 741 401 6;
  • 73) 0.279 897 152 284 925 814 741 401 6 × 2 = 0 + 0.559 794 304 569 851 629 482 803 2;
  • 74) 0.559 794 304 569 851 629 482 803 2 × 2 = 1 + 0.119 588 609 139 703 258 965 606 4;
  • 75) 0.119 588 609 139 703 258 965 606 4 × 2 = 0 + 0.239 177 218 279 406 517 931 212 8;
  • 76) 0.239 177 218 279 406 517 931 212 8 × 2 = 0 + 0.478 354 436 558 813 035 862 425 6;
  • 77) 0.478 354 436 558 813 035 862 425 6 × 2 = 0 + 0.956 708 873 117 626 071 724 851 2;
  • 78) 0.956 708 873 117 626 071 724 851 2 × 2 = 1 + 0.913 417 746 235 252 143 449 702 4;
  • 79) 0.913 417 746 235 252 143 449 702 4 × 2 = 1 + 0.826 835 492 470 504 286 899 404 8;
  • 80) 0.826 835 492 470 504 286 899 404 8 × 2 = 1 + 0.653 670 984 941 008 573 798 809 6;
  • 81) 0.653 670 984 941 008 573 798 809 6 × 2 = 1 + 0.307 341 969 882 017 147 597 619 2;
  • 82) 0.307 341 969 882 017 147 597 619 2 × 2 = 0 + 0.614 683 939 764 034 295 195 238 4;
  • 83) 0.614 683 939 764 034 295 195 238 4 × 2 = 1 + 0.229 367 879 528 068 590 390 476 8;
  • 84) 0.229 367 879 528 068 590 390 476 8 × 2 = 0 + 0.458 735 759 056 137 180 780 953 6;
  • 85) 0.458 735 759 056 137 180 780 953 6 × 2 = 0 + 0.917 471 518 112 274 361 561 907 2;
  • 86) 0.917 471 518 112 274 361 561 907 2 × 2 = 1 + 0.834 943 036 224 548 723 123 814 4;
  • 87) 0.834 943 036 224 548 723 123 814 4 × 2 = 1 + 0.669 886 072 449 097 446 247 628 8;
  • 88) 0.669 886 072 449 097 446 247 628 8 × 2 = 1 + 0.339 772 144 898 194 892 495 257 6;
  • 89) 0.339 772 144 898 194 892 495 257 6 × 2 = 0 + 0.679 544 289 796 389 784 990 515 2;
  • 90) 0.679 544 289 796 389 784 990 515 2 × 2 = 1 + 0.359 088 579 592 779 569 981 030 4;
  • 91) 0.359 088 579 592 779 569 981 030 4 × 2 = 0 + 0.718 177 159 185 559 139 962 060 8;
  • 92) 0.718 177 159 185 559 139 962 060 8 × 2 = 1 + 0.436 354 318 371 118 279 924 121 6;
  • 93) 0.436 354 318 371 118 279 924 121 6 × 2 = 0 + 0.872 708 636 742 236 559 848 243 2;
  • 94) 0.872 708 636 742 236 559 848 243 2 × 2 = 1 + 0.745 417 273 484 473 119 696 486 4;
  • 95) 0.745 417 273 484 473 119 696 486 4 × 2 = 1 + 0.490 834 546 968 946 239 392 972 8;
  • 96) 0.490 834 546 968 946 239 392 972 8 × 2 = 0 + 0.981 669 093 937 892 478 785 945 6;
  • 97) 0.981 669 093 937 892 478 785 945 6 × 2 = 1 + 0.963 338 187 875 784 957 571 891 2;
  • 98) 0.963 338 187 875 784 957 571 891 2 × 2 = 1 + 0.926 676 375 751 569 915 143 782 4;
  • 99) 0.926 676 375 751 569 915 143 782 4 × 2 = 1 + 0.853 352 751 503 139 830 287 564 8;
  • 100) 0.853 352 751 503 139 830 287 564 8 × 2 = 1 + 0.706 705 503 006 279 660 575 129 6;
  • 101) 0.706 705 503 006 279 660 575 129 6 × 2 = 1 + 0.413 411 006 012 559 321 150 259 2;
  • 102) 0.413 411 006 012 559 321 150 259 2 × 2 = 0 + 0.826 822 012 025 118 642 300 518 4;
  • 103) 0.826 822 012 025 118 642 300 518 4 × 2 = 1 + 0.653 644 024 050 237 284 601 036 8;
  • 104) 0.653 644 024 050 237 284 601 036 8 × 2 = 1 + 0.307 288 048 100 474 569 202 073 6;
  • 105) 0.307 288 048 100 474 569 202 073 6 × 2 = 0 + 0.614 576 096 200 949 138 404 147 2;
  • 106) 0.614 576 096 200 949 138 404 147 2 × 2 = 1 + 0.229 152 192 401 898 276 808 294 4;
  • 107) 0.229 152 192 401 898 276 808 294 4 × 2 = 0 + 0.458 304 384 803 796 553 616 588 8;
  • 108) 0.458 304 384 803 796 553 616 588 8 × 2 = 0 + 0.916 608 769 607 593 107 233 177 6;
  • 109) 0.916 608 769 607 593 107 233 177 6 × 2 = 1 + 0.833 217 539 215 186 214 466 355 2;
  • 110) 0.833 217 539 215 186 214 466 355 2 × 2 = 1 + 0.666 435 078 430 372 428 932 710 4;
  • 111) 0.666 435 078 430 372 428 932 710 4 × 2 = 1 + 0.332 870 156 860 744 857 865 420 8;
  • 112) 0.332 870 156 860 744 857 865 420 8 × 2 = 0 + 0.665 740 313 721 489 715 730 841 6;
  • 113) 0.665 740 313 721 489 715 730 841 6 × 2 = 1 + 0.331 480 627 442 979 431 461 683 2;
  • 114) 0.331 480 627 442 979 431 461 683 2 × 2 = 0 + 0.662 961 254 885 958 862 923 366 4;
  • 115) 0.662 961 254 885 958 862 923 366 4 × 2 = 1 + 0.325 922 509 771 917 725 846 732 8;
  • 116) 0.325 922 509 771 917 725 846 732 8 × 2 = 0 + 0.651 845 019 543 835 451 693 465 6;
  • 117) 0.651 845 019 543 835 451 693 465 6 × 2 = 1 + 0.303 690 039 087 670 903 386 931 2;
  • 118) 0.303 690 039 087 670 903 386 931 2 × 2 = 0 + 0.607 380 078 175 341 806 773 862 4;
  • 119) 0.607 380 078 175 341 806 773 862 4 × 2 = 1 + 0.214 760 156 350 683 613 547 724 8;

We didn't get any fractional part that was equal to zero. However, the first integer part equal to 1 appeared at step 67, and the 52 steps that follow it (68 to 119) supply all 52 bits of the mantissa => FULL STOP (losing precision: the converted number we get in the end will be only a very close approximation of the initial one).
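
The multiplication list above can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration, assuming Python's standard `fractions` module; the name `fraction_bits` and its `significant_bits` parameter are hypothetical. It stops either when the fractional part reaches zero or once 53 digits have been produced counting from the first 1 (the leading bit plus 52 mantissa bits).

```python
from fractions import Fraction


def fraction_bits(value: Fraction, significant_bits: int = 53) -> str:
    """Binary digits of 0 <= value < 1, by repeated multiplication by 2."""
    bits = []
    seen = 0  # digits produced since (and including) the first 1
    while value != 0 and seen < significant_bits:
        value *= 2
        digit = int(value)        # integer part of the product: 0 or 1
        value -= digit            # keep only the fractional part
        bits.append(str(digit))
        if digit == 1 or seen > 0:
            seen += 1
    return "".join(bits)


# 8.5296e-21 is 0.000 000 000 000 000 000 008 529 6 written in exponent form.
bits = fraction_bits(Fraction("8.5296e-21"))
print(len(bits))              # number of multiplications performed
print(bits.index("1") + 1)    # step at which the first 1 appears
```

Using `Fraction` rather than `float` matters here: a float would already be rounded to binary before the first multiplication, so the digits would describe the rounded value rather than the decimal input.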


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 529 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0111 1010 0111 0101 0110 1111 1011 0100 1110 1010 101(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 529 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0111 1010 0111 0101 0110 1111 1011 0100 1110 1010 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 529 6(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0111 1010 0111 0101 0110 1111 1011 0100 1110 1010 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 0111 1010 0111 0101 0110 1111 1011 0100 1110 1010 101(2) × 2^0 =


1.0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101(2) × 2^-67
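
As a quick sanity check on the shift count, the unadjusted exponent can be read back with Python's standard `math.frexp`. This is a sketch that assumes Python floats are IEEE 754 doubles, which holds on common platforms. Note that `frexp` normalizes to the range 0.5 to 1, so its exponent is one higher than the one used here.

```python
import math

x = 8.5296e-21

# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
m, e = math.frexp(x)

# IEEE 754 normalizes to 1 <= significand < 2, so shift by one position.
significand = m * 2
exponent = e - 1

print(exponent)      # unadjusted exponent -> -67
print(significand)   # a value in the range [1, 2)
```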


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
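
The exponent adjustment and its 11 bit encoding fit in a few lines of Python. This is a minimal sketch; the constant and variable names are chosen for this example only.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision

unadjusted = -67
adjusted = unadjusted + BIAS

# Format as binary, padded with zeros on the left to 11 digits.
exponent_field = format(adjusted, f"0{EXPONENT_BITS}b")

print(adjusted)         # -> 956
print(exponent_field)   # -> 01110111100
```

The left padding is what turns the ten digits produced by the division list into the 11 bit field.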


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal mark.


b) Adjust its length to 52 bits, only if necessary (not the case here, since exactly 52 bits remain).


Mantissa (normalized) =


1.0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101 =


0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101


Decimal number 0.000 000 000 000 000 000 008 529 6 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0011 1101 0011 1010 1011 0111 1101 1010 0111 0101 0101
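
One way to cross-check a hand conversion is to ask the machine for the bits it stores. The sketch below uses Python's standard `struct` module and assumes Python floats are IEEE 754 doubles; the helper name `double_fields` is hypothetical.

```python
import struct


def double_fields(x: float) -> tuple[str, str, str]:
    """Return the (sign, exponent, mantissa) bit strings of a Python float."""
    # Pack as a big-endian 64 bit double, then read the same bytes as an integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(8.5296e-21)
print(sign, exponent, mantissa, sep=" - ")
```

The three printed fields can be compared directly with the sign, exponent and mantissa listed in step 12.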


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision; IEEE 754's default mode rounds to the nearest value rather than simply cutting, so the last kept bit may increase by one) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
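
The nine steps above can be combined into one short Python sketch. It is an illustration under stated assumptions, not a full implementation: it handles only nonzero values in the normal range (no subnormals, infinities or NaN), it rounds to nearest with ties to even as IEEE 754 does by default, and the names `to_double_bits` and `machine_bits` are hypothetical. Exact arithmetic comes from the standard `fractions` module.

```python
from fractions import Fraction
import struct

BIAS = 1023
MANTISSA_BITS = 52


def to_double_bits(decimal_text: str) -> str:
    """Hand-style conversion of a decimal string to 64 IEEE 754 double bits."""
    value = Fraction(decimal_text)
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero is not handled by this sketch")

    # Normalize: find the exponent with 1 <= value < 2 after scaling.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Scale so the 52 mantissa bits become the integer part, then round
    # to nearest, ties to even (round() does this for a Fraction).
    scaled = round((value - 1) * 2**MANTISSA_BITS)
    if scaled == 2**MANTISSA_BITS:    # rounding carried into the leading 1
        scaled = 0
        exponent += 1

    biased = exponent + BIAS
    if not 0 < biased < 2047:
        raise ValueError("outside the normal range handled by this sketch")
    return sign + format(biased, "011b") + format(scaled, "052b")


def machine_bits(x: float) -> str:
    """The bits the machine actually stores, for comparison."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")


for text in ("8.5296e-21", "-31.640215"):
    print(text, to_double_bits(text) == machine_bits(float(text)))
```

Scaling and rounding once replaces the long list of doublings, but it yields the same digits: each doubling in the hand method exposes one more bit of the same scaled value.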

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But the 5 bits of the integer part together with these 53 fractional bits already exceed the 53 significant bits (the leading 1 plus 52 mantissa bits) that the format can hold => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

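  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

    Removing the bits this way corresponds to IEEE 754's round-toward-zero mode. Under the default round-to-nearest mode, the removed bits (1010 0...) amount to more than half a unit of the last kept position, so the mantissa is rounded up and its last group becomes 1101.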

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
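
Step 10 shortens the mantissa by dropping the excess bits, whereas hardware in IEEE 754's default mode rounds to the nearest representable value, so the last bit of a hand result can differ from what a machine stores. The sketch below compares the two for this example. It is an illustration that assumes Python floats are IEEE 754 doubles, and all variable names are chosen for this example.

```python
from fractions import Fraction
import struct

value = Fraction("31.640215") / 2**4   # normalized: 1 <= value < 2
scaled = (value - 1) * 2**52           # mantissa bits as the integer part

truncated = int(scaled)                # drop the excess bits
rounded = round(scaled)                # nearest, ties to even

# Bits the machine stores for the same number (low 52 bits = mantissa).
(as_int,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = as_int & (2**52 - 1)

print(format(truncated, "052b")[-4:])  # last group when truncating -> 1100
print(format(rounded, "052b")[-4:])    # last group when rounding   -> 1101
print(stored == rounded)               # the machine rounds to nearest
```

The sign and exponent fields are the same in both cases; only the final mantissa bit depends on the rounding choice.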