0.000 000 000 000 000 000 008 532 31 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 532 31(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 532 31(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
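The repeated division of steps 1 and 2 can be sketched in Python. This is a minimal illustration, not part of any standard, and the function name `int_to_binary` is our own. It collects the remainders and then reads them from the bottom up, just as described above.

```python
def int_to_binary(n: int) -> str:
    """Return the base 2 digits of a non-negative integer, found by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: a single remainder of 0
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(remainder))
    # The last remainder is the leftmost digit, so read the list from the bottom up.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```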


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 532 31.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 532 31 × 2 = 0 + 0.000 000 000 000 000 000 017 064 62;
  • 2) 0.000 000 000 000 000 000 017 064 62 × 2 = 0 + 0.000 000 000 000 000 000 034 129 24;
  • 3) 0.000 000 000 000 000 000 034 129 24 × 2 = 0 + 0.000 000 000 000 000 000 068 258 48;
  • 4) 0.000 000 000 000 000 000 068 258 48 × 2 = 0 + 0.000 000 000 000 000 000 136 516 96;
  • 5) 0.000 000 000 000 000 000 136 516 96 × 2 = 0 + 0.000 000 000 000 000 000 273 033 92;
  • 6) 0.000 000 000 000 000 000 273 033 92 × 2 = 0 + 0.000 000 000 000 000 000 546 067 84;
  • 7) 0.000 000 000 000 000 000 546 067 84 × 2 = 0 + 0.000 000 000 000 000 001 092 135 68;
  • 8) 0.000 000 000 000 000 001 092 135 68 × 2 = 0 + 0.000 000 000 000 000 002 184 271 36;
  • 9) 0.000 000 000 000 000 002 184 271 36 × 2 = 0 + 0.000 000 000 000 000 004 368 542 72;
  • 10) 0.000 000 000 000 000 004 368 542 72 × 2 = 0 + 0.000 000 000 000 000 008 737 085 44;
  • 11) 0.000 000 000 000 000 008 737 085 44 × 2 = 0 + 0.000 000 000 000 000 017 474 170 88;
  • 12) 0.000 000 000 000 000 017 474 170 88 × 2 = 0 + 0.000 000 000 000 000 034 948 341 76;
  • 13) 0.000 000 000 000 000 034 948 341 76 × 2 = 0 + 0.000 000 000 000 000 069 896 683 52;
  • 14) 0.000 000 000 000 000 069 896 683 52 × 2 = 0 + 0.000 000 000 000 000 139 793 367 04;
  • 15) 0.000 000 000 000 000 139 793 367 04 × 2 = 0 + 0.000 000 000 000 000 279 586 734 08;
  • 16) 0.000 000 000 000 000 279 586 734 08 × 2 = 0 + 0.000 000 000 000 000 559 173 468 16;
  • 17) 0.000 000 000 000 000 559 173 468 16 × 2 = 0 + 0.000 000 000 000 001 118 346 936 32;
  • 18) 0.000 000 000 000 001 118 346 936 32 × 2 = 0 + 0.000 000 000 000 002 236 693 872 64;
  • 19) 0.000 000 000 000 002 236 693 872 64 × 2 = 0 + 0.000 000 000 000 004 473 387 745 28;
  • 20) 0.000 000 000 000 004 473 387 745 28 × 2 = 0 + 0.000 000 000 000 008 946 775 490 56;
  • 21) 0.000 000 000 000 008 946 775 490 56 × 2 = 0 + 0.000 000 000 000 017 893 550 981 12;
  • 22) 0.000 000 000 000 017 893 550 981 12 × 2 = 0 + 0.000 000 000 000 035 787 101 962 24;
  • 23) 0.000 000 000 000 035 787 101 962 24 × 2 = 0 + 0.000 000 000 000 071 574 203 924 48;
  • 24) 0.000 000 000 000 071 574 203 924 48 × 2 = 0 + 0.000 000 000 000 143 148 407 848 96;
  • 25) 0.000 000 000 000 143 148 407 848 96 × 2 = 0 + 0.000 000 000 000 286 296 815 697 92;
  • 26) 0.000 000 000 000 286 296 815 697 92 × 2 = 0 + 0.000 000 000 000 572 593 631 395 84;
  • 27) 0.000 000 000 000 572 593 631 395 84 × 2 = 0 + 0.000 000 000 001 145 187 262 791 68;
  • 28) 0.000 000 000 001 145 187 262 791 68 × 2 = 0 + 0.000 000 000 002 290 374 525 583 36;
  • 29) 0.000 000 000 002 290 374 525 583 36 × 2 = 0 + 0.000 000 000 004 580 749 051 166 72;
  • 30) 0.000 000 000 004 580 749 051 166 72 × 2 = 0 + 0.000 000 000 009 161 498 102 333 44;
  • 31) 0.000 000 000 009 161 498 102 333 44 × 2 = 0 + 0.000 000 000 018 322 996 204 666 88;
  • 32) 0.000 000 000 018 322 996 204 666 88 × 2 = 0 + 0.000 000 000 036 645 992 409 333 76;
  • 33) 0.000 000 000 036 645 992 409 333 76 × 2 = 0 + 0.000 000 000 073 291 984 818 667 52;
  • 34) 0.000 000 000 073 291 984 818 667 52 × 2 = 0 + 0.000 000 000 146 583 969 637 335 04;
  • 35) 0.000 000 000 146 583 969 637 335 04 × 2 = 0 + 0.000 000 000 293 167 939 274 670 08;
  • 36) 0.000 000 000 293 167 939 274 670 08 × 2 = 0 + 0.000 000 000 586 335 878 549 340 16;
  • 37) 0.000 000 000 586 335 878 549 340 16 × 2 = 0 + 0.000 000 001 172 671 757 098 680 32;
  • 38) 0.000 000 001 172 671 757 098 680 32 × 2 = 0 + 0.000 000 002 345 343 514 197 360 64;
  • 39) 0.000 000 002 345 343 514 197 360 64 × 2 = 0 + 0.000 000 004 690 687 028 394 721 28;
  • 40) 0.000 000 004 690 687 028 394 721 28 × 2 = 0 + 0.000 000 009 381 374 056 789 442 56;
  • 41) 0.000 000 009 381 374 056 789 442 56 × 2 = 0 + 0.000 000 018 762 748 113 578 885 12;
  • 42) 0.000 000 018 762 748 113 578 885 12 × 2 = 0 + 0.000 000 037 525 496 227 157 770 24;
  • 43) 0.000 000 037 525 496 227 157 770 24 × 2 = 0 + 0.000 000 075 050 992 454 315 540 48;
  • 44) 0.000 000 075 050 992 454 315 540 48 × 2 = 0 + 0.000 000 150 101 984 908 631 080 96;
  • 45) 0.000 000 150 101 984 908 631 080 96 × 2 = 0 + 0.000 000 300 203 969 817 262 161 92;
  • 46) 0.000 000 300 203 969 817 262 161 92 × 2 = 0 + 0.000 000 600 407 939 634 524 323 84;
  • 47) 0.000 000 600 407 939 634 524 323 84 × 2 = 0 + 0.000 001 200 815 879 269 048 647 68;
  • 48) 0.000 001 200 815 879 269 048 647 68 × 2 = 0 + 0.000 002 401 631 758 538 097 295 36;
  • 49) 0.000 002 401 631 758 538 097 295 36 × 2 = 0 + 0.000 004 803 263 517 076 194 590 72;
  • 50) 0.000 004 803 263 517 076 194 590 72 × 2 = 0 + 0.000 009 606 527 034 152 389 181 44;
  • 51) 0.000 009 606 527 034 152 389 181 44 × 2 = 0 + 0.000 019 213 054 068 304 778 362 88;
  • 52) 0.000 019 213 054 068 304 778 362 88 × 2 = 0 + 0.000 038 426 108 136 609 556 725 76;
  • 53) 0.000 038 426 108 136 609 556 725 76 × 2 = 0 + 0.000 076 852 216 273 219 113 451 52;
  • 54) 0.000 076 852 216 273 219 113 451 52 × 2 = 0 + 0.000 153 704 432 546 438 226 903 04;
  • 55) 0.000 153 704 432 546 438 226 903 04 × 2 = 0 + 0.000 307 408 865 092 876 453 806 08;
  • 56) 0.000 307 408 865 092 876 453 806 08 × 2 = 0 + 0.000 614 817 730 185 752 907 612 16;
  • 57) 0.000 614 817 730 185 752 907 612 16 × 2 = 0 + 0.001 229 635 460 371 505 815 224 32;
  • 58) 0.001 229 635 460 371 505 815 224 32 × 2 = 0 + 0.002 459 270 920 743 011 630 448 64;
  • 59) 0.002 459 270 920 743 011 630 448 64 × 2 = 0 + 0.004 918 541 841 486 023 260 897 28;
  • 60) 0.004 918 541 841 486 023 260 897 28 × 2 = 0 + 0.009 837 083 682 972 046 521 794 56;
  • 61) 0.009 837 083 682 972 046 521 794 56 × 2 = 0 + 0.019 674 167 365 944 093 043 589 12;
  • 62) 0.019 674 167 365 944 093 043 589 12 × 2 = 0 + 0.039 348 334 731 888 186 087 178 24;
  • 63) 0.039 348 334 731 888 186 087 178 24 × 2 = 0 + 0.078 696 669 463 776 372 174 356 48;
  • 64) 0.078 696 669 463 776 372 174 356 48 × 2 = 0 + 0.157 393 338 927 552 744 348 712 96;
  • 65) 0.157 393 338 927 552 744 348 712 96 × 2 = 0 + 0.314 786 677 855 105 488 697 425 92;
  • 66) 0.314 786 677 855 105 488 697 425 92 × 2 = 0 + 0.629 573 355 710 210 977 394 851 84;
  • 67) 0.629 573 355 710 210 977 394 851 84 × 2 = 1 + 0.259 146 711 420 421 954 789 703 68;
  • 68) 0.259 146 711 420 421 954 789 703 68 × 2 = 0 + 0.518 293 422 840 843 909 579 407 36;
  • 69) 0.518 293 422 840 843 909 579 407 36 × 2 = 1 + 0.036 586 845 681 687 819 158 814 72;
  • 70) 0.036 586 845 681 687 819 158 814 72 × 2 = 0 + 0.073 173 691 363 375 638 317 629 44;
  • 71) 0.073 173 691 363 375 638 317 629 44 × 2 = 0 + 0.146 347 382 726 751 276 635 258 88;
  • 72) 0.146 347 382 726 751 276 635 258 88 × 2 = 0 + 0.292 694 765 453 502 553 270 517 76;
  • 73) 0.292 694 765 453 502 553 270 517 76 × 2 = 0 + 0.585 389 530 907 005 106 541 035 52;
  • 74) 0.585 389 530 907 005 106 541 035 52 × 2 = 1 + 0.170 779 061 814 010 213 082 071 04;
  • 75) 0.170 779 061 814 010 213 082 071 04 × 2 = 0 + 0.341 558 123 628 020 426 164 142 08;
  • 76) 0.341 558 123 628 020 426 164 142 08 × 2 = 0 + 0.683 116 247 256 040 852 328 284 16;
  • 77) 0.683 116 247 256 040 852 328 284 16 × 2 = 1 + 0.366 232 494 512 081 704 656 568 32;
  • 78) 0.366 232 494 512 081 704 656 568 32 × 2 = 0 + 0.732 464 989 024 163 409 313 136 64;
  • 79) 0.732 464 989 024 163 409 313 136 64 × 2 = 1 + 0.464 929 978 048 326 818 626 273 28;
  • 80) 0.464 929 978 048 326 818 626 273 28 × 2 = 0 + 0.929 859 956 096 653 637 252 546 56;
  • 81) 0.929 859 956 096 653 637 252 546 56 × 2 = 1 + 0.859 719 912 193 307 274 505 093 12;
  • 82) 0.859 719 912 193 307 274 505 093 12 × 2 = 1 + 0.719 439 824 386 614 549 010 186 24;
  • 83) 0.719 439 824 386 614 549 010 186 24 × 2 = 1 + 0.438 879 648 773 229 098 020 372 48;
  • 84) 0.438 879 648 773 229 098 020 372 48 × 2 = 0 + 0.877 759 297 546 458 196 040 744 96;
  • 85) 0.877 759 297 546 458 196 040 744 96 × 2 = 1 + 0.755 518 595 092 916 392 081 489 92;
  • 86) 0.755 518 595 092 916 392 081 489 92 × 2 = 1 + 0.511 037 190 185 832 784 162 979 84;
  • 87) 0.511 037 190 185 832 784 162 979 84 × 2 = 1 + 0.022 074 380 371 665 568 325 959 68;
  • 88) 0.022 074 380 371 665 568 325 959 68 × 2 = 0 + 0.044 148 760 743 331 136 651 919 36;
  • 89) 0.044 148 760 743 331 136 651 919 36 × 2 = 0 + 0.088 297 521 486 662 273 303 838 72;
  • 90) 0.088 297 521 486 662 273 303 838 72 × 2 = 0 + 0.176 595 042 973 324 546 607 677 44;
  • 91) 0.176 595 042 973 324 546 607 677 44 × 2 = 0 + 0.353 190 085 946 649 093 215 354 88;
  • 92) 0.353 190 085 946 649 093 215 354 88 × 2 = 0 + 0.706 380 171 893 298 186 430 709 76;
  • 93) 0.706 380 171 893 298 186 430 709 76 × 2 = 1 + 0.412 760 343 786 596 372 861 419 52;
  • 94) 0.412 760 343 786 596 372 861 419 52 × 2 = 0 + 0.825 520 687 573 192 745 722 839 04;
  • 95) 0.825 520 687 573 192 745 722 839 04 × 2 = 1 + 0.651 041 375 146 385 491 445 678 08;
  • 96) 0.651 041 375 146 385 491 445 678 08 × 2 = 1 + 0.302 082 750 292 770 982 891 356 16;
  • 97) 0.302 082 750 292 770 982 891 356 16 × 2 = 0 + 0.604 165 500 585 541 965 782 712 32;
  • 98) 0.604 165 500 585 541 965 782 712 32 × 2 = 1 + 0.208 331 001 171 083 931 565 424 64;
  • 99) 0.208 331 001 171 083 931 565 424 64 × 2 = 0 + 0.416 662 002 342 167 863 130 849 28;
  • 100) 0.416 662 002 342 167 863 130 849 28 × 2 = 0 + 0.833 324 004 684 335 726 261 698 56;
  • 101) 0.833 324 004 684 335 726 261 698 56 × 2 = 1 + 0.666 648 009 368 671 452 523 397 12;
  • 102) 0.666 648 009 368 671 452 523 397 12 × 2 = 1 + 0.333 296 018 737 342 905 046 794 24;
  • 103) 0.333 296 018 737 342 905 046 794 24 × 2 = 0 + 0.666 592 037 474 685 810 093 588 48;
  • 104) 0.666 592 037 474 685 810 093 588 48 × 2 = 1 + 0.333 184 074 949 371 620 187 176 96;
  • 105) 0.333 184 074 949 371 620 187 176 96 × 2 = 0 + 0.666 368 149 898 743 240 374 353 92;
  • 106) 0.666 368 149 898 743 240 374 353 92 × 2 = 1 + 0.332 736 299 797 486 480 748 707 84;
  • 107) 0.332 736 299 797 486 480 748 707 84 × 2 = 0 + 0.665 472 599 594 972 961 497 415 68;
  • 108) 0.665 472 599 594 972 961 497 415 68 × 2 = 1 + 0.330 945 199 189 945 922 994 831 36;
  • 109) 0.330 945 199 189 945 922 994 831 36 × 2 = 0 + 0.661 890 398 379 891 845 989 662 72;
  • 110) 0.661 890 398 379 891 845 989 662 72 × 2 = 1 + 0.323 780 796 759 783 691 979 325 44;
  • 111) 0.323 780 796 759 783 691 979 325 44 × 2 = 0 + 0.647 561 593 519 567 383 958 650 88;
  • 112) 0.647 561 593 519 567 383 958 650 88 × 2 = 1 + 0.295 123 187 039 134 767 917 301 76;
  • 113) 0.295 123 187 039 134 767 917 301 76 × 2 = 0 + 0.590 246 374 078 269 535 834 603 52;
  • 114) 0.590 246 374 078 269 535 834 603 52 × 2 = 1 + 0.180 492 748 156 539 071 669 207 04;
  • 115) 0.180 492 748 156 539 071 669 207 04 × 2 = 0 + 0.360 985 496 313 078 143 338 414 08;
  • 116) 0.360 985 496 313 078 143 338 414 08 × 2 = 0 + 0.721 970 992 626 156 286 676 828 16;
  • 117) 0.721 970 992 626 156 286 676 828 16 × 2 = 1 + 0.443 941 985 252 312 573 353 656 32;
  • 118) 0.443 941 985 252 312 573 353 656 32 × 2 = 0 + 0.887 883 970 504 625 146 707 312 64;
  • 119) 0.887 883 970 504 625 146 707 312 64 × 2 = 1 + 0.775 767 941 009 250 293 414 625 28;

We never got a fractional part equal to zero. However, we have done enough iterations to fill the mantissa: the first non-zero integer part appears at step 67, and steps 67 to 119 give 53 significant bits (the leading 1 plus the 52 mantissa bits) => FULL STOP. Precision is lost: the converted number we get in the end is only a very good approximation of the initial one.
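The table above can be reproduced with exact decimal arithmetic. The Python sketch below uses the standard `decimal` module so that no binary floating point rounding creeps into the doubling; `fraction_bits` is our own illustrative name. It runs 120 steps, one more than the table, to expose the first bit that does not fit into the 52-bit mantissa.

```python
from decimal import Decimal, localcontext


def fraction_bits(fraction: str, max_iterations: int) -> list[int]:
    """Repeatedly multiply a fractional part (0 <= f < 1) by 2, collecting the integer parts."""
    bits = []
    with localcontext() as ctx:
        ctx.prec = 60  # ample digits: doubling a 26-decimal-place number stays exact
        f = Decimal(fraction)
        for _ in range(max_iterations):
            f *= 2                             # multiplying = integer + fractional part
            integer_part = 1 if f >= 1 else 0
            bits.append(integer_part)
            f -= integer_part                  # keep only the fractional part
            if f == 0:                         # exact binary expansion: stop early
                break
    return bits


bits = fraction_bits("0.00000000000000000000853231", 120)
first_one = bits.index(1) + 1                  # step number of the first non-zero integer part
print(first_one)                               # 67
mantissa = "".join(str(b) for b in bits[first_one:first_one + 52])  # steps 68 to 119
print(mantissa)
print(bits[119])                               # step 120, the first bit that does not fit: 1
```

Running one step beyond the mantissa shows that the next bit is 1, which matters whenever the result is rounded rather than truncated.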


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 532 31(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1110 1110 0000 1011 0100 1101 0101 0101 0100 101(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 532 31(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1110 1110 0000 1011 0100 1101 0101 0101 0100 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 532 31(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1110 1110 0000 1011 0100 1101 0101 0101 0100 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1010 1110 1110 0000 1011 0100 1101 0101 0101 0100 101(2) × 2^0 =


1.0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101(2) × 2^(-67)
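The shift can be computed instead of counted by hand. In the hypothetical Python helper below, `normalize_fraction` takes the bits after the binary point of a number whose integer part is 0 (as here), finds the first 1, and returns the normalized significand together with the exponent.

```python
def normalize_fraction(bits: str) -> tuple[str, int]:
    """Normalize 0.<bits> (integer part 0) to 1.xxx × 2^exponent."""
    first_one = bits.index("1")        # 0-based position of the first 1 after the point
    exponent = -(first_one + 1)        # the point moves first_one + 1 places to the right
    significand = "1." + bits[first_one + 1:]
    return significand, exponent


fraction = "0" * 64 + "0010 1000 0100 1010 1110 1110 0000 1011 0100 1101 0101 0101 0100 101".replace(" ", "")
significand, exponent = normalize_fraction(fraction)
print(exponent)          # -67
print(significand[:6])   # 1.0100
```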


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
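The bias formula is easy to check in code. A minimal Python sketch follows; the names `BIAS` and `adjust_exponent` are our own. The range check reflects the fact that IEEE 754 reserves the all-zeros exponent field (zero and subnormal numbers) and the all-ones field (infinities and NaN), so a normal double's unadjusted exponent must lie between -1022 and 1023.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1      # 1023 for 64 bit double precision


def adjust_exponent(unadjusted: int) -> int:
    """Add the bias; normal numbers must land in 1..2046 (0 and 2047 are reserved)."""
    adjusted = unadjusted + BIAS
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError(f"exponent {unadjusted} is outside the normal double range")
    return adjusted


print(BIAS)                  # 1023
print(adjust_exponent(-67))  # 956
```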


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above. The ten remainders give 11 1011 1100; add a leading 0 so that the exponent fills all 11 bits.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
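Steps 9 and 10 can be sketched in Python as repeated division followed by left padding to 11 bits. Python's built-in `format(n, "011b")` produces the same string and serves as a cross-check; the function name `exponent_field` is our own.

```python
def exponent_field(adjusted: int, width: int = 11) -> str:
    """Convert the adjusted exponent to a zero-padded binary string of `width` bits."""
    remainders = []
    while adjusted > 0:
        adjusted, remainder = divmod(adjusted, 2)   # division = quotient + remainder
        remainders.append(str(remainder))
    binary = "".join(reversed(remainders)) or "0"  # read the remainders bottom-up
    if len(binary) > width:
        raise ValueError("value does not fit in the exponent field")
    return binary.rjust(width, "0")                # add leading zeros up to 11 bits


print(exponent_field(956))   # 01110111100
print(format(956, "011b"))   # 01110111100
```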


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here: exactly 52 bits remain).


Mantissa (normalized) =


1.0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101 =


0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101
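Step 11 amounts to string slicing. In this Python sketch (the helper name `mantissa_field` is hypothetical), the implicit leading 1 and the point are dropped, and the result is cut or zero-padded on the right to 52 bits. Cutting is plain truncation, which is what this page does; IEEE 754 conversions normally round instead.

```python
def mantissa_field(significand: str, width: int = 52) -> str:
    """Drop the implicit leading '1.' and fit the remaining bits to `width` bits.

    Excess bits are cut off on the right (truncation, as on this page);
    missing bits are filled with '0' on the right.
    """
    if not significand.startswith("1."):
        raise ValueError("expected a normalized significand of the form 1.xxx")
    fraction = significand[2:].replace(" ", "")   # drop "1." and any grouping spaces
    return fraction[:width].ljust(width, "0")


m = mantissa_field("1.0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101")
print(len(m))   # 52
print(m)        # 0100001001010111011100000101101001101010101010100101
```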


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101


Decimal number 0.000 000 000 000 000 000 008 532 31 converted to 64 bit double precision IEEE 754 binary floating point representation:

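0 - 011 1011 1100 - 0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0101

This mantissa is truncated rather than rounded. One more multiplication, 0.775 767 941 009 250 293 414 625 28 × 2 = 1 + 0.551 535 882 018 500 586 829 250 56, gives a next bit of 1 followed by a non-zero remainder, so under IEEE 754's default rounding (round to nearest, ties to even) the last mantissa bit is rounded up and the stored double is 0 - 011 1011 1100 - 0100 0010 0101 0111 0111 0000 0101 1010 0110 1010 1010 1010 0110.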
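The three fields can be packed into a single 64-bit pattern and compared with what Python itself stores for the literal `8.53231e-21`. This sketch assumes a platform whose `float` is an IEEE 754 double (true on mainstream hardware) and uses the standard `struct` module to read the raw bits. CPython converts decimal literals to the nearest double, so its pattern can differ from a hand-truncated one in the last bit.

```python
import struct

sign = "0"
exponent = "01110111100"
mantissa = "0100001001010111011100000101101001101010101010100101"

# Concatenate sign (1 bit), exponent (11 bits) and mantissa (52 bits).
by_hand = int(sign + exponent + mantissa, 2)
print(f"{by_hand:016X}")    # 3BC4257705A6AAA5

# Raw bits of the double that Python stores for the same decimal value.
(stored,) = struct.unpack(">Q", struct.pack(">d", 8.53231e-21))
print(f"{stored:016X}")     # 3BC4257705A6AAA6
print(stored - by_hand)     # 1: one unit in the last place
```

The one-unit difference comes from rounding: the bit after step 119 is 1 and the remainder after it is not zero, so a correctly rounded conversion rounds the last mantissa bit up.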


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
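  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have enough bits to fill the 52-bit mantissa (53 significant bits, counting from the first non-zero bit of the whole number).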
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number by shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark. The unadjusted exponent is n for a shift to the left and -n for a shift to the right.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (truncation, which loses precision; IEEE 754's default rounding mode instead rounds to nearest, ties to even) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
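The nine steps can be condensed into one Python function. This is a sketch under stated assumptions, not a reference implementation: `decimal_to_double_bits` and `show` are hypothetical names, the function handles only non-zero normal numbers (no zero, subnormals, infinities or NaN), and it uses `fractions.Fraction` for exact arithmetic in place of digit-by-digit tables. With `round_nearest=False` it truncates the mantissa as in step 8; with `round_nearest=True` it applies round to nearest, ties to even, the IEEE 754 default.

```python
import math
from fractions import Fraction


def decimal_to_double_bits(text: str, round_nearest: bool = False) -> int:
    """Return the 64-bit pattern of a decimal string (non-zero normal numbers only)."""
    value = Fraction(text)                      # exact value of the decimal string
    sign = 1 if value < 0 else 0                # step 9: 1 for negative, 0 for positive
    value = abs(value)                          # step 1: work with the positive version
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Steps 2-6: shift the point until exactly one non-zero digit is left of it.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Step 8: keep 52 bits after the point (value is now 1.xxx in binary).
    scaled = value * 2**52
    significand = math.floor(scaled)            # 53 bits, leading 1 included
    remainder = scaled - significand            # the part that does not fit
    if round_nearest:
        half = Fraction(1, 2)
        if remainder > half or (remainder == half and significand % 2 == 1):
            significand += 1
            if significand == 2**53:            # rounding carried into a new leading bit
                significand //= 2
                exponent += 1

    # Step 7: bias the exponent and check that it fits a normal double.
    biased = exponent + 2 ** (11 - 1) - 1
    if not 1 <= biased <= 2046:
        raise ValueError("outside the range of normal doubles")

    mantissa = significand - 2**52              # drop the implicit leading 1
    return (sign << 63) | (biased << 52) | mantissa


def show(pattern: int) -> str:
    """Format a 64-bit pattern as sign - exponent - mantissa."""
    bits = f"{pattern:064b}"
    return f"{bits[0]} - {bits[1:12]} - {bits[12:]}"


print(show(decimal_to_double_bits("-31.640215")))
print(show(decimal_to_double_bits("-31.640215", round_nearest=True)))
```

Exact rational arithmetic is a deliberate choice here: running the same loop on Python floats would round at intermediate steps and could change the last bit.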

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
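    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

    This result truncates the mantissa. Under IEEE 754's default rounding (round to nearest, ties to even), the first dropped bit is 1 (step 49) and a later dropped bit is also 1 (step 51), so the last mantissa bit is rounded up and the stored double is 1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101.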
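The example's result can be checked the same way. The sketch below assumes IEEE 754 doubles (as on mainstream platforms), and the helper names `group4`, `double_fields` and `round_half_even` are our own. It prints the stored bits of `-31.640215` in the layout used above, and shows that rounding the 57 bits after the leading 1 to nearest, ties to even, reproduces the stored mantissa, while plain truncation gives the one in the conclusion.

```python
import struct


def group4(bits: str) -> str:
    """Group bits in fours from the right, as in '100 0000 0011'."""
    head = len(bits) % 4
    groups = [bits[:head]] if head else []
    groups += [bits[i:i + 4] for i in range(head, len(bits), 4)]
    return " ".join(groups)


def double_fields(x: float) -> str:
    """Show the IEEE 754 bits of a Python float as sign - exponent - mantissa."""
    (pattern,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{pattern:064b}"
    return f"{bits[0]} - {group4(bits[1:12])} - {group4(bits[12:])}"


def round_half_even(bits: str, width: int, tail_nonzero: bool) -> str:
    """Round a bit string (the bits after the leading 1) to `width` bits, ties to even.

    `tail_nonzero` says whether the expansion was cut off with a non-zero
    remainder. A carry out of the top bit is not handled in this sketch.
    """
    if len(bits) <= width:
        raise ValueError("need at least one bit beyond the kept width")
    kept, dropped = bits[:width], bits[width:]
    if dropped[0] == "0":
        return kept                                   # below half a unit: truncate
    exact_tie = "1" not in dropped[1:] and not tail_nonzero
    if exact_tie and kept[-1] == "0":
        return kept                                   # exact tie, already even
    rounded = int(kept, 2) + 1
    if rounded == 2 ** width:
        raise OverflowError("carry into the exponent is not handled here")
    return format(rounded, f"0{width}b")


print(double_fields(-31.640215))
# 1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101

after_point = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0".replace(" ", "")
print(group4(after_point[:52]))                       # truncated, as in the conclusion
print(group4(round_half_even(after_point, 52, tail_nonzero=True)))
# 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101
```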