0.000 000 000 000 000 000 008 542 9 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 542 9(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 542 9(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
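
The repeated division in steps 1 and 2 can be sketched in Python. This is only a minimal illustration, not the code behind this page; the function name `integer_to_binary` is a hypothetical helper introduced here.

```python
def integer_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expects a non-negative integer")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)   # division = quotient + remainder
        remainders.append(str(remainder))
        if n == 0:                    # stop once the quotient is zero
            break
    # The last remainder is the leftmost binary digit, so read bottom-up.
    return "".join(reversed(remainders))


print(integer_to_binary(0))   # the integer part of the number converted here
```

The same helper works for any non-negative integer, for example the adjusted exponent converted further down.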


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 542 9.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 542 9 × 2 = 0 + 0.000 000 000 000 000 000 017 085 8;
  • 2) 0.000 000 000 000 000 000 017 085 8 × 2 = 0 + 0.000 000 000 000 000 000 034 171 6;
  • 3) 0.000 000 000 000 000 000 034 171 6 × 2 = 0 + 0.000 000 000 000 000 000 068 343 2;
  • 4) 0.000 000 000 000 000 000 068 343 2 × 2 = 0 + 0.000 000 000 000 000 000 136 686 4;
  • 5) 0.000 000 000 000 000 000 136 686 4 × 2 = 0 + 0.000 000 000 000 000 000 273 372 8;
  • 6) 0.000 000 000 000 000 000 273 372 8 × 2 = 0 + 0.000 000 000 000 000 000 546 745 6;
  • 7) 0.000 000 000 000 000 000 546 745 6 × 2 = 0 + 0.000 000 000 000 000 001 093 491 2;
  • 8) 0.000 000 000 000 000 001 093 491 2 × 2 = 0 + 0.000 000 000 000 000 002 186 982 4;
  • 9) 0.000 000 000 000 000 002 186 982 4 × 2 = 0 + 0.000 000 000 000 000 004 373 964 8;
  • 10) 0.000 000 000 000 000 004 373 964 8 × 2 = 0 + 0.000 000 000 000 000 008 747 929 6;
  • 11) 0.000 000 000 000 000 008 747 929 6 × 2 = 0 + 0.000 000 000 000 000 017 495 859 2;
  • 12) 0.000 000 000 000 000 017 495 859 2 × 2 = 0 + 0.000 000 000 000 000 034 991 718 4;
  • 13) 0.000 000 000 000 000 034 991 718 4 × 2 = 0 + 0.000 000 000 000 000 069 983 436 8;
  • 14) 0.000 000 000 000 000 069 983 436 8 × 2 = 0 + 0.000 000 000 000 000 139 966 873 6;
  • 15) 0.000 000 000 000 000 139 966 873 6 × 2 = 0 + 0.000 000 000 000 000 279 933 747 2;
  • 16) 0.000 000 000 000 000 279 933 747 2 × 2 = 0 + 0.000 000 000 000 000 559 867 494 4;
  • 17) 0.000 000 000 000 000 559 867 494 4 × 2 = 0 + 0.000 000 000 000 001 119 734 988 8;
  • 18) 0.000 000 000 000 001 119 734 988 8 × 2 = 0 + 0.000 000 000 000 002 239 469 977 6;
  • 19) 0.000 000 000 000 002 239 469 977 6 × 2 = 0 + 0.000 000 000 000 004 478 939 955 2;
  • 20) 0.000 000 000 000 004 478 939 955 2 × 2 = 0 + 0.000 000 000 000 008 957 879 910 4;
  • 21) 0.000 000 000 000 008 957 879 910 4 × 2 = 0 + 0.000 000 000 000 017 915 759 820 8;
  • 22) 0.000 000 000 000 017 915 759 820 8 × 2 = 0 + 0.000 000 000 000 035 831 519 641 6;
  • 23) 0.000 000 000 000 035 831 519 641 6 × 2 = 0 + 0.000 000 000 000 071 663 039 283 2;
  • 24) 0.000 000 000 000 071 663 039 283 2 × 2 = 0 + 0.000 000 000 000 143 326 078 566 4;
  • 25) 0.000 000 000 000 143 326 078 566 4 × 2 = 0 + 0.000 000 000 000 286 652 157 132 8;
  • 26) 0.000 000 000 000 286 652 157 132 8 × 2 = 0 + 0.000 000 000 000 573 304 314 265 6;
  • 27) 0.000 000 000 000 573 304 314 265 6 × 2 = 0 + 0.000 000 000 001 146 608 628 531 2;
  • 28) 0.000 000 000 001 146 608 628 531 2 × 2 = 0 + 0.000 000 000 002 293 217 257 062 4;
  • 29) 0.000 000 000 002 293 217 257 062 4 × 2 = 0 + 0.000 000 000 004 586 434 514 124 8;
  • 30) 0.000 000 000 004 586 434 514 124 8 × 2 = 0 + 0.000 000 000 009 172 869 028 249 6;
  • 31) 0.000 000 000 009 172 869 028 249 6 × 2 = 0 + 0.000 000 000 018 345 738 056 499 2;
  • 32) 0.000 000 000 018 345 738 056 499 2 × 2 = 0 + 0.000 000 000 036 691 476 112 998 4;
  • 33) 0.000 000 000 036 691 476 112 998 4 × 2 = 0 + 0.000 000 000 073 382 952 225 996 8;
  • 34) 0.000 000 000 073 382 952 225 996 8 × 2 = 0 + 0.000 000 000 146 765 904 451 993 6;
  • 35) 0.000 000 000 146 765 904 451 993 6 × 2 = 0 + 0.000 000 000 293 531 808 903 987 2;
  • 36) 0.000 000 000 293 531 808 903 987 2 × 2 = 0 + 0.000 000 000 587 063 617 807 974 4;
  • 37) 0.000 000 000 587 063 617 807 974 4 × 2 = 0 + 0.000 000 001 174 127 235 615 948 8;
  • 38) 0.000 000 001 174 127 235 615 948 8 × 2 = 0 + 0.000 000 002 348 254 471 231 897 6;
  • 39) 0.000 000 002 348 254 471 231 897 6 × 2 = 0 + 0.000 000 004 696 508 942 463 795 2;
  • 40) 0.000 000 004 696 508 942 463 795 2 × 2 = 0 + 0.000 000 009 393 017 884 927 590 4;
  • 41) 0.000 000 009 393 017 884 927 590 4 × 2 = 0 + 0.000 000 018 786 035 769 855 180 8;
  • 42) 0.000 000 018 786 035 769 855 180 8 × 2 = 0 + 0.000 000 037 572 071 539 710 361 6;
  • 43) 0.000 000 037 572 071 539 710 361 6 × 2 = 0 + 0.000 000 075 144 143 079 420 723 2;
  • 44) 0.000 000 075 144 143 079 420 723 2 × 2 = 0 + 0.000 000 150 288 286 158 841 446 4;
  • 45) 0.000 000 150 288 286 158 841 446 4 × 2 = 0 + 0.000 000 300 576 572 317 682 892 8;
  • 46) 0.000 000 300 576 572 317 682 892 8 × 2 = 0 + 0.000 000 601 153 144 635 365 785 6;
  • 47) 0.000 000 601 153 144 635 365 785 6 × 2 = 0 + 0.000 001 202 306 289 270 731 571 2;
  • 48) 0.000 001 202 306 289 270 731 571 2 × 2 = 0 + 0.000 002 404 612 578 541 463 142 4;
  • 49) 0.000 002 404 612 578 541 463 142 4 × 2 = 0 + 0.000 004 809 225 157 082 926 284 8;
  • 50) 0.000 004 809 225 157 082 926 284 8 × 2 = 0 + 0.000 009 618 450 314 165 852 569 6;
  • 51) 0.000 009 618 450 314 165 852 569 6 × 2 = 0 + 0.000 019 236 900 628 331 705 139 2;
  • 52) 0.000 019 236 900 628 331 705 139 2 × 2 = 0 + 0.000 038 473 801 256 663 410 278 4;
  • 53) 0.000 038 473 801 256 663 410 278 4 × 2 = 0 + 0.000 076 947 602 513 326 820 556 8;
  • 54) 0.000 076 947 602 513 326 820 556 8 × 2 = 0 + 0.000 153 895 205 026 653 641 113 6;
  • 55) 0.000 153 895 205 026 653 641 113 6 × 2 = 0 + 0.000 307 790 410 053 307 282 227 2;
  • 56) 0.000 307 790 410 053 307 282 227 2 × 2 = 0 + 0.000 615 580 820 106 614 564 454 4;
  • 57) 0.000 615 580 820 106 614 564 454 4 × 2 = 0 + 0.001 231 161 640 213 229 128 908 8;
  • 58) 0.001 231 161 640 213 229 128 908 8 × 2 = 0 + 0.002 462 323 280 426 458 257 817 6;
  • 59) 0.002 462 323 280 426 458 257 817 6 × 2 = 0 + 0.004 924 646 560 852 916 515 635 2;
  • 60) 0.004 924 646 560 852 916 515 635 2 × 2 = 0 + 0.009 849 293 121 705 833 031 270 4;
  • 61) 0.009 849 293 121 705 833 031 270 4 × 2 = 0 + 0.019 698 586 243 411 666 062 540 8;
  • 62) 0.019 698 586 243 411 666 062 540 8 × 2 = 0 + 0.039 397 172 486 823 332 125 081 6;
  • 63) 0.039 397 172 486 823 332 125 081 6 × 2 = 0 + 0.078 794 344 973 646 664 250 163 2;
  • 64) 0.078 794 344 973 646 664 250 163 2 × 2 = 0 + 0.157 588 689 947 293 328 500 326 4;
  • 65) 0.157 588 689 947 293 328 500 326 4 × 2 = 0 + 0.315 177 379 894 586 657 000 652 8;
  • 66) 0.315 177 379 894 586 657 000 652 8 × 2 = 0 + 0.630 354 759 789 173 314 001 305 6;
  • 67) 0.630 354 759 789 173 314 001 305 6 × 2 = 1 + 0.260 709 519 578 346 628 002 611 2;
  • 68) 0.260 709 519 578 346 628 002 611 2 × 2 = 0 + 0.521 419 039 156 693 256 005 222 4;
  • 69) 0.521 419 039 156 693 256 005 222 4 × 2 = 1 + 0.042 838 078 313 386 512 010 444 8;
  • 70) 0.042 838 078 313 386 512 010 444 8 × 2 = 0 + 0.085 676 156 626 773 024 020 889 6;
  • 71) 0.085 676 156 626 773 024 020 889 6 × 2 = 0 + 0.171 352 313 253 546 048 041 779 2;
  • 72) 0.171 352 313 253 546 048 041 779 2 × 2 = 0 + 0.342 704 626 507 092 096 083 558 4;
  • 73) 0.342 704 626 507 092 096 083 558 4 × 2 = 0 + 0.685 409 253 014 184 192 167 116 8;
  • 74) 0.685 409 253 014 184 192 167 116 8 × 2 = 1 + 0.370 818 506 028 368 384 334 233 6;
  • 75) 0.370 818 506 028 368 384 334 233 6 × 2 = 0 + 0.741 637 012 056 736 768 668 467 2;
  • 76) 0.741 637 012 056 736 768 668 467 2 × 2 = 1 + 0.483 274 024 113 473 537 336 934 4;
  • 77) 0.483 274 024 113 473 537 336 934 4 × 2 = 0 + 0.966 548 048 226 947 074 673 868 8;
  • 78) 0.966 548 048 226 947 074 673 868 8 × 2 = 1 + 0.933 096 096 453 894 149 347 737 6;
  • 79) 0.933 096 096 453 894 149 347 737 6 × 2 = 1 + 0.866 192 192 907 788 298 695 475 2;
  • 80) 0.866 192 192 907 788 298 695 475 2 × 2 = 1 + 0.732 384 385 815 576 597 390 950 4;
  • 81) 0.732 384 385 815 576 597 390 950 4 × 2 = 1 + 0.464 768 771 631 153 194 781 900 8;
  • 82) 0.464 768 771 631 153 194 781 900 8 × 2 = 0 + 0.929 537 543 262 306 389 563 801 6;
  • 83) 0.929 537 543 262 306 389 563 801 6 × 2 = 1 + 0.859 075 086 524 612 779 127 603 2;
  • 84) 0.859 075 086 524 612 779 127 603 2 × 2 = 1 + 0.718 150 173 049 225 558 255 206 4;
  • 85) 0.718 150 173 049 225 558 255 206 4 × 2 = 1 + 0.436 300 346 098 451 116 510 412 8;
  • 86) 0.436 300 346 098 451 116 510 412 8 × 2 = 0 + 0.872 600 692 196 902 233 020 825 6;
  • 87) 0.872 600 692 196 902 233 020 825 6 × 2 = 1 + 0.745 201 384 393 804 466 041 651 2;
  • 88) 0.745 201 384 393 804 466 041 651 2 × 2 = 1 + 0.490 402 768 787 608 932 083 302 4;
  • 89) 0.490 402 768 787 608 932 083 302 4 × 2 = 0 + 0.980 805 537 575 217 864 166 604 8;
  • 90) 0.980 805 537 575 217 864 166 604 8 × 2 = 1 + 0.961 611 075 150 435 728 333 209 6;
  • 91) 0.961 611 075 150 435 728 333 209 6 × 2 = 1 + 0.923 222 150 300 871 456 666 419 2;
  • 92) 0.923 222 150 300 871 456 666 419 2 × 2 = 1 + 0.846 444 300 601 742 913 332 838 4;
  • 93) 0.846 444 300 601 742 913 332 838 4 × 2 = 1 + 0.692 888 601 203 485 826 665 676 8;
  • 94) 0.692 888 601 203 485 826 665 676 8 × 2 = 1 + 0.385 777 202 406 971 653 331 353 6;
  • 95) 0.385 777 202 406 971 653 331 353 6 × 2 = 0 + 0.771 554 404 813 943 306 662 707 2;
  • 96) 0.771 554 404 813 943 306 662 707 2 × 2 = 1 + 0.543 108 809 627 886 613 325 414 4;
  • 97) 0.543 108 809 627 886 613 325 414 4 × 2 = 1 + 0.086 217 619 255 773 226 650 828 8;
  • 98) 0.086 217 619 255 773 226 650 828 8 × 2 = 0 + 0.172 435 238 511 546 453 301 657 6;
  • 99) 0.172 435 238 511 546 453 301 657 6 × 2 = 0 + 0.344 870 477 023 092 906 603 315 2;
  • 100) 0.344 870 477 023 092 906 603 315 2 × 2 = 0 + 0.689 740 954 046 185 813 206 630 4;
  • 101) 0.689 740 954 046 185 813 206 630 4 × 2 = 1 + 0.379 481 908 092 371 626 413 260 8;
  • 102) 0.379 481 908 092 371 626 413 260 8 × 2 = 0 + 0.758 963 816 184 743 252 826 521 6;
  • 103) 0.758 963 816 184 743 252 826 521 6 × 2 = 1 + 0.517 927 632 369 486 505 653 043 2;
  • 104) 0.517 927 632 369 486 505 653 043 2 × 2 = 1 + 0.035 855 264 738 973 011 306 086 4;
  • 105) 0.035 855 264 738 973 011 306 086 4 × 2 = 0 + 0.071 710 529 477 946 022 612 172 8;
  • 106) 0.071 710 529 477 946 022 612 172 8 × 2 = 0 + 0.143 421 058 955 892 045 224 345 6;
  • 107) 0.143 421 058 955 892 045 224 345 6 × 2 = 0 + 0.286 842 117 911 784 090 448 691 2;
  • 108) 0.286 842 117 911 784 090 448 691 2 × 2 = 0 + 0.573 684 235 823 568 180 897 382 4;
  • 109) 0.573 684 235 823 568 180 897 382 4 × 2 = 1 + 0.147 368 471 647 136 361 794 764 8;
  • 110) 0.147 368 471 647 136 361 794 764 8 × 2 = 0 + 0.294 736 943 294 272 723 589 529 6;
  • 111) 0.294 736 943 294 272 723 589 529 6 × 2 = 0 + 0.589 473 886 588 545 447 179 059 2;
  • 112) 0.589 473 886 588 545 447 179 059 2 × 2 = 1 + 0.178 947 773 177 090 894 358 118 4;
  • 113) 0.178 947 773 177 090 894 358 118 4 × 2 = 0 + 0.357 895 546 354 181 788 716 236 8;
  • 114) 0.357 895 546 354 181 788 716 236 8 × 2 = 0 + 0.715 791 092 708 363 577 432 473 6;
  • 115) 0.715 791 092 708 363 577 432 473 6 × 2 = 1 + 0.431 582 185 416 727 154 864 947 2;
  • 116) 0.431 582 185 416 727 154 864 947 2 × 2 = 0 + 0.863 164 370 833 454 309 729 894 4;
  • 117) 0.863 164 370 833 454 309 729 894 4 × 2 = 1 + 0.726 328 741 666 908 619 459 788 8;
  • 118) 0.726 328 741 666 908 619 459 788 8 × 2 = 1 + 0.452 657 483 333 817 238 919 577 6;
  • 119) 0.452 657 483 333 817 238 919 577 6 × 2 = 0 + 0.905 314 966 667 634 477 839 155 2;

The fractional part never became zero. However, we now have 53 significant bits: the leading 1 found at step 67 plus the 52 bits that will fill the mantissa. We stop here and discard the remaining bits, so the converted number will be only a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 542 9(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0111 1011 1011 0111 1101 1000 1011 0000 1001 0010 110(2)
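
Steps 3 and 4 can be reproduced with exact rational arithmetic. The sketch below assumes Python's standard `fractions.Fraction`, so that no rounding creeps into the repeated multiplication; `fraction_to_bits` and its `significant_bits` parameter are hypothetical names, and the stopping rule (53 digits counted from the first 1) is an assumption read off the list above.

```python
from fractions import Fraction


def fraction_to_bits(value: Fraction, significant_bits: int = 53) -> str:
    """Binary digits of 0 <= value < 1 by repeated multiplication by 2.

    Stops when the fractional part reaches zero, or once `significant_bits`
    digits have been collected, counting from the first 1.
    """
    bits = []
    collected = 0                      # digits counted from the first 1 onward
    while value != 0 and collected < significant_bits:
        value *= 2
        integer_part = int(value)      # 0 or 1
        value -= integer_part          # keep only the fractional part
        bits.append(str(integer_part))
        if integer_part == 1 or collected > 0:
            collected += 1
    return "".join(bits)


bits = fraction_to_bits(Fraction("8.5429e-21"))
print(len(bits), bits.index("1") + 1)   # 119 67
```

The two printed values are the number of multiplication steps and the step at which the first 1 appears, in line with the list in step 3.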

5. Positive number before normalization:

0.000 000 000 000 000 000 008 542 9(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0111 1011 1011 0111 1101 1000 1011 0000 1001 0010 110(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 542 9(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0111 1011 1011 0111 1101 1000 1011 0000 1001 0010 110(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0111 1011 1011 0111 1101 1000 1011 0000 1001 0010 110(2) × 2^0 =


1.0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110(2) × 2^-67
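
Normalization amounts to locating the first 1 and counting how far the point has to move. A minimal sketch, assuming the integer and fractional parts are given as digit strings; `normalize` is a hypothetical helper, not part of this page:

```python
from typing import Tuple


def normalize(integer_bits: str, fraction_bits: str) -> Tuple[str, int]:
    """Return (significand digits starting with 1, unadjusted exponent)."""
    integer_bits = integer_bits.lstrip("0")
    if integer_bits:
        # Point moves left: one digit stays in front of it.
        exponent = len(integer_bits) - 1
        digits = integer_bits + fraction_bits
    else:
        # Point moves right, just past the first 1 of the fraction.
        first_one = fraction_bits.index("1")   # ValueError if the number is zero
        exponent = -(first_one + 1)
        digits = fraction_bits[first_one:]
    return digits, exponent


digits, exponent = normalize("0", "0" * 66 + "101")
print(exponent)   # -67
```

The short fraction used in the call is only a stand-in with its first 1 in position 67, like the number converted above.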


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
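
Steps 8 to 10 can be condensed into a few lines of Python. The sketch below uses the built-in `format` instead of repeated division; the constant and function names are hypothetical, and the range check assumes only normal numbers are being converted.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and write the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # All zeros and all ones are reserved (zero/subnormals, infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, "0{}b".format(EXPONENT_BITS))


print(biased_exponent_bits(-67))   # 01110111100
```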


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110 =


0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110
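
The mantissa step can be sketched as a small string operation. `to_mantissa` is a hypothetical helper; it truncates excess digits, as this page does. Note that the default IEEE 754 rounding mode is round to nearest (ties to even), so hardware may store a last bit that differs from a truncated result.

```python
MANTISSA_BITS = 52


def to_mantissa(significand_bits: str) -> str:
    """Drop the implicit leading 1, then fit the rest into 52 bits."""
    if not significand_bits.startswith("1"):
        raise ValueError("significand must be normalized (start with 1)")
    stored = significand_bits[1:]
    # Cut extra digits on the right, or pad with zeros on the right.
    return stored[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")


print(to_mantissa("101"))   # "01" followed by 50 zeros
```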


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110


Decimal number 0.000 000 000 000 000 000 008 542 9 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1011 1101 1101 1011 1110 1100 0101 1000 0100 1001 0110
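
The result can be compared with what a machine actually stores. The sketch below assumes a platform where Python's `float` is an IEEE 754 double (the usual case) and uses the standard `struct` module; `double_fields` is a hypothetical helper. Python rounds the decimal literal to the nearest double, while the steps above truncate. The fractional part left at step 119 is above 0.5, so the first discarded bit is 1 and the stored mantissa should end in ...0111 rather than ...0110, one unit in the last place higher.

```python
import struct


def double_fields(x: float):
    """Split a Python float into its sign, exponent and mantissa fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # 64 bits as an integer
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF
    mantissa = raw & ((1 << 52) - 1)
    return sign, exponent, mantissa


sign, exponent, mantissa = double_fields(8.5429e-21)
print("{} - {:011b} - {:052b}".format(sign, exponent, mantissa))
```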


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Repeatedly divide the positive integer part by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or, if that never happens, until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
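
The whole procedure can also be condensed into an arithmetic shortcut. Instead of building digit lists, the sketch below scales the exact value into the range [1, 2) and reads the mantissa off as an integer. It is a simplified illustration under stated assumptions: `decimal_to_double_truncated` is a hypothetical name, only normal (non-zero, in-range) numbers are handled, and excess bits are truncated as in the steps above rather than rounded.

```python
from fractions import Fraction


def decimal_to_double_truncated(text: str):
    """Return (sign, 11 bit exponent, 52 bit mantissa) as bit strings."""
    value = Fraction(text)                 # exact, no binary rounding yet
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero has no normalized form")

    # Normalize: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    biased = exponent + 2 ** (11 - 1) - 1  # add the bias, 1023
    if not 0 < biased < 2047:
        raise ValueError("outside the normal double range")

    # Drop the leading 1 and keep 52 bits; int() truncates the rest.
    mantissa = int((value - 1) * 2 ** 52)
    return sign, format(biased, "011b"), format(mantissa, "052b")


print(" - ".join(decimal_to_double_truncated("-31.640215")))
```

Passing the number as a string keeps the decimal digits exact; passing a Python float would already have applied the machine's own rounding.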

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (more than the mantissa limit of 52 bits) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
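
To see how much precision the example loses, the three fields can be decoded back into an exact value. A minimal sketch, assuming a normal (not subnormal, infinite or NaN) bit pattern; `fields_to_value` is a hypothetical helper that accepts the fields as written above, spaces included.

```python
from fractions import Fraction


def fields_to_value(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Exact value of a normal double given its three bit fields."""
    exponent_bits = exponent.replace(" ", "")
    mantissa_bits = mantissa.replace(" ", "")
    unbiased = int(exponent_bits, 2) - 1023            # remove the bias
    significand = 1 + Fraction(int(mantissa_bits, 2), 2 ** 52)
    magnitude = significand * Fraction(2) ** unbiased
    return -magnitude if sign.strip() == "1" else magnitude


value = fields_to_value(
    "1",
    "100 0000 0011",
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100",
)
error = value - Fraction("-31.640215")
print(float(value), float(error))
```

Because the excess bits were cut off rather than rounded, the decoded magnitude is slightly smaller than 31.640 215, and the difference stays below 2^-48, the weight of the last mantissa bit at exponent 4.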