0.000 000 000 000 000 000 008 536 96 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 536 96(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 536 96(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
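
The repeated division used in steps 1 and 2 can be sketched in Python as follows. The helper name `int_to_binary` is hypothetical and chosen only for illustration; it is not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"               # 0 ÷ 2 = 0 + 0, so the only remainder is 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of n ÷ 2
        remainders.append(str(r))
    # Read the remainders from the bottom of the list to the top.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # the integer part of this number
print(int_to_binary(31))   # a non-trivial case: prints 11111
```

The same helper works for any non-negative integer, so it can also be reused later for the adjusted exponent.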


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 536 96.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 536 96 × 2 = 0 + 0.000 000 000 000 000 000 017 073 92;
  • 2) 0.000 000 000 000 000 000 017 073 92 × 2 = 0 + 0.000 000 000 000 000 000 034 147 84;
  • 3) 0.000 000 000 000 000 000 034 147 84 × 2 = 0 + 0.000 000 000 000 000 000 068 295 68;
  • 4) 0.000 000 000 000 000 000 068 295 68 × 2 = 0 + 0.000 000 000 000 000 000 136 591 36;
  • 5) 0.000 000 000 000 000 000 136 591 36 × 2 = 0 + 0.000 000 000 000 000 000 273 182 72;
  • 6) 0.000 000 000 000 000 000 273 182 72 × 2 = 0 + 0.000 000 000 000 000 000 546 365 44;
  • 7) 0.000 000 000 000 000 000 546 365 44 × 2 = 0 + 0.000 000 000 000 000 001 092 730 88;
  • 8) 0.000 000 000 000 000 001 092 730 88 × 2 = 0 + 0.000 000 000 000 000 002 185 461 76;
  • 9) 0.000 000 000 000 000 002 185 461 76 × 2 = 0 + 0.000 000 000 000 000 004 370 923 52;
  • 10) 0.000 000 000 000 000 004 370 923 52 × 2 = 0 + 0.000 000 000 000 000 008 741 847 04;
  • 11) 0.000 000 000 000 000 008 741 847 04 × 2 = 0 + 0.000 000 000 000 000 017 483 694 08;
  • 12) 0.000 000 000 000 000 017 483 694 08 × 2 = 0 + 0.000 000 000 000 000 034 967 388 16;
  • 13) 0.000 000 000 000 000 034 967 388 16 × 2 = 0 + 0.000 000 000 000 000 069 934 776 32;
  • 14) 0.000 000 000 000 000 069 934 776 32 × 2 = 0 + 0.000 000 000 000 000 139 869 552 64;
  • 15) 0.000 000 000 000 000 139 869 552 64 × 2 = 0 + 0.000 000 000 000 000 279 739 105 28;
  • 16) 0.000 000 000 000 000 279 739 105 28 × 2 = 0 + 0.000 000 000 000 000 559 478 210 56;
  • 17) 0.000 000 000 000 000 559 478 210 56 × 2 = 0 + 0.000 000 000 000 001 118 956 421 12;
  • 18) 0.000 000 000 000 001 118 956 421 12 × 2 = 0 + 0.000 000 000 000 002 237 912 842 24;
  • 19) 0.000 000 000 000 002 237 912 842 24 × 2 = 0 + 0.000 000 000 000 004 475 825 684 48;
  • 20) 0.000 000 000 000 004 475 825 684 48 × 2 = 0 + 0.000 000 000 000 008 951 651 368 96;
  • 21) 0.000 000 000 000 008 951 651 368 96 × 2 = 0 + 0.000 000 000 000 017 903 302 737 92;
  • 22) 0.000 000 000 000 017 903 302 737 92 × 2 = 0 + 0.000 000 000 000 035 806 605 475 84;
  • 23) 0.000 000 000 000 035 806 605 475 84 × 2 = 0 + 0.000 000 000 000 071 613 210 951 68;
  • 24) 0.000 000 000 000 071 613 210 951 68 × 2 = 0 + 0.000 000 000 000 143 226 421 903 36;
  • 25) 0.000 000 000 000 143 226 421 903 36 × 2 = 0 + 0.000 000 000 000 286 452 843 806 72;
  • 26) 0.000 000 000 000 286 452 843 806 72 × 2 = 0 + 0.000 000 000 000 572 905 687 613 44;
  • 27) 0.000 000 000 000 572 905 687 613 44 × 2 = 0 + 0.000 000 000 001 145 811 375 226 88;
  • 28) 0.000 000 000 001 145 811 375 226 88 × 2 = 0 + 0.000 000 000 002 291 622 750 453 76;
  • 29) 0.000 000 000 002 291 622 750 453 76 × 2 = 0 + 0.000 000 000 004 583 245 500 907 52;
  • 30) 0.000 000 000 004 583 245 500 907 52 × 2 = 0 + 0.000 000 000 009 166 491 001 815 04;
  • 31) 0.000 000 000 009 166 491 001 815 04 × 2 = 0 + 0.000 000 000 018 332 982 003 630 08;
  • 32) 0.000 000 000 018 332 982 003 630 08 × 2 = 0 + 0.000 000 000 036 665 964 007 260 16;
  • 33) 0.000 000 000 036 665 964 007 260 16 × 2 = 0 + 0.000 000 000 073 331 928 014 520 32;
  • 34) 0.000 000 000 073 331 928 014 520 32 × 2 = 0 + 0.000 000 000 146 663 856 029 040 64;
  • 35) 0.000 000 000 146 663 856 029 040 64 × 2 = 0 + 0.000 000 000 293 327 712 058 081 28;
  • 36) 0.000 000 000 293 327 712 058 081 28 × 2 = 0 + 0.000 000 000 586 655 424 116 162 56;
  • 37) 0.000 000 000 586 655 424 116 162 56 × 2 = 0 + 0.000 000 001 173 310 848 232 325 12;
  • 38) 0.000 000 001 173 310 848 232 325 12 × 2 = 0 + 0.000 000 002 346 621 696 464 650 24;
  • 39) 0.000 000 002 346 621 696 464 650 24 × 2 = 0 + 0.000 000 004 693 243 392 929 300 48;
  • 40) 0.000 000 004 693 243 392 929 300 48 × 2 = 0 + 0.000 000 009 386 486 785 858 600 96;
  • 41) 0.000 000 009 386 486 785 858 600 96 × 2 = 0 + 0.000 000 018 772 973 571 717 201 92;
  • 42) 0.000 000 018 772 973 571 717 201 92 × 2 = 0 + 0.000 000 037 545 947 143 434 403 84;
  • 43) 0.000 000 037 545 947 143 434 403 84 × 2 = 0 + 0.000 000 075 091 894 286 868 807 68;
  • 44) 0.000 000 075 091 894 286 868 807 68 × 2 = 0 + 0.000 000 150 183 788 573 737 615 36;
  • 45) 0.000 000 150 183 788 573 737 615 36 × 2 = 0 + 0.000 000 300 367 577 147 475 230 72;
  • 46) 0.000 000 300 367 577 147 475 230 72 × 2 = 0 + 0.000 000 600 735 154 294 950 461 44;
  • 47) 0.000 000 600 735 154 294 950 461 44 × 2 = 0 + 0.000 001 201 470 308 589 900 922 88;
  • 48) 0.000 001 201 470 308 589 900 922 88 × 2 = 0 + 0.000 002 402 940 617 179 801 845 76;
  • 49) 0.000 002 402 940 617 179 801 845 76 × 2 = 0 + 0.000 004 805 881 234 359 603 691 52;
  • 50) 0.000 004 805 881 234 359 603 691 52 × 2 = 0 + 0.000 009 611 762 468 719 207 383 04;
  • 51) 0.000 009 611 762 468 719 207 383 04 × 2 = 0 + 0.000 019 223 524 937 438 414 766 08;
  • 52) 0.000 019 223 524 937 438 414 766 08 × 2 = 0 + 0.000 038 447 049 874 876 829 532 16;
  • 53) 0.000 038 447 049 874 876 829 532 16 × 2 = 0 + 0.000 076 894 099 749 753 659 064 32;
  • 54) 0.000 076 894 099 749 753 659 064 32 × 2 = 0 + 0.000 153 788 199 499 507 318 128 64;
  • 55) 0.000 153 788 199 499 507 318 128 64 × 2 = 0 + 0.000 307 576 398 999 014 636 257 28;
  • 56) 0.000 307 576 398 999 014 636 257 28 × 2 = 0 + 0.000 615 152 797 998 029 272 514 56;
  • 57) 0.000 615 152 797 998 029 272 514 56 × 2 = 0 + 0.001 230 305 595 996 058 545 029 12;
  • 58) 0.001 230 305 595 996 058 545 029 12 × 2 = 0 + 0.002 460 611 191 992 117 090 058 24;
  • 59) 0.002 460 611 191 992 117 090 058 24 × 2 = 0 + 0.004 921 222 383 984 234 180 116 48;
  • 60) 0.004 921 222 383 984 234 180 116 48 × 2 = 0 + 0.009 842 444 767 968 468 360 232 96;
  • 61) 0.009 842 444 767 968 468 360 232 96 × 2 = 0 + 0.019 684 889 535 936 936 720 465 92;
  • 62) 0.019 684 889 535 936 936 720 465 92 × 2 = 0 + 0.039 369 779 071 873 873 440 931 84;
  • 63) 0.039 369 779 071 873 873 440 931 84 × 2 = 0 + 0.078 739 558 143 747 746 881 863 68;
  • 64) 0.078 739 558 143 747 746 881 863 68 × 2 = 0 + 0.157 479 116 287 495 493 763 727 36;
  • 65) 0.157 479 116 287 495 493 763 727 36 × 2 = 0 + 0.314 958 232 574 990 987 527 454 72;
  • 66) 0.314 958 232 574 990 987 527 454 72 × 2 = 0 + 0.629 916 465 149 981 975 054 909 44;
  • 67) 0.629 916 465 149 981 975 054 909 44 × 2 = 1 + 0.259 832 930 299 963 950 109 818 88;
  • 68) 0.259 832 930 299 963 950 109 818 88 × 2 = 0 + 0.519 665 860 599 927 900 219 637 76;
  • 69) 0.519 665 860 599 927 900 219 637 76 × 2 = 1 + 0.039 331 721 199 855 800 439 275 52;
  • 70) 0.039 331 721 199 855 800 439 275 52 × 2 = 0 + 0.078 663 442 399 711 600 878 551 04;
  • 71) 0.078 663 442 399 711 600 878 551 04 × 2 = 0 + 0.157 326 884 799 423 201 757 102 08;
  • 72) 0.157 326 884 799 423 201 757 102 08 × 2 = 0 + 0.314 653 769 598 846 403 514 204 16;
  • 73) 0.314 653 769 598 846 403 514 204 16 × 2 = 0 + 0.629 307 539 197 692 807 028 408 32;
  • 74) 0.629 307 539 197 692 807 028 408 32 × 2 = 1 + 0.258 615 078 395 385 614 056 816 64;
  • 75) 0.258 615 078 395 385 614 056 816 64 × 2 = 0 + 0.517 230 156 790 771 228 113 633 28;
  • 76) 0.517 230 156 790 771 228 113 633 28 × 2 = 1 + 0.034 460 313 581 542 456 227 266 56;
  • 77) 0.034 460 313 581 542 456 227 266 56 × 2 = 0 + 0.068 920 627 163 084 912 454 533 12;
  • 78) 0.068 920 627 163 084 912 454 533 12 × 2 = 0 + 0.137 841 254 326 169 824 909 066 24;
  • 79) 0.137 841 254 326 169 824 909 066 24 × 2 = 0 + 0.275 682 508 652 339 649 818 132 48;
  • 80) 0.275 682 508 652 339 649 818 132 48 × 2 = 0 + 0.551 365 017 304 679 299 636 264 96;
  • 81) 0.551 365 017 304 679 299 636 264 96 × 2 = 1 + 0.102 730 034 609 358 599 272 529 92;
  • 82) 0.102 730 034 609 358 599 272 529 92 × 2 = 0 + 0.205 460 069 218 717 198 545 059 84;
  • 83) 0.205 460 069 218 717 198 545 059 84 × 2 = 0 + 0.410 920 138 437 434 397 090 119 68;
  • 84) 0.410 920 138 437 434 397 090 119 68 × 2 = 0 + 0.821 840 276 874 868 794 180 239 36;
  • 85) 0.821 840 276 874 868 794 180 239 36 × 2 = 1 + 0.643 680 553 749 737 588 360 478 72;
  • 86) 0.643 680 553 749 737 588 360 478 72 × 2 = 1 + 0.287 361 107 499 475 176 720 957 44;
  • 87) 0.287 361 107 499 475 176 720 957 44 × 2 = 0 + 0.574 722 214 998 950 353 441 914 88;
  • 88) 0.574 722 214 998 950 353 441 914 88 × 2 = 1 + 0.149 444 429 997 900 706 883 829 76;
  • 89) 0.149 444 429 997 900 706 883 829 76 × 2 = 0 + 0.298 888 859 995 801 413 767 659 52;
  • 90) 0.298 888 859 995 801 413 767 659 52 × 2 = 0 + 0.597 777 719 991 602 827 535 319 04;
  • 91) 0.597 777 719 991 602 827 535 319 04 × 2 = 1 + 0.195 555 439 983 205 655 070 638 08;
  • 92) 0.195 555 439 983 205 655 070 638 08 × 2 = 0 + 0.391 110 879 966 411 310 141 276 16;
  • 93) 0.391 110 879 966 411 310 141 276 16 × 2 = 0 + 0.782 221 759 932 822 620 282 552 32;
  • 94) 0.782 221 759 932 822 620 282 552 32 × 2 = 1 + 0.564 443 519 865 645 240 565 104 64;
  • 95) 0.564 443 519 865 645 240 565 104 64 × 2 = 1 + 0.128 887 039 731 290 481 130 209 28;
  • 96) 0.128 887 039 731 290 481 130 209 28 × 2 = 0 + 0.257 774 079 462 580 962 260 418 56;
  • 97) 0.257 774 079 462 580 962 260 418 56 × 2 = 0 + 0.515 548 158 925 161 924 520 837 12;
  • 98) 0.515 548 158 925 161 924 520 837 12 × 2 = 1 + 0.031 096 317 850 323 849 041 674 24;
  • 99) 0.031 096 317 850 323 849 041 674 24 × 2 = 0 + 0.062 192 635 700 647 698 083 348 48;
  • 100) 0.062 192 635 700 647 698 083 348 48 × 2 = 0 + 0.124 385 271 401 295 396 166 696 96;
  • 101) 0.124 385 271 401 295 396 166 696 96 × 2 = 0 + 0.248 770 542 802 590 792 333 393 92;
  • 102) 0.248 770 542 802 590 792 333 393 92 × 2 = 0 + 0.497 541 085 605 181 584 666 787 84;
  • 103) 0.497 541 085 605 181 584 666 787 84 × 2 = 0 + 0.995 082 171 210 363 169 333 575 68;
  • 104) 0.995 082 171 210 363 169 333 575 68 × 2 = 1 + 0.990 164 342 420 726 338 667 151 36;
  • 105) 0.990 164 342 420 726 338 667 151 36 × 2 = 1 + 0.980 328 684 841 452 677 334 302 72;
  • 106) 0.980 328 684 841 452 677 334 302 72 × 2 = 1 + 0.960 657 369 682 905 354 668 605 44;
  • 107) 0.960 657 369 682 905 354 668 605 44 × 2 = 1 + 0.921 314 739 365 810 709 337 210 88;
  • 108) 0.921 314 739 365 810 709 337 210 88 × 2 = 1 + 0.842 629 478 731 621 418 674 421 76;
  • 109) 0.842 629 478 731 621 418 674 421 76 × 2 = 1 + 0.685 258 957 463 242 837 348 843 52;
  • 110) 0.685 258 957 463 242 837 348 843 52 × 2 = 1 + 0.370 517 914 926 485 674 697 687 04;
  • 111) 0.370 517 914 926 485 674 697 687 04 × 2 = 0 + 0.741 035 829 852 971 349 395 374 08;
  • 112) 0.741 035 829 852 971 349 395 374 08 × 2 = 1 + 0.482 071 659 705 942 698 790 748 16;
  • 113) 0.482 071 659 705 942 698 790 748 16 × 2 = 0 + 0.964 143 319 411 885 397 581 496 32;
  • 114) 0.964 143 319 411 885 397 581 496 32 × 2 = 1 + 0.928 286 638 823 770 795 162 992 64;
  • 115) 0.928 286 638 823 770 795 162 992 64 × 2 = 1 + 0.856 573 277 647 541 590 325 985 28;
  • 116) 0.856 573 277 647 541 590 325 985 28 × 2 = 1 + 0.713 146 555 295 083 180 651 970 56;
  • 117) 0.713 146 555 295 083 180 651 970 56 × 2 = 1 + 0.426 293 110 590 166 361 303 941 12;
  • 118) 0.426 293 110 590 166 361 303 941 12 × 2 = 0 + 0.852 586 221 180 332 722 607 882 24;
  • 119) 0.852 586 221 180 332 722 607 882 24 × 2 = 1 + 0.705 172 442 360 665 445 215 764 48;

We didn't get any fractional part that was equal to zero. But steps 67 to 119 have already produced 53 significant bits (the leading 1 plus the 52 bits that fit in the mantissa) => FULL STOP. The remaining bits are dropped, so the converted number we get in the end is only a very good approximation of the initial one.
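
As a rough sketch, the multiplication loop of step 3 can be reproduced with exact rational arithmetic. The function name `fraction_bits` and its `significant_bits` parameter are assumptions made for this illustration; the stopping rule (53 digits counted from the first 1) mirrors the one described above.

```python
from fractions import Fraction


def fraction_bits(x: Fraction, significant_bits: int = 53) -> str:
    """Binary digits of 0 <= x < 1, found by repeated multiplication by 2.

    Stops when the fractional part reaches zero, or once `significant_bits`
    digits have been produced counting from the first 1.
    """
    bits = []
    counted = 0                    # digits produced since (and including) the first 1
    while x != 0 and counted < significant_bits:
        x *= 2
        digit = int(x)             # integer part of the product: 0 or 1
        x -= digit                 # keep only the fractional part
        bits.append(str(digit))
        if digit == 1 or counted > 0:
            counted += 1
    return "".join(bits)


# The decimal digits are copied from the text, with the spaces removed.
x = Fraction("0.000 000 000 000 000 000 008 536 96".replace(" ", ""))
bits = fraction_bits(x)
print(len(bits))             # 119 multiplication steps
print(bits.index("1") + 1)   # 67: the step at which the first 1 appears
```

Using `Fraction` instead of `float` keeps every doubling exact, which matches the long decimal expansions in the list above.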


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 536 96(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1000 1101 0010 0110 0100 0001 1111 1101 0111 101(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 536 96(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1000 1101 0010 0110 0100 0001 1111 1101 0111 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 536 96(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1000 1101 0010 0110 0100 0001 1111 1101 0111 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1000 1101 0010 0110 0100 0001 1111 1101 0111 101(2) × 2^0 =


1.0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101(2) × 2^(-67)
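
The normalization in step 6 amounts to locating the first 1 in the digit string. A minimal sketch, assuming the digits after the point are held in a plain string (the helper name `normalize` is hypothetical):

```python
def normalize(fraction_bits: str) -> tuple[int, str]:
    """Return (exponent, significand_bits) for a binary fraction 0.b1 b2 b3 ...

    `fraction_bits` holds only the digits after the point. The significand is
    returned without a point; its first digit is the leading 1.
    """
    first_one = fraction_bits.index("1")   # zero-based position of the first 1
    exponent = -(first_one + 1)            # the point moves right by that many places
    return exponent, fraction_bits[first_one:]


# 66 leading zeros followed by the first few significant digits
# (shortened here for illustration).
exponent, significand = normalize("0" * 66 + "10100001010000")
print(exponent)                                  # -67
print(significand[0] + "." + significand[1:])    # 1.0100001010000
```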


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
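
Steps 8 to 10 can be checked with a few lines of Python. This is only a sketch: the constant and function names are hypothetical, and the range check assumes a normal (not subnormal, infinite or NaN) number.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1       # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and format the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # All zeros and all ones are reserved, so normal numbers use 1..2046.
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(BIAS)                         # 1023
print(biased_exponent_bits(-67))    # 01110111100
```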


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101 =


0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101


Decimal number 0.000 000 000 000 000 000 008 536 96 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0100 0110 1001 0011 0010 0000 1111 1110 1011 1101
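
The steps above cut the mantissa off after 52 bits, whereas IEEE 754 hardware normally rounds to the nearest representable value, so the last bit can differ from what a computer actually stores. The following sketch compares the two using Python's standard `struct` module; the variable names are chosen only for this illustration.

```python
import struct

# Fields copied from step 12, with the spaces removed.
sign = "0"
exponent = "011 1011 1100".replace(" ", "")
mantissa = ("0100 0010 1000 0100 0110 1001 0011 0010 "
            "0000 1111 1110 1011 1101").replace(" ", "")

page_pattern = int(sign + exponent + mantissa, 2)

# Bit pattern that the Python runtime stores for the same decimal literal.
(native_pattern,) = struct.unpack(">Q", struct.pack(">d", 8.53696e-21))

print(f"{page_pattern:016x}")      # 3bc428469320febd
print(f"{native_pattern:016x}")
# A difference of 1 means the results disagree only in the last mantissa bit.
print(abs(native_pattern - page_pattern))
```

If the two patterns differ by one, the truncated result is still within one unit in the last place of the correctly rounded double.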


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by padding with '0' bits on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
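
The nine steps above can be put together in one short Python sketch. It rests on several assumptions: the input is a non-zero decimal string in the normal range, the mantissa is truncated rather than rounded (as in the steps), and the built-in `format(..., "b")` stands in for the repeated division by 2. The function name `to_double_fields` is hypothetical.

```python
from fractions import Fraction


def to_double_fields(text: str) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings for a decimal string.

    Truncates the mantissa to 52 bits (no rounding). Assumes a non-zero
    value in the normal range of double precision.
    """
    value = Fraction(text.replace(" ", ""))
    sign = "1" if value < 0 else "0"             # step 9
    value = abs(value)                           # step 1

    integer = int(value)                         # steps 2-3
    int_bits = format(integer, "b") if integer else ""

    frac = value - integer                       # steps 4-5
    frac_bits = ""
    significant = len(int_bits)                  # digits counted from the first 1
    while frac != 0 and significant < 53:
        frac *= 2
        digit = int(frac)                        # integer part: 0 or 1
        frac -= digit
        frac_bits += str(digit)
        if digit == 1 or significant > 0:
            significant += 1

    # Step 6: the position of the leading 1 gives the unadjusted exponent.
    if int_bits:
        exponent = len(int_bits) - 1
        significand = int_bits + frac_bits
    else:
        first_one = frac_bits.index("1")
        exponent = -(first_one + 1)
        significand = frac_bits[first_one:]

    exp_bits = format(exponent + 2 ** (11 - 1) - 1, "011b")    # step 7
    mantissa = significand[1:53].ljust(52, "0")                # step 8
    return sign, exp_bits, mantissa


print(to_double_fields("-31.640 215"))
print(to_double_fields("0.5"))
```

Exact `Fraction` arithmetic is used so that the doubling steps never pick up floating point error of their own.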

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), then adjust its length to 52 bits by removing the excess bits from the right (losing precision):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
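
One way to sanity-check a result like this is to decode the three fields back into a number. The sketch below does so with exact rational arithmetic; the function name `decode_double_fields` is hypothetical, and only normal numbers are handled.

```python
from fractions import Fraction


def decode_double_fields(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Exact value of a normal double given its three bit fields."""
    e = int(exponent, 2) - 1023                             # remove the bias
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)   # restore the hidden 1
    value = significand * Fraction(2) ** e
    return -value if sign == "1" else value


# Fields copied from the conclusion above, with the spaces removed.
fields = (
    "1",
    "100 0000 0011".replace(" ", ""),
    ("1111 1010 0011 1110 0101 0010 0001 0101 "
     "0111 0110 1000 1001 1100").replace(" ", ""),
)
decoded = decode_double_fields(*fields)
print(float(decoded))
# Size of the error introduced by cutting the mantissa off at 52 bits.
print(float(Fraction("31.640215") - abs(decoded)))
```

The printed error should be tiny compared with the value itself, which confirms that the bit pattern is a close approximation of -31.640 215.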