0.000 000 000 000 000 000 008 537 34 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 537 34(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 537 34(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 537 34.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 537 34 × 2 = 0 + 0.000 000 000 000 000 000 017 074 68;
  • 2) 0.000 000 000 000 000 000 017 074 68 × 2 = 0 + 0.000 000 000 000 000 000 034 149 36;
  • 3) 0.000 000 000 000 000 000 034 149 36 × 2 = 0 + 0.000 000 000 000 000 000 068 298 72;
  • 4) 0.000 000 000 000 000 000 068 298 72 × 2 = 0 + 0.000 000 000 000 000 000 136 597 44;
  • 5) 0.000 000 000 000 000 000 136 597 44 × 2 = 0 + 0.000 000 000 000 000 000 273 194 88;
  • 6) 0.000 000 000 000 000 000 273 194 88 × 2 = 0 + 0.000 000 000 000 000 000 546 389 76;
  • 7) 0.000 000 000 000 000 000 546 389 76 × 2 = 0 + 0.000 000 000 000 000 001 092 779 52;
  • 8) 0.000 000 000 000 000 001 092 779 52 × 2 = 0 + 0.000 000 000 000 000 002 185 559 04;
  • 9) 0.000 000 000 000 000 002 185 559 04 × 2 = 0 + 0.000 000 000 000 000 004 371 118 08;
  • 10) 0.000 000 000 000 000 004 371 118 08 × 2 = 0 + 0.000 000 000 000 000 008 742 236 16;
  • 11) 0.000 000 000 000 000 008 742 236 16 × 2 = 0 + 0.000 000 000 000 000 017 484 472 32;
  • 12) 0.000 000 000 000 000 017 484 472 32 × 2 = 0 + 0.000 000 000 000 000 034 968 944 64;
  • 13) 0.000 000 000 000 000 034 968 944 64 × 2 = 0 + 0.000 000 000 000 000 069 937 889 28;
  • 14) 0.000 000 000 000 000 069 937 889 28 × 2 = 0 + 0.000 000 000 000 000 139 875 778 56;
  • 15) 0.000 000 000 000 000 139 875 778 56 × 2 = 0 + 0.000 000 000 000 000 279 751 557 12;
  • 16) 0.000 000 000 000 000 279 751 557 12 × 2 = 0 + 0.000 000 000 000 000 559 503 114 24;
  • 17) 0.000 000 000 000 000 559 503 114 24 × 2 = 0 + 0.000 000 000 000 001 119 006 228 48;
  • 18) 0.000 000 000 000 001 119 006 228 48 × 2 = 0 + 0.000 000 000 000 002 238 012 456 96;
  • 19) 0.000 000 000 000 002 238 012 456 96 × 2 = 0 + 0.000 000 000 000 004 476 024 913 92;
  • 20) 0.000 000 000 000 004 476 024 913 92 × 2 = 0 + 0.000 000 000 000 008 952 049 827 84;
  • 21) 0.000 000 000 000 008 952 049 827 84 × 2 = 0 + 0.000 000 000 000 017 904 099 655 68;
  • 22) 0.000 000 000 000 017 904 099 655 68 × 2 = 0 + 0.000 000 000 000 035 808 199 311 36;
  • 23) 0.000 000 000 000 035 808 199 311 36 × 2 = 0 + 0.000 000 000 000 071 616 398 622 72;
  • 24) 0.000 000 000 000 071 616 398 622 72 × 2 = 0 + 0.000 000 000 000 143 232 797 245 44;
  • 25) 0.000 000 000 000 143 232 797 245 44 × 2 = 0 + 0.000 000 000 000 286 465 594 490 88;
  • 26) 0.000 000 000 000 286 465 594 490 88 × 2 = 0 + 0.000 000 000 000 572 931 188 981 76;
  • 27) 0.000 000 000 000 572 931 188 981 76 × 2 = 0 + 0.000 000 000 001 145 862 377 963 52;
  • 28) 0.000 000 000 001 145 862 377 963 52 × 2 = 0 + 0.000 000 000 002 291 724 755 927 04;
  • 29) 0.000 000 000 002 291 724 755 927 04 × 2 = 0 + 0.000 000 000 004 583 449 511 854 08;
  • 30) 0.000 000 000 004 583 449 511 854 08 × 2 = 0 + 0.000 000 000 009 166 899 023 708 16;
  • 31) 0.000 000 000 009 166 899 023 708 16 × 2 = 0 + 0.000 000 000 018 333 798 047 416 32;
  • 32) 0.000 000 000 018 333 798 047 416 32 × 2 = 0 + 0.000 000 000 036 667 596 094 832 64;
  • 33) 0.000 000 000 036 667 596 094 832 64 × 2 = 0 + 0.000 000 000 073 335 192 189 665 28;
  • 34) 0.000 000 000 073 335 192 189 665 28 × 2 = 0 + 0.000 000 000 146 670 384 379 330 56;
  • 35) 0.000 000 000 146 670 384 379 330 56 × 2 = 0 + 0.000 000 000 293 340 768 758 661 12;
  • 36) 0.000 000 000 293 340 768 758 661 12 × 2 = 0 + 0.000 000 000 586 681 537 517 322 24;
  • 37) 0.000 000 000 586 681 537 517 322 24 × 2 = 0 + 0.000 000 001 173 363 075 034 644 48;
  • 38) 0.000 000 001 173 363 075 034 644 48 × 2 = 0 + 0.000 000 002 346 726 150 069 288 96;
  • 39) 0.000 000 002 346 726 150 069 288 96 × 2 = 0 + 0.000 000 004 693 452 300 138 577 92;
  • 40) 0.000 000 004 693 452 300 138 577 92 × 2 = 0 + 0.000 000 009 386 904 600 277 155 84;
  • 41) 0.000 000 009 386 904 600 277 155 84 × 2 = 0 + 0.000 000 018 773 809 200 554 311 68;
  • 42) 0.000 000 018 773 809 200 554 311 68 × 2 = 0 + 0.000 000 037 547 618 401 108 623 36;
  • 43) 0.000 000 037 547 618 401 108 623 36 × 2 = 0 + 0.000 000 075 095 236 802 217 246 72;
  • 44) 0.000 000 075 095 236 802 217 246 72 × 2 = 0 + 0.000 000 150 190 473 604 434 493 44;
  • 45) 0.000 000 150 190 473 604 434 493 44 × 2 = 0 + 0.000 000 300 380 947 208 868 986 88;
  • 46) 0.000 000 300 380 947 208 868 986 88 × 2 = 0 + 0.000 000 600 761 894 417 737 973 76;
  • 47) 0.000 000 600 761 894 417 737 973 76 × 2 = 0 + 0.000 001 201 523 788 835 475 947 52;
  • 48) 0.000 001 201 523 788 835 475 947 52 × 2 = 0 + 0.000 002 403 047 577 670 951 895 04;
  • 49) 0.000 002 403 047 577 670 951 895 04 × 2 = 0 + 0.000 004 806 095 155 341 903 790 08;
  • 50) 0.000 004 806 095 155 341 903 790 08 × 2 = 0 + 0.000 009 612 190 310 683 807 580 16;
  • 51) 0.000 009 612 190 310 683 807 580 16 × 2 = 0 + 0.000 019 224 380 621 367 615 160 32;
  • 52) 0.000 019 224 380 621 367 615 160 32 × 2 = 0 + 0.000 038 448 761 242 735 230 320 64;
  • 53) 0.000 038 448 761 242 735 230 320 64 × 2 = 0 + 0.000 076 897 522 485 470 460 641 28;
  • 54) 0.000 076 897 522 485 470 460 641 28 × 2 = 0 + 0.000 153 795 044 970 940 921 282 56;
  • 55) 0.000 153 795 044 970 940 921 282 56 × 2 = 0 + 0.000 307 590 089 941 881 842 565 12;
  • 56) 0.000 307 590 089 941 881 842 565 12 × 2 = 0 + 0.000 615 180 179 883 763 685 130 24;
  • 57) 0.000 615 180 179 883 763 685 130 24 × 2 = 0 + 0.001 230 360 359 767 527 370 260 48;
  • 58) 0.001 230 360 359 767 527 370 260 48 × 2 = 0 + 0.002 460 720 719 535 054 740 520 96;
  • 59) 0.002 460 720 719 535 054 740 520 96 × 2 = 0 + 0.004 921 441 439 070 109 481 041 92;
  • 60) 0.004 921 441 439 070 109 481 041 92 × 2 = 0 + 0.009 842 882 878 140 218 962 083 84;
  • 61) 0.009 842 882 878 140 218 962 083 84 × 2 = 0 + 0.019 685 765 756 280 437 924 167 68;
  • 62) 0.019 685 765 756 280 437 924 167 68 × 2 = 0 + 0.039 371 531 512 560 875 848 335 36;
  • 63) 0.039 371 531 512 560 875 848 335 36 × 2 = 0 + 0.078 743 063 025 121 751 696 670 72;
  • 64) 0.078 743 063 025 121 751 696 670 72 × 2 = 0 + 0.157 486 126 050 243 503 393 341 44;
  • 65) 0.157 486 126 050 243 503 393 341 44 × 2 = 0 + 0.314 972 252 100 487 006 786 682 88;
  • 66) 0.314 972 252 100 487 006 786 682 88 × 2 = 0 + 0.629 944 504 200 974 013 573 365 76;
  • 67) 0.629 944 504 200 974 013 573 365 76 × 2 = 1 + 0.259 889 008 401 948 027 146 731 52;
  • 68) 0.259 889 008 401 948 027 146 731 52 × 2 = 0 + 0.519 778 016 803 896 054 293 463 04;
  • 69) 0.519 778 016 803 896 054 293 463 04 × 2 = 1 + 0.039 556 033 607 792 108 586 926 08;
  • 70) 0.039 556 033 607 792 108 586 926 08 × 2 = 0 + 0.079 112 067 215 584 217 173 852 16;
  • 71) 0.079 112 067 215 584 217 173 852 16 × 2 = 0 + 0.158 224 134 431 168 434 347 704 32;
  • 72) 0.158 224 134 431 168 434 347 704 32 × 2 = 0 + 0.316 448 268 862 336 868 695 408 64;
  • 73) 0.316 448 268 862 336 868 695 408 64 × 2 = 0 + 0.632 896 537 724 673 737 390 817 28;
  • 74) 0.632 896 537 724 673 737 390 817 28 × 2 = 1 + 0.265 793 075 449 347 474 781 634 56;
  • 75) 0.265 793 075 449 347 474 781 634 56 × 2 = 0 + 0.531 586 150 898 694 949 563 269 12;
  • 76) 0.531 586 150 898 694 949 563 269 12 × 2 = 1 + 0.063 172 301 797 389 899 126 538 24;
  • 77) 0.063 172 301 797 389 899 126 538 24 × 2 = 0 + 0.126 344 603 594 779 798 253 076 48;
  • 78) 0.126 344 603 594 779 798 253 076 48 × 2 = 0 + 0.252 689 207 189 559 596 506 152 96;
  • 79) 0.252 689 207 189 559 596 506 152 96 × 2 = 0 + 0.505 378 414 379 119 193 012 305 92;
  • 80) 0.505 378 414 379 119 193 012 305 92 × 2 = 1 + 0.010 756 828 758 238 386 024 611 84;
  • 81) 0.010 756 828 758 238 386 024 611 84 × 2 = 0 + 0.021 513 657 516 476 772 049 223 68;
  • 82) 0.021 513 657 516 476 772 049 223 68 × 2 = 0 + 0.043 027 315 032 953 544 098 447 36;
  • 83) 0.043 027 315 032 953 544 098 447 36 × 2 = 0 + 0.086 054 630 065 907 088 196 894 72;
  • 84) 0.086 054 630 065 907 088 196 894 72 × 2 = 0 + 0.172 109 260 131 814 176 393 789 44;
  • 85) 0.172 109 260 131 814 176 393 789 44 × 2 = 0 + 0.344 218 520 263 628 352 787 578 88;
  • 86) 0.344 218 520 263 628 352 787 578 88 × 2 = 0 + 0.688 437 040 527 256 705 575 157 76;
  • 87) 0.688 437 040 527 256 705 575 157 76 × 2 = 1 + 0.376 874 081 054 513 411 150 315 52;
  • 88) 0.376 874 081 054 513 411 150 315 52 × 2 = 0 + 0.753 748 162 109 026 822 300 631 04;
  • 89) 0.753 748 162 109 026 822 300 631 04 × 2 = 1 + 0.507 496 324 218 053 644 601 262 08;
  • 90) 0.507 496 324 218 053 644 601 262 08 × 2 = 1 + 0.014 992 648 436 107 289 202 524 16;
  • 91) 0.014 992 648 436 107 289 202 524 16 × 2 = 0 + 0.029 985 296 872 214 578 405 048 32;
  • 92) 0.029 985 296 872 214 578 405 048 32 × 2 = 0 + 0.059 970 593 744 429 156 810 096 64;
  • 93) 0.059 970 593 744 429 156 810 096 64 × 2 = 0 + 0.119 941 187 488 858 313 620 193 28;
  • 94) 0.119 941 187 488 858 313 620 193 28 × 2 = 0 + 0.239 882 374 977 716 627 240 386 56;
  • 95) 0.239 882 374 977 716 627 240 386 56 × 2 = 0 + 0.479 764 749 955 433 254 480 773 12;
  • 96) 0.479 764 749 955 433 254 480 773 12 × 2 = 0 + 0.959 529 499 910 866 508 961 546 24;
  • 97) 0.959 529 499 910 866 508 961 546 24 × 2 = 1 + 0.919 058 999 821 733 017 923 092 48;
  • 98) 0.919 058 999 821 733 017 923 092 48 × 2 = 1 + 0.838 117 999 643 466 035 846 184 96;
  • 99) 0.838 117 999 643 466 035 846 184 96 × 2 = 1 + 0.676 235 999 286 932 071 692 369 92;
  • 100) 0.676 235 999 286 932 071 692 369 92 × 2 = 1 + 0.352 471 998 573 864 143 384 739 84;
  • 101) 0.352 471 998 573 864 143 384 739 84 × 2 = 0 + 0.704 943 997 147 728 286 769 479 68;
  • 102) 0.704 943 997 147 728 286 769 479 68 × 2 = 1 + 0.409 887 994 295 456 573 538 959 36;
  • 103) 0.409 887 994 295 456 573 538 959 36 × 2 = 0 + 0.819 775 988 590 913 147 077 918 72;
  • 104) 0.819 775 988 590 913 147 077 918 72 × 2 = 1 + 0.639 551 977 181 826 294 155 837 44;
  • 105) 0.639 551 977 181 826 294 155 837 44 × 2 = 1 + 0.279 103 954 363 652 588 311 674 88;
  • 106) 0.279 103 954 363 652 588 311 674 88 × 2 = 0 + 0.558 207 908 727 305 176 623 349 76;
  • 107) 0.558 207 908 727 305 176 623 349 76 × 2 = 1 + 0.116 415 817 454 610 353 246 699 52;
  • 108) 0.116 415 817 454 610 353 246 699 52 × 2 = 0 + 0.232 831 634 909 220 706 493 399 04;
  • 109) 0.232 831 634 909 220 706 493 399 04 × 2 = 0 + 0.465 663 269 818 441 412 986 798 08;
  • 110) 0.465 663 269 818 441 412 986 798 08 × 2 = 0 + 0.931 326 539 636 882 825 973 596 16;
  • 111) 0.931 326 539 636 882 825 973 596 16 × 2 = 1 + 0.862 653 079 273 765 651 947 192 32;
  • 112) 0.862 653 079 273 765 651 947 192 32 × 2 = 1 + 0.725 306 158 547 531 303 894 384 64;
  • 113) 0.725 306 158 547 531 303 894 384 64 × 2 = 1 + 0.450 612 317 095 062 607 788 769 28;
  • 114) 0.450 612 317 095 062 607 788 769 28 × 2 = 0 + 0.901 224 634 190 125 215 577 538 56;
  • 115) 0.901 224 634 190 125 215 577 538 56 × 2 = 1 + 0.802 449 268 380 250 431 155 077 12;
  • 116) 0.802 449 268 380 250 431 155 077 12 × 2 = 1 + 0.604 898 536 760 500 862 310 154 24;
  • 117) 0.604 898 536 760 500 862 310 154 24 × 2 = 1 + 0.209 797 073 521 001 724 620 308 48;
  • 118) 0.209 797 073 521 001 724 620 308 48 × 2 = 0 + 0.419 594 147 042 003 449 240 616 96;
  • 119) 0.419 594 147 042 003 449 240 616 96 × 2 = 0 + 0.839 188 294 084 006 898 481 233 92;

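We didn't get any fractional part that was equal to zero. But we have collected enough bits: 53 significant bits starting from the first integer part that is 1 (the leading bit plus the 52 bits of the mantissa) => FULL STOP. The remaining bits are discarded, so the converted number will be only a very good approximation of the initial one. Because the bits are truncated rather than rounded, the result can differ in the last mantissa bit from the one produced by the default round-to-nearest mode of IEEE 754.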


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 537 34(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0000 0010 1100 0000 1111 0101 1010 0011 1011 100(2)
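The repeated multiplication in steps 3 and 4 can be sketched in Python. This is only an illustration: the function name `fraction_bits` and the `max_bits` cap are assumptions, and `fractions.Fraction` is used so that the decimal input is held exactly instead of being rounded to a float first.

```python
from fractions import Fraction

def fraction_bits(frac, max_bits):
    """Return up to max_bits binary digits of a fraction in [0, 1).

    Each pass doubles the fraction; the integer part (0 or 1) is the
    next bit and the fractional part is carried into the next pass.
    """
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)      # integer part of the product: 0 or 1
        bits.append(bit)
        frac -= bit          # keep only the fractional part
    return bits

# 119 passes, as in the list above (hypothetical cap chosen to match it).
bits = fraction_bits(Fraction("0.00000000000000000000853734"), 119)
print(bits.index(1) + 1)     # number of the first step that yields a 1
print("".join(map(str, bits)))
```

The cap is needed because this fraction never reaches zero; a fraction such as 0.375 stops on its own after three passes.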

5. Positive number before normalization:

0.000 000 000 000 000 000 008 537 34(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0000 0010 1100 0000 1111 0101 1010 0011 1011 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 537 34(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0000 0010 1100 0000 1111 0101 1010 0011 1011 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0000 0010 1100 0000 1111 0101 1010 0011 1011 100(2) × 2^0 =


1.0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100(2) × 2^-67
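Normalizing a number smaller than 1 amounts to locating the first 1 among the fractional bits. A minimal sketch, assuming the bits after the point are given as a string; the name `normalize_fraction_bits` is hypothetical:

```python
def normalize_fraction_bits(bits):
    """Normalize 0.<bits> so that a single 1 stands left of the point.

    Returns (significand, exponent) with value = significand * 2**exponent.
    Raises ValueError if the string contains no 1 at all.
    """
    first_one = bits.index("1")
    exponent = -(first_one + 1)          # the point moves this many places right
    significand = "1." + bits[first_one + 1:]
    return significand, exponent

# 66 zeros followed by the first significant bits of the number above.
print(normalize_fraction_bits("0" * 66 + "101"))
```

With the first 1 in position 67, the point moves 67 places to the right and the unadjusted exponent is -67.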


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
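The bias depends only on the width of the exponent field, so the adjustment can be written as a one-line formula. A small sketch; the function name `biased_exponent` and its default argument are assumptions made for illustration:

```python
def biased_exponent(unadjusted, exponent_bits=11):
    """Add the excess/bias 2**(exponent_bits - 1) - 1 to an exponent."""
    bias = 2 ** (exponent_bits - 1) - 1   # 1023 for an 11 bit field
    return unadjusted + bias

print(biased_exponent(-67))   # the exponent of this conversion
print(biased_exponent(0, 8))  # bias of an 8 bit (single precision) field
```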


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
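The repeated division of steps 9 and 10 can be sketched as follows. The helper name `to_binary` and its `width` parameter (used to pad the result to 11 bits) are assumptions, not part of the original steps:

```python
def to_binary(n, width=0):
    """Convert a non-negative integer to base 2 by repeated division."""
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient feeds the next division
        remainders.append(str(remainder))
    # Read the remainders from the bottom of the list upward,
    # then pad on the left with zeros up to the requested width.
    digits = "".join(reversed(remainders)).rjust(width, "0")
    return digits or "0"

print(to_binary(956, 11))
```

The ten divisions produce ten digits; the eleventh, leading 0 comes from the padding to the field width.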


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100 =


0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100
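Step 11 is a small string operation: drop the leading 1 and the point, then cut or pad to 52 bits. A sketch under the assumption that the significand is written as text of the form `1.xxxx`; the name `mantissa_field` is hypothetical:

```python
def mantissa_field(significand, width=52):
    """Return the stored mantissa bits of a normalized significand.

    Excess bits on the right are truncated (not rounded); missing
    bits are filled with zeros on the right.
    """
    leading, _, fraction = significand.replace(" ", "").partition(".")
    if leading != "1":
        raise ValueError("significand must be normalized to 1.xxxx")
    return fraction[:width].ljust(width, "0")

field = mantissa_field(
    "1.0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100"
)
print(field, len(field))
```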


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100


Decimal number 0.000 000 000 000 000 000 008 537 34 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100
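To see how good the approximation is, the three fields can be decoded back into an exact value and compared with the original decimal. The sketch below assumes a normal (not subnormal, not special) number; `decode_fields` is a hypothetical helper, and exact rational arithmetic is used so the comparison itself adds no rounding.

```python
from fractions import Fraction

def decode_fields(sign, exponent, mantissa):
    """Exact value of a normal double: (-1)^s * 1.m * 2^(e - 1023)."""
    s = -1 if sign == "1" else 1
    e = int(exponent, 2) - 1023
    m = 1 + Fraction(int(mantissa, 2), 2 ** 52)   # restore the hidden 1
    return s * m * Fraction(2) ** e

original = Fraction("0.00000000000000000000853734")
stored = decode_fields(
    "0",
    "01110111100",
    "0100 0010 1000 1000 0001 0110 0000 0111 1010 1101 0001 1101 1100".replace(" ", ""),
)
relative_error = (original - stored) / original
print(float(stored))
print(float(relative_error))
```

Since the extra bits were cut off, the stored value is slightly below the original, and the relative error stays under 2^-52.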


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
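The nine steps above can be combined into one short routine. This is a sketch under stated assumptions rather than a complete IEEE 754 encoder: the name `to_double_fields` is hypothetical, only normal numbers are handled (no subnormals, infinities or NaN), and excess bits are truncated as in step 8 instead of being rounded. Instead of converting the integer and fractional parts separately, it scales the whole value into the range [1, 2), which yields the same normalized bits.

```python
from fractions import Fraction

def to_double_fields(text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)                 # exact decimal value
    sign = "1" if value < 0 else "0"       # step 9
    value = abs(value)                     # step 1
    if value == 0:
        return sign, "0" * 11, "0" * 52
    # Step 6: scale by powers of two until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1
    # Steps 4, 5 and 8: drop the leading 1, then produce 52 bits
    # by repeated doubling; anything beyond 52 bits is discarded.
    frac = value - 1
    bits = []
    for _ in range(52):
        frac *= 2
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    # Step 7: add the bias and write the exponent on 11 bits.
    return sign, format(exponent + 1023, "011b"), "".join(bits)

print(" - ".join(to_double_fields("-31.640215")))
```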

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
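The hand-made bit pattern can be compared with the double that the machine itself stores for -31.640 215. The sketch below uses Python's `struct` module to reinterpret the 64 bits as a float; `bits_to_float` is a hypothetical helper, and `math.ulp` assumes Python 3.9 or later. Because the steps above cut off the excess bits while hardware normally rounds to nearest, the two patterns may differ in the last mantissa bit.

```python
import math
import struct

def bits_to_float(sign, exponent, mantissa):
    """Reinterpret sign + exponent + mantissa (64 bits) as a double."""
    word = int(sign + exponent + mantissa, 2)
    return struct.unpack(">d", word.to_bytes(8, "big"))[0]

def float_to_hex(x):
    """Big-endian hexadecimal form of the 64 bits of a double."""
    return struct.pack(">d", x).hex()

manual = bits_to_float(
    "1",
    "10000000011",
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", ""),
)
native = -31.640215
print(float_to_hex(manual))   # pattern built by hand above
print(float_to_hex(native))   # pattern chosen by the interpreter
print(abs(manual - native) <= math.ulp(native))
```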