0.000 000 000 000 000 000 008 536 996 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 536 996(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)


1. First, convert the integer part, 0, to binary (base 2).
Divide it repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) = 0(2)
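Steps 1 and 2 (repeated division by 2, then reading the remainders from the bottom up) can be sketched in Python as below. The function name `int_to_binary` is a hypothetical helper chosen for illustration, not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expects a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)        # division = quotient + remainder
        remainders.append(str(r))
        if n == 0:                 # stop when the quotient is zero
            break
    # The last remainder becomes the leftmost digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
print(int_to_binary(956))  # 1110111100
```

Python's built-in `format(n, "b")` returns the same digits and is a convenient cross-check.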


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 536 996.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 536 996 × 2 = 0 + 0.000 000 000 000 000 000 017 073 992;
  • 2) 0.000 000 000 000 000 000 017 073 992 × 2 = 0 + 0.000 000 000 000 000 000 034 147 984;
  • 3) 0.000 000 000 000 000 000 034 147 984 × 2 = 0 + 0.000 000 000 000 000 000 068 295 968;
  • 4) 0.000 000 000 000 000 000 068 295 968 × 2 = 0 + 0.000 000 000 000 000 000 136 591 936;
  • 5) 0.000 000 000 000 000 000 136 591 936 × 2 = 0 + 0.000 000 000 000 000 000 273 183 872;
  • 6) 0.000 000 000 000 000 000 273 183 872 × 2 = 0 + 0.000 000 000 000 000 000 546 367 744;
  • 7) 0.000 000 000 000 000 000 546 367 744 × 2 = 0 + 0.000 000 000 000 000 001 092 735 488;
  • 8) 0.000 000 000 000 000 001 092 735 488 × 2 = 0 + 0.000 000 000 000 000 002 185 470 976;
  • 9) 0.000 000 000 000 000 002 185 470 976 × 2 = 0 + 0.000 000 000 000 000 004 370 941 952;
  • 10) 0.000 000 000 000 000 004 370 941 952 × 2 = 0 + 0.000 000 000 000 000 008 741 883 904;
  • 11) 0.000 000 000 000 000 008 741 883 904 × 2 = 0 + 0.000 000 000 000 000 017 483 767 808;
  • 12) 0.000 000 000 000 000 017 483 767 808 × 2 = 0 + 0.000 000 000 000 000 034 967 535 616;
  • 13) 0.000 000 000 000 000 034 967 535 616 × 2 = 0 + 0.000 000 000 000 000 069 935 071 232;
  • 14) 0.000 000 000 000 000 069 935 071 232 × 2 = 0 + 0.000 000 000 000 000 139 870 142 464;
  • 15) 0.000 000 000 000 000 139 870 142 464 × 2 = 0 + 0.000 000 000 000 000 279 740 284 928;
  • 16) 0.000 000 000 000 000 279 740 284 928 × 2 = 0 + 0.000 000 000 000 000 559 480 569 856;
  • 17) 0.000 000 000 000 000 559 480 569 856 × 2 = 0 + 0.000 000 000 000 001 118 961 139 712;
  • 18) 0.000 000 000 000 001 118 961 139 712 × 2 = 0 + 0.000 000 000 000 002 237 922 279 424;
  • 19) 0.000 000 000 000 002 237 922 279 424 × 2 = 0 + 0.000 000 000 000 004 475 844 558 848;
  • 20) 0.000 000 000 000 004 475 844 558 848 × 2 = 0 + 0.000 000 000 000 008 951 689 117 696;
  • 21) 0.000 000 000 000 008 951 689 117 696 × 2 = 0 + 0.000 000 000 000 017 903 378 235 392;
  • 22) 0.000 000 000 000 017 903 378 235 392 × 2 = 0 + 0.000 000 000 000 035 806 756 470 784;
  • 23) 0.000 000 000 000 035 806 756 470 784 × 2 = 0 + 0.000 000 000 000 071 613 512 941 568;
  • 24) 0.000 000 000 000 071 613 512 941 568 × 2 = 0 + 0.000 000 000 000 143 227 025 883 136;
  • 25) 0.000 000 000 000 143 227 025 883 136 × 2 = 0 + 0.000 000 000 000 286 454 051 766 272;
  • 26) 0.000 000 000 000 286 454 051 766 272 × 2 = 0 + 0.000 000 000 000 572 908 103 532 544;
  • 27) 0.000 000 000 000 572 908 103 532 544 × 2 = 0 + 0.000 000 000 001 145 816 207 065 088;
  • 28) 0.000 000 000 001 145 816 207 065 088 × 2 = 0 + 0.000 000 000 002 291 632 414 130 176;
  • 29) 0.000 000 000 002 291 632 414 130 176 × 2 = 0 + 0.000 000 000 004 583 264 828 260 352;
  • 30) 0.000 000 000 004 583 264 828 260 352 × 2 = 0 + 0.000 000 000 009 166 529 656 520 704;
  • 31) 0.000 000 000 009 166 529 656 520 704 × 2 = 0 + 0.000 000 000 018 333 059 313 041 408;
  • 32) 0.000 000 000 018 333 059 313 041 408 × 2 = 0 + 0.000 000 000 036 666 118 626 082 816;
  • 33) 0.000 000 000 036 666 118 626 082 816 × 2 = 0 + 0.000 000 000 073 332 237 252 165 632;
  • 34) 0.000 000 000 073 332 237 252 165 632 × 2 = 0 + 0.000 000 000 146 664 474 504 331 264;
  • 35) 0.000 000 000 146 664 474 504 331 264 × 2 = 0 + 0.000 000 000 293 328 949 008 662 528;
  • 36) 0.000 000 000 293 328 949 008 662 528 × 2 = 0 + 0.000 000 000 586 657 898 017 325 056;
  • 37) 0.000 000 000 586 657 898 017 325 056 × 2 = 0 + 0.000 000 001 173 315 796 034 650 112;
  • 38) 0.000 000 001 173 315 796 034 650 112 × 2 = 0 + 0.000 000 002 346 631 592 069 300 224;
  • 39) 0.000 000 002 346 631 592 069 300 224 × 2 = 0 + 0.000 000 004 693 263 184 138 600 448;
  • 40) 0.000 000 004 693 263 184 138 600 448 × 2 = 0 + 0.000 000 009 386 526 368 277 200 896;
  • 41) 0.000 000 009 386 526 368 277 200 896 × 2 = 0 + 0.000 000 018 773 052 736 554 401 792;
  • 42) 0.000 000 018 773 052 736 554 401 792 × 2 = 0 + 0.000 000 037 546 105 473 108 803 584;
  • 43) 0.000 000 037 546 105 473 108 803 584 × 2 = 0 + 0.000 000 075 092 210 946 217 607 168;
  • 44) 0.000 000 075 092 210 946 217 607 168 × 2 = 0 + 0.000 000 150 184 421 892 435 214 336;
  • 45) 0.000 000 150 184 421 892 435 214 336 × 2 = 0 + 0.000 000 300 368 843 784 870 428 672;
  • 46) 0.000 000 300 368 843 784 870 428 672 × 2 = 0 + 0.000 000 600 737 687 569 740 857 344;
  • 47) 0.000 000 600 737 687 569 740 857 344 × 2 = 0 + 0.000 001 201 475 375 139 481 714 688;
  • 48) 0.000 001 201 475 375 139 481 714 688 × 2 = 0 + 0.000 002 402 950 750 278 963 429 376;
  • 49) 0.000 002 402 950 750 278 963 429 376 × 2 = 0 + 0.000 004 805 901 500 557 926 858 752;
  • 50) 0.000 004 805 901 500 557 926 858 752 × 2 = 0 + 0.000 009 611 803 001 115 853 717 504;
  • 51) 0.000 009 611 803 001 115 853 717 504 × 2 = 0 + 0.000 019 223 606 002 231 707 435 008;
  • 52) 0.000 019 223 606 002 231 707 435 008 × 2 = 0 + 0.000 038 447 212 004 463 414 870 016;
  • 53) 0.000 038 447 212 004 463 414 870 016 × 2 = 0 + 0.000 076 894 424 008 926 829 740 032;
  • 54) 0.000 076 894 424 008 926 829 740 032 × 2 = 0 + 0.000 153 788 848 017 853 659 480 064;
  • 55) 0.000 153 788 848 017 853 659 480 064 × 2 = 0 + 0.000 307 577 696 035 707 318 960 128;
  • 56) 0.000 307 577 696 035 707 318 960 128 × 2 = 0 + 0.000 615 155 392 071 414 637 920 256;
  • 57) 0.000 615 155 392 071 414 637 920 256 × 2 = 0 + 0.001 230 310 784 142 829 275 840 512;
  • 58) 0.001 230 310 784 142 829 275 840 512 × 2 = 0 + 0.002 460 621 568 285 658 551 681 024;
  • 59) 0.002 460 621 568 285 658 551 681 024 × 2 = 0 + 0.004 921 243 136 571 317 103 362 048;
  • 60) 0.004 921 243 136 571 317 103 362 048 × 2 = 0 + 0.009 842 486 273 142 634 206 724 096;
  • 61) 0.009 842 486 273 142 634 206 724 096 × 2 = 0 + 0.019 684 972 546 285 268 413 448 192;
  • 62) 0.019 684 972 546 285 268 413 448 192 × 2 = 0 + 0.039 369 945 092 570 536 826 896 384;
  • 63) 0.039 369 945 092 570 536 826 896 384 × 2 = 0 + 0.078 739 890 185 141 073 653 792 768;
  • 64) 0.078 739 890 185 141 073 653 792 768 × 2 = 0 + 0.157 479 780 370 282 147 307 585 536;
  • 65) 0.157 479 780 370 282 147 307 585 536 × 2 = 0 + 0.314 959 560 740 564 294 615 171 072;
  • 66) 0.314 959 560 740 564 294 615 171 072 × 2 = 0 + 0.629 919 121 481 128 589 230 342 144;
  • 67) 0.629 919 121 481 128 589 230 342 144 × 2 = 1 + 0.259 838 242 962 257 178 460 684 288;
  • 68) 0.259 838 242 962 257 178 460 684 288 × 2 = 0 + 0.519 676 485 924 514 356 921 368 576;
  • 69) 0.519 676 485 924 514 356 921 368 576 × 2 = 1 + 0.039 352 971 849 028 713 842 737 152;
  • 70) 0.039 352 971 849 028 713 842 737 152 × 2 = 0 + 0.078 705 943 698 057 427 685 474 304;
  • 71) 0.078 705 943 698 057 427 685 474 304 × 2 = 0 + 0.157 411 887 396 114 855 370 948 608;
  • 72) 0.157 411 887 396 114 855 370 948 608 × 2 = 0 + 0.314 823 774 792 229 710 741 897 216;
  • 73) 0.314 823 774 792 229 710 741 897 216 × 2 = 0 + 0.629 647 549 584 459 421 483 794 432;
  • 74) 0.629 647 549 584 459 421 483 794 432 × 2 = 1 + 0.259 295 099 168 918 842 967 588 864;
  • 75) 0.259 295 099 168 918 842 967 588 864 × 2 = 0 + 0.518 590 198 337 837 685 935 177 728;
  • 76) 0.518 590 198 337 837 685 935 177 728 × 2 = 1 + 0.037 180 396 675 675 371 870 355 456;
  • 77) 0.037 180 396 675 675 371 870 355 456 × 2 = 0 + 0.074 360 793 351 350 743 740 710 912;
  • 78) 0.074 360 793 351 350 743 740 710 912 × 2 = 0 + 0.148 721 586 702 701 487 481 421 824;
  • 79) 0.148 721 586 702 701 487 481 421 824 × 2 = 0 + 0.297 443 173 405 402 974 962 843 648;
  • 80) 0.297 443 173 405 402 974 962 843 648 × 2 = 0 + 0.594 886 346 810 805 949 925 687 296;
  • 81) 0.594 886 346 810 805 949 925 687 296 × 2 = 1 + 0.189 772 693 621 611 899 851 374 592;
  • 82) 0.189 772 693 621 611 899 851 374 592 × 2 = 0 + 0.379 545 387 243 223 799 702 749 184;
  • 83) 0.379 545 387 243 223 799 702 749 184 × 2 = 0 + 0.759 090 774 486 447 599 405 498 368;
  • 84) 0.759 090 774 486 447 599 405 498 368 × 2 = 1 + 0.518 181 548 972 895 198 810 996 736;
  • 85) 0.518 181 548 972 895 198 810 996 736 × 2 = 1 + 0.036 363 097 945 790 397 621 993 472;
  • 86) 0.036 363 097 945 790 397 621 993 472 × 2 = 0 + 0.072 726 195 891 580 795 243 986 944;
  • 87) 0.072 726 195 891 580 795 243 986 944 × 2 = 0 + 0.145 452 391 783 161 590 487 973 888;
  • 88) 0.145 452 391 783 161 590 487 973 888 × 2 = 0 + 0.290 904 783 566 323 180 975 947 776;
  • 89) 0.290 904 783 566 323 180 975 947 776 × 2 = 0 + 0.581 809 567 132 646 361 951 895 552;
  • 90) 0.581 809 567 132 646 361 951 895 552 × 2 = 1 + 0.163 619 134 265 292 723 903 791 104;
  • 91) 0.163 619 134 265 292 723 903 791 104 × 2 = 0 + 0.327 238 268 530 585 447 807 582 208;
  • 92) 0.327 238 268 530 585 447 807 582 208 × 2 = 0 + 0.654 476 537 061 170 895 615 164 416;
  • 93) 0.654 476 537 061 170 895 615 164 416 × 2 = 1 + 0.308 953 074 122 341 791 230 328 832;
  • 94) 0.308 953 074 122 341 791 230 328 832 × 2 = 0 + 0.617 906 148 244 683 582 460 657 664;
  • 95) 0.617 906 148 244 683 582 460 657 664 × 2 = 1 + 0.235 812 296 489 367 164 921 315 328;
  • 96) 0.235 812 296 489 367 164 921 315 328 × 2 = 0 + 0.471 624 592 978 734 329 842 630 656;
  • 97) 0.471 624 592 978 734 329 842 630 656 × 2 = 0 + 0.943 249 185 957 468 659 685 261 312;
  • 98) 0.943 249 185 957 468 659 685 261 312 × 2 = 1 + 0.886 498 371 914 937 319 370 522 624;
  • 99) 0.886 498 371 914 937 319 370 522 624 × 2 = 1 + 0.772 996 743 829 874 638 741 045 248;
  • 100) 0.772 996 743 829 874 638 741 045 248 × 2 = 1 + 0.545 993 487 659 749 277 482 090 496;
  • 101) 0.545 993 487 659 749 277 482 090 496 × 2 = 1 + 0.091 986 975 319 498 554 964 180 992;
  • 102) 0.091 986 975 319 498 554 964 180 992 × 2 = 0 + 0.183 973 950 638 997 109 928 361 984;
  • 103) 0.183 973 950 638 997 109 928 361 984 × 2 = 0 + 0.367 947 901 277 994 219 856 723 968;
  • 104) 0.367 947 901 277 994 219 856 723 968 × 2 = 0 + 0.735 895 802 555 988 439 713 447 936;
  • 105) 0.735 895 802 555 988 439 713 447 936 × 2 = 1 + 0.471 791 605 111 976 879 426 895 872;
  • 106) 0.471 791 605 111 976 879 426 895 872 × 2 = 0 + 0.943 583 210 223 953 758 853 791 744;
  • 107) 0.943 583 210 223 953 758 853 791 744 × 2 = 1 + 0.887 166 420 447 907 517 707 583 488;
  • 108) 0.887 166 420 447 907 517 707 583 488 × 2 = 1 + 0.774 332 840 895 815 035 415 166 976;
  • 109) 0.774 332 840 895 815 035 415 166 976 × 2 = 1 + 0.548 665 681 791 630 070 830 333 952;
  • 110) 0.548 665 681 791 630 070 830 333 952 × 2 = 1 + 0.097 331 363 583 260 141 660 667 904;
  • 111) 0.097 331 363 583 260 141 660 667 904 × 2 = 0 + 0.194 662 727 166 520 283 321 335 808;
  • 112) 0.194 662 727 166 520 283 321 335 808 × 2 = 0 + 0.389 325 454 333 040 566 642 671 616;
  • 113) 0.389 325 454 333 040 566 642 671 616 × 2 = 0 + 0.778 650 908 666 081 133 285 343 232;
  • 114) 0.778 650 908 666 081 133 285 343 232 × 2 = 1 + 0.557 301 817 332 162 266 570 686 464;
  • 115) 0.557 301 817 332 162 266 570 686 464 × 2 = 1 + 0.114 603 634 664 324 533 141 372 928;
  • 116) 0.114 603 634 664 324 533 141 372 928 × 2 = 0 + 0.229 207 269 328 649 066 282 745 856;
  • 117) 0.229 207 269 328 649 066 282 745 856 × 2 = 0 + 0.458 414 538 657 298 132 565 491 712;
  • 118) 0.458 414 538 657 298 132 565 491 712 × 2 = 0 + 0.916 829 077 314 596 265 130 983 424;
  • 119) 0.916 829 077 314 596 265 130 983 424 × 2 = 1 + 0.833 658 154 629 192 530 261 966 848;

We didn't get any fractional part equal to zero, so the expansion does not terminate. We stop after 119 iterations: the first integer part different from zero appears at iteration 67, and the 52 iterations after it supply the 52 bits of the mantissa => FULL STOP (precision is lost: the converted number obtained in the end is only a very good approximation of the initial one).
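The repeated doubling in step 3 needs exact decimal arithmetic: a binary `float` cannot even hold 0.000 000 000 000 000 000 008 536 996 exactly. The sketch below assumes Python's standard `fractions.Fraction` for that; `fraction_bits` is a hypothetical helper name.

```python
from fractions import Fraction


def fraction_bits(frac, max_steps):
    """Multiply a fractional part (0 <= frac < 1) repeatedly by 2.

    Keeps the integer part of each result as the next bit and stops when
    the fractional part becomes zero or after max_steps multiplications.
    Returns the bits and the fractional part left over.
    """
    bits = []
    for _ in range(max_steps):
        if frac == 0:                # exact binary expansion: stop early
            break
        frac *= 2
        bit = 1 if frac >= 1 else 0  # integer part of the result
        bits.append(bit)
        frac -= bit                  # keep only the fractional part
    return bits, frac


x = Fraction("0.000000000000000000008536996")
bits, rest = fraction_bits(x, 119)
first_one = bits.index(1) + 1        # iteration that produced the first 1
print(first_one)                     # 67
print(len(bits) - first_one)         # 52
print(rest > 0)                      # True: the expansion did not terminate
```

Because every step is exact, the leftover `rest` equals the fractional part shown in iteration 119.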


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 536 996(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 1000 0100 1010 0111 1000 1011 1100 0110 001(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 536 996(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 1000 0100 1010 0111 1000 1011 1100 0110 001(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 536 996(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 1000 0100 1010 0111 1000 1011 1100 0110 001(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 1000 0100 1010 0111 1000 1011 1100 0110 001(2) × 2^0 =


1.0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001(2) × 2^(-67)
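Step 6 amounts to locating the first 1 bit: for a number below 1, if that bit sits at position p after the point (counting from 1), the exponent is -p. A sketch, with the hypothetical helper `normalize` taking the digit strings produced by the integer and fractional conversions:

```python
def normalize(int_bits, frac_bits):
    """Shift the binary point until exactly one 1 stays to its left.

    int_bits / frac_bits are the base 2 digits before / after the point.
    Returns (digits, exponent) such that the number equals
    1.<digits> x 2**exponent. Assumes the number is not zero.
    """
    int_bits = int_bits.lstrip("0")
    if int_bits:                               # number >= 1: shift left
        exponent = len(int_bits) - 1
        digits = int_bits[1:] + frac_bits
    else:                                      # number < 1: shift right
        first_one = frac_bits.index("1")       # 0-based position of the first 1
        exponent = -(first_one + 1)
        digits = frac_bits[first_one + 1:]
    return digits, exponent


mantissa = "0100001010000100110000100101001111000101111000110001"
digits, exponent = normalize("0", "0" * 66 + "1" + mantissa)
print(exponent)            # -67
print(digits == mantissa)  # True
```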


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
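The bias formula can be written as a small helper. The range check goes slightly beyond the steps above: in IEEE 754 the all-zeros and all-ones exponent fields are reserved (for zero and subnormal numbers, and for infinities and NaN), so this sketch, with the hypothetical name `adjust_exponent`, only accepts results from 1 to 2046.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1        # 1023 for 64 bit doubles


def adjust_exponent(unadjusted):
    """Add the excess/bias to the unadjusted exponent (normal numbers only)."""
    adjusted = unadjusted + BIAS
    # Fields 0 and 2047 are reserved (zero/subnormals, infinity/NaN).
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal range")
    return adjusted


print(adjust_exponent(-67))   # 956
print(adjust_exponent(4))     # 1027
```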


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above, then add a leading 0 so that the result fills all 11 bits.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
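The divisions produce only 10 remainders for 956, so a leading 0 is needed to fill the 11 bit field. A sketch of the conversion including that padding (the name `exponent_field` is hypothetical):

```python
def exponent_field(adjusted, width=11):
    """Repeated division by 2, then left-pad with zeros to `width` bits."""
    remainders = []
    while True:
        adjusted, r = divmod(adjusted, 2)
        remainders.append(str(r))
        if adjusted == 0:
            break
    bits = "".join(reversed(remainders))    # read from the bottom up
    if len(bits) > width:
        raise ValueError("does not fit in the exponent field")
    return bits.rjust(width, "0")           # 1110111100 -> 01110111100


print(exponent_field(956))    # 01110111100
print(exponent_field(1027))   # 10000000011
```

Python's `format(956, "011b")` gives the same padded string and can serve as a check.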


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1 (it is implied, not stored), and the decimal point, if present.


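b) Adjust its length to 52 bits, only if necessary (not the case here: exactly 52 bits follow the leading 1).

Note that this simply truncates the expansion. IEEE 754's default rounding mode, round to nearest with ties to even, also looks at the bits that were cut off: iteration 120 would give 0.833 658 154 629 192 530 261 966 848 × 2 = 1 + 0.667 316 309 258 385 060 523 933 696, so the next bit is 1 and the bits after it are not all zero. Under that mode, the last group of the mantissa rounds up from 0001 to 0010.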


Mantissa (normalized) =


1.0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001 =


0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001
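The steps above only truncate. The sketch below contrasts that with round to nearest, ties to even, the default rounding mode defined by IEEE 754. `fit_mantissa` and its `sticky` flag are hypothetical names; the flag records whether any non-zero bit lies beyond the bits supplied.

```python
def fit_mantissa(digits, width=52, rounding="truncate", sticky=False):
    """Fit the bits that follow the leading 1 into `width` bits.

    digits:   '0'/'1' string of the bits after the (removed) leading 1
    rounding: "truncate" drops the excess bits, as in the steps above;
              "nearest" rounds to nearest, ties to even (IEEE 754 default)
    sticky:   True if non-zero bits exist beyond the end of `digits`
    Rounding needs at least one bit beyond `width` in `digits`.
    Returns (mantissa, carry); carry == 1 means the mantissa overflowed
    (1.11...1 rounded up to 10.00...0), so the exponent must grow by 1.
    """
    digits = digits.ljust(width, "0")          # pad with zeros on the right
    kept, dropped = digits[:width], digits[width:]
    value = int(kept, 2)
    if rounding == "nearest" and dropped.startswith("1"):
        more_than_half = "1" in dropped[1:] or sticky
        if more_than_half or value & 1:        # ties go to the even neighbour
            value += 1
    carry = value >> width                     # 1 only if value == 2**width
    value &= (1 << width) - 1
    return format(value, "0{}b".format(width)), carry


m = "0100001010000100110000100101001111000101111000110001"
# The bit from iteration 120 is 1, and the remainder after it is not zero.
print(fit_mantissa(m + "1"))                                   # kept as is
print(fit_mantissa(m + "1", rounding="nearest", sticky=True))  # rounded up
```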


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001


Decimal number 0.000 000 000 000 000 000 008 536 996 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0100 1100 0010 0101 0011 1100 0101 1110 0011 0001
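The result can be cross-checked against the bit pattern Python actually stores, using the standard `struct` module. Python converts decimal strings to `float` with correct rounding to nearest, so a difference of one in the last bit is expected here, since the steps above truncate instead. The helper names `assemble` and `stored_bits` are hypothetical.

```python
import struct


def assemble(sign, exponent, mantissa):
    """Join the 1 + 11 + 52 bit fields into one 64-bit integer."""
    bits = sign + exponent + mantissa
    if len(bits) != 64:
        raise ValueError("expected 64 bits in total")
    return int(bits, 2)


def stored_bits(x):
    """The 64-bit pattern of a Python float (IEEE 754 double, big-endian)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


by_hand = assemble("0", "01110111100",
                   "0100001010000100110000100101001111000101111000110001")
print(hex(by_hand))                              # 0x3bc4284c253c5e31
print(hex(stored_bits(float("8.536996e-21"))))   # 0x3bc4284c253c5e32
```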


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide the integer part of the positive number repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
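  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero, or until there are enough bits to fill the 52 bit mantissa after the first 1 (in that case precision is lost).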
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always 1 (and the decimal mark, if present), then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by appending bits set to 0 on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
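Putting the nine steps together gives the sketch below. It assumes exact arithmetic via Python's standard `fractions.Fraction`, handles only non-zero numbers in the normal range (zero, subnormals, infinities and NaN are outside the steps above), and uses `format(..., "011b")` in place of the hand-written repeated division for the exponent. The function name is hypothetical, and the optional `rounding="nearest"` mode is an addition not described in the steps.

```python
import struct
from fractions import Fraction


def decimal_to_double_fields(text, rounding="truncate"):
    """Steps 1-9 for a non-zero decimal string in the normal double range.

    Returns (sign, exponent, mantissa) as bit strings of 1, 11 and 52 bits.
    rounding="truncate" drops the excess bits, as in the steps above;
    rounding="nearest" rounds to nearest, ties to even (IEEE 754 default).
    """
    x = Fraction(text)                       # exact decimal value
    if x == 0:
        raise ValueError("zero has a special encoding, not handled here")
    sign = "1" if x < 0 else "0"             # step 9
    x = abs(x)                               # step 1

    whole = x.numerator // x.denominator     # integer part
    frac = x - whole                         # fractional part
    int_bits = ""
    while whole > 0:                         # steps 2-3: divide by 2
        whole, r = divmod(whole, 2)
        int_bits = str(r) + int_bits         # last remainder goes leftmost

    # Steps 4-5: multiply by 2 until the fraction is zero or there are
    # 53 bits after the first 1 (52 for the mantissa, 1 for rounding).
    frac_bits = ""
    while frac != 0:
        if int_bits:
            have = len(int_bits) - 1 + len(frac_bits)
        elif "1" in frac_bits:
            have = len(frac_bits) - frac_bits.index("1") - 1
        else:
            have = -1                        # first 1 not found yet
        if have >= 53:
            break
        frac *= 2
        bit = 1 if frac >= 1 else 0
        frac_bits += str(bit)
        frac -= bit

    # Step 6: normalize to 1.<digits> x 2**exponent.
    if int_bits:
        exponent = len(int_bits) - 1
        digits = int_bits[1:] + frac_bits
    else:
        first_one = frac_bits.index("1")
        exponent = -(first_one + 1)
        digits = frac_bits[first_one + 1:]

    # Step 8 (done before step 7 because rounding may bump the exponent):
    # the leading 1 is already dropped; fit the rest into 52 bits.
    digits = digits.ljust(52, "0")
    kept, dropped = digits[:52], digits[52:]
    value = int(kept, 2)
    if rounding == "nearest" and dropped.startswith("1"):
        # Round up if above half (any later 1 or leftover fraction),
        # or exactly half and the kept bits are odd (ties to even).
        if "1" in dropped[1:] or frac != 0 or value & 1:
            value += 1
    if value >> 52:                          # 1.11...1 rounded up to 10.0
        value, exponent = 0, exponent + 1

    # Step 7: add the bias; format() stands in for the repeated division.
    adjusted = exponent + 2 ** (11 - 1) - 1
    if not 1 <= adjusted <= 2046:
        raise ValueError("outside the normal double range")
    return sign, format(adjusted, "011b"), format(value, "052b")


print(" - ".join(decimal_to_double_fields("-31.640215")))
nearest = decimal_to_double_fields("-31.640215", rounding="nearest")
stored = struct.unpack(">Q", struct.pack(">d", -31.640215))[0]
print(int("".join(nearest), 2) == stored)    # True
```

With the default `rounding="truncate"`, the function reproduces the hand-derived results in this document; with `rounding="nearest"`, it should match the bits Python stores for the same decimal string.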

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

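  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always 1 (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision). The removed bits are 1010 0, followed by a non-zero remainder, so IEEE 754's default round-to-nearest mode would round the last group up from 1100 to 1101; the steps here truncate instead: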

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
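To see how much precision the truncation costs, the pattern can be decoded back into its exact value by restoring the implicit leading 1 and subtracting the bias 1023. The sketch below uses Python's `fractions.Fraction` (the helper `decode` is hypothetical and handles normal numbers only). It compares the result above with the same pattern rounded to nearest, which is what `struct` reports for the Python float -31.640215.

```python
import struct
from fractions import Fraction


def decode(bits):
    """Exact value of a normal 64-bit IEEE 754 pattern (sign, exponent, mantissa)."""
    sign = bits >> 63
    adjusted = (bits >> 52) & 0x7FF              # the 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)            # the 52 mantissa bits
    # Restore the implicit leading 1 and subtract the bias 2**(11-1) - 1.
    value = (1 + Fraction(mantissa, 2 ** 52)) * Fraction(2) ** (adjusted - 1023)
    return -value if sign else value


exact = Fraction("-31.640215")
truncated = decode(0xC03FA3E52157689C)   # 1 - 100 0000 0011 - ... 1001 1100
rounded = decode(0xC03FA3E52157689D)     # same, last mantissa bit rounded up
stored = struct.unpack(">Q", struct.pack(">d", -31.640215))[0]
print(hex(stored))                       # 0xc03fa3e52157689d
print(float(abs(truncated - exact)))     # error of the truncated result
print(float(abs(rounded - exact)))       # error after rounding to nearest
```

Both errors stay below one unit in the last place (2^(4-52) = 2^-48 for this exponent), but the rounded pattern is the closer of the two.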