0.000 000 000 000 000 000 008 536 68 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 536 68(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 536 68(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert the integer part, 0, to binary (base 2).
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 536 68.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 536 68 × 2 = 0 + 0.000 000 000 000 000 000 017 073 36;
  • 2) 0.000 000 000 000 000 000 017 073 36 × 2 = 0 + 0.000 000 000 000 000 000 034 146 72;
  • 3) 0.000 000 000 000 000 000 034 146 72 × 2 = 0 + 0.000 000 000 000 000 000 068 293 44;
  • 4) 0.000 000 000 000 000 000 068 293 44 × 2 = 0 + 0.000 000 000 000 000 000 136 586 88;
  • 5) 0.000 000 000 000 000 000 136 586 88 × 2 = 0 + 0.000 000 000 000 000 000 273 173 76;
  • 6) 0.000 000 000 000 000 000 273 173 76 × 2 = 0 + 0.000 000 000 000 000 000 546 347 52;
  • 7) 0.000 000 000 000 000 000 546 347 52 × 2 = 0 + 0.000 000 000 000 000 001 092 695 04;
  • 8) 0.000 000 000 000 000 001 092 695 04 × 2 = 0 + 0.000 000 000 000 000 002 185 390 08;
  • 9) 0.000 000 000 000 000 002 185 390 08 × 2 = 0 + 0.000 000 000 000 000 004 370 780 16;
  • 10) 0.000 000 000 000 000 004 370 780 16 × 2 = 0 + 0.000 000 000 000 000 008 741 560 32;
  • 11) 0.000 000 000 000 000 008 741 560 32 × 2 = 0 + 0.000 000 000 000 000 017 483 120 64;
  • 12) 0.000 000 000 000 000 017 483 120 64 × 2 = 0 + 0.000 000 000 000 000 034 966 241 28;
  • 13) 0.000 000 000 000 000 034 966 241 28 × 2 = 0 + 0.000 000 000 000 000 069 932 482 56;
  • 14) 0.000 000 000 000 000 069 932 482 56 × 2 = 0 + 0.000 000 000 000 000 139 864 965 12;
  • 15) 0.000 000 000 000 000 139 864 965 12 × 2 = 0 + 0.000 000 000 000 000 279 729 930 24;
  • 16) 0.000 000 000 000 000 279 729 930 24 × 2 = 0 + 0.000 000 000 000 000 559 459 860 48;
  • 17) 0.000 000 000 000 000 559 459 860 48 × 2 = 0 + 0.000 000 000 000 001 118 919 720 96;
  • 18) 0.000 000 000 000 001 118 919 720 96 × 2 = 0 + 0.000 000 000 000 002 237 839 441 92;
  • 19) 0.000 000 000 000 002 237 839 441 92 × 2 = 0 + 0.000 000 000 000 004 475 678 883 84;
  • 20) 0.000 000 000 000 004 475 678 883 84 × 2 = 0 + 0.000 000 000 000 008 951 357 767 68;
  • 21) 0.000 000 000 000 008 951 357 767 68 × 2 = 0 + 0.000 000 000 000 017 902 715 535 36;
  • 22) 0.000 000 000 000 017 902 715 535 36 × 2 = 0 + 0.000 000 000 000 035 805 431 070 72;
  • 23) 0.000 000 000 000 035 805 431 070 72 × 2 = 0 + 0.000 000 000 000 071 610 862 141 44;
  • 24) 0.000 000 000 000 071 610 862 141 44 × 2 = 0 + 0.000 000 000 000 143 221 724 282 88;
  • 25) 0.000 000 000 000 143 221 724 282 88 × 2 = 0 + 0.000 000 000 000 286 443 448 565 76;
  • 26) 0.000 000 000 000 286 443 448 565 76 × 2 = 0 + 0.000 000 000 000 572 886 897 131 52;
  • 27) 0.000 000 000 000 572 886 897 131 52 × 2 = 0 + 0.000 000 000 001 145 773 794 263 04;
  • 28) 0.000 000 000 001 145 773 794 263 04 × 2 = 0 + 0.000 000 000 002 291 547 588 526 08;
  • 29) 0.000 000 000 002 291 547 588 526 08 × 2 = 0 + 0.000 000 000 004 583 095 177 052 16;
  • 30) 0.000 000 000 004 583 095 177 052 16 × 2 = 0 + 0.000 000 000 009 166 190 354 104 32;
  • 31) 0.000 000 000 009 166 190 354 104 32 × 2 = 0 + 0.000 000 000 018 332 380 708 208 64;
  • 32) 0.000 000 000 018 332 380 708 208 64 × 2 = 0 + 0.000 000 000 036 664 761 416 417 28;
  • 33) 0.000 000 000 036 664 761 416 417 28 × 2 = 0 + 0.000 000 000 073 329 522 832 834 56;
  • 34) 0.000 000 000 073 329 522 832 834 56 × 2 = 0 + 0.000 000 000 146 659 045 665 669 12;
  • 35) 0.000 000 000 146 659 045 665 669 12 × 2 = 0 + 0.000 000 000 293 318 091 331 338 24;
  • 36) 0.000 000 000 293 318 091 331 338 24 × 2 = 0 + 0.000 000 000 586 636 182 662 676 48;
  • 37) 0.000 000 000 586 636 182 662 676 48 × 2 = 0 + 0.000 000 001 173 272 365 325 352 96;
  • 38) 0.000 000 001 173 272 365 325 352 96 × 2 = 0 + 0.000 000 002 346 544 730 650 705 92;
  • 39) 0.000 000 002 346 544 730 650 705 92 × 2 = 0 + 0.000 000 004 693 089 461 301 411 84;
  • 40) 0.000 000 004 693 089 461 301 411 84 × 2 = 0 + 0.000 000 009 386 178 922 602 823 68;
  • 41) 0.000 000 009 386 178 922 602 823 68 × 2 = 0 + 0.000 000 018 772 357 845 205 647 36;
  • 42) 0.000 000 018 772 357 845 205 647 36 × 2 = 0 + 0.000 000 037 544 715 690 411 294 72;
  • 43) 0.000 000 037 544 715 690 411 294 72 × 2 = 0 + 0.000 000 075 089 431 380 822 589 44;
  • 44) 0.000 000 075 089 431 380 822 589 44 × 2 = 0 + 0.000 000 150 178 862 761 645 178 88;
  • 45) 0.000 000 150 178 862 761 645 178 88 × 2 = 0 + 0.000 000 300 357 725 523 290 357 76;
  • 46) 0.000 000 300 357 725 523 290 357 76 × 2 = 0 + 0.000 000 600 715 451 046 580 715 52;
  • 47) 0.000 000 600 715 451 046 580 715 52 × 2 = 0 + 0.000 001 201 430 902 093 161 431 04;
  • 48) 0.000 001 201 430 902 093 161 431 04 × 2 = 0 + 0.000 002 402 861 804 186 322 862 08;
  • 49) 0.000 002 402 861 804 186 322 862 08 × 2 = 0 + 0.000 004 805 723 608 372 645 724 16;
  • 50) 0.000 004 805 723 608 372 645 724 16 × 2 = 0 + 0.000 009 611 447 216 745 291 448 32;
  • 51) 0.000 009 611 447 216 745 291 448 32 × 2 = 0 + 0.000 019 222 894 433 490 582 896 64;
  • 52) 0.000 019 222 894 433 490 582 896 64 × 2 = 0 + 0.000 038 445 788 866 981 165 793 28;
  • 53) 0.000 038 445 788 866 981 165 793 28 × 2 = 0 + 0.000 076 891 577 733 962 331 586 56;
  • 54) 0.000 076 891 577 733 962 331 586 56 × 2 = 0 + 0.000 153 783 155 467 924 663 173 12;
  • 55) 0.000 153 783 155 467 924 663 173 12 × 2 = 0 + 0.000 307 566 310 935 849 326 346 24;
  • 56) 0.000 307 566 310 935 849 326 346 24 × 2 = 0 + 0.000 615 132 621 871 698 652 692 48;
  • 57) 0.000 615 132 621 871 698 652 692 48 × 2 = 0 + 0.001 230 265 243 743 397 305 384 96;
  • 58) 0.001 230 265 243 743 397 305 384 96 × 2 = 0 + 0.002 460 530 487 486 794 610 769 92;
  • 59) 0.002 460 530 487 486 794 610 769 92 × 2 = 0 + 0.004 921 060 974 973 589 221 539 84;
  • 60) 0.004 921 060 974 973 589 221 539 84 × 2 = 0 + 0.009 842 121 949 947 178 443 079 68;
  • 61) 0.009 842 121 949 947 178 443 079 68 × 2 = 0 + 0.019 684 243 899 894 356 886 159 36;
  • 62) 0.019 684 243 899 894 356 886 159 36 × 2 = 0 + 0.039 368 487 799 788 713 772 318 72;
  • 63) 0.039 368 487 799 788 713 772 318 72 × 2 = 0 + 0.078 736 975 599 577 427 544 637 44;
  • 64) 0.078 736 975 599 577 427 544 637 44 × 2 = 0 + 0.157 473 951 199 154 855 089 274 88;
  • 65) 0.157 473 951 199 154 855 089 274 88 × 2 = 0 + 0.314 947 902 398 309 710 178 549 76;
  • 66) 0.314 947 902 398 309 710 178 549 76 × 2 = 0 + 0.629 895 804 796 619 420 357 099 52;
  • 67) 0.629 895 804 796 619 420 357 099 52 × 2 = 1 + 0.259 791 609 593 238 840 714 199 04;
  • 68) 0.259 791 609 593 238 840 714 199 04 × 2 = 0 + 0.519 583 219 186 477 681 428 398 08;
  • 69) 0.519 583 219 186 477 681 428 398 08 × 2 = 1 + 0.039 166 438 372 955 362 856 796 16;
  • 70) 0.039 166 438 372 955 362 856 796 16 × 2 = 0 + 0.078 332 876 745 910 725 713 592 32;
  • 71) 0.078 332 876 745 910 725 713 592 32 × 2 = 0 + 0.156 665 753 491 821 451 427 184 64;
  • 72) 0.156 665 753 491 821 451 427 184 64 × 2 = 0 + 0.313 331 506 983 642 902 854 369 28;
  • 73) 0.313 331 506 983 642 902 854 369 28 × 2 = 0 + 0.626 663 013 967 285 805 708 738 56;
  • 74) 0.626 663 013 967 285 805 708 738 56 × 2 = 1 + 0.253 326 027 934 571 611 417 477 12;
  • 75) 0.253 326 027 934 571 611 417 477 12 × 2 = 0 + 0.506 652 055 869 143 222 834 954 24;
  • 76) 0.506 652 055 869 143 222 834 954 24 × 2 = 1 + 0.013 304 111 738 286 445 669 908 48;
  • 77) 0.013 304 111 738 286 445 669 908 48 × 2 = 0 + 0.026 608 223 476 572 891 339 816 96;
  • 78) 0.026 608 223 476 572 891 339 816 96 × 2 = 0 + 0.053 216 446 953 145 782 679 633 92;
  • 79) 0.053 216 446 953 145 782 679 633 92 × 2 = 0 + 0.106 432 893 906 291 565 359 267 84;
  • 80) 0.106 432 893 906 291 565 359 267 84 × 2 = 0 + 0.212 865 787 812 583 130 718 535 68;
  • 81) 0.212 865 787 812 583 130 718 535 68 × 2 = 0 + 0.425 731 575 625 166 261 437 071 36;
  • 82) 0.425 731 575 625 166 261 437 071 36 × 2 = 0 + 0.851 463 151 250 332 522 874 142 72;
  • 83) 0.851 463 151 250 332 522 874 142 72 × 2 = 1 + 0.702 926 302 500 665 045 748 285 44;
  • 84) 0.702 926 302 500 665 045 748 285 44 × 2 = 1 + 0.405 852 605 001 330 091 496 570 88;
  • 85) 0.405 852 605 001 330 091 496 570 88 × 2 = 0 + 0.811 705 210 002 660 182 993 141 76;
  • 86) 0.811 705 210 002 660 182 993 141 76 × 2 = 1 + 0.623 410 420 005 320 365 986 283 52;
  • 87) 0.623 410 420 005 320 365 986 283 52 × 2 = 1 + 0.246 820 840 010 640 731 972 567 04;
  • 88) 0.246 820 840 010 640 731 972 567 04 × 2 = 0 + 0.493 641 680 021 281 463 945 134 08;
  • 89) 0.493 641 680 021 281 463 945 134 08 × 2 = 0 + 0.987 283 360 042 562 927 890 268 16;
  • 90) 0.987 283 360 042 562 927 890 268 16 × 2 = 1 + 0.974 566 720 085 125 855 780 536 32;
  • 91) 0.974 566 720 085 125 855 780 536 32 × 2 = 1 + 0.949 133 440 170 251 711 561 072 64;
  • 92) 0.949 133 440 170 251 711 561 072 64 × 2 = 1 + 0.898 266 880 340 503 423 122 145 28;
  • 93) 0.898 266 880 340 503 423 122 145 28 × 2 = 1 + 0.796 533 760 681 006 846 244 290 56;
  • 94) 0.796 533 760 681 006 846 244 290 56 × 2 = 1 + 0.593 067 521 362 013 692 488 581 12;
  • 95) 0.593 067 521 362 013 692 488 581 12 × 2 = 1 + 0.186 135 042 724 027 384 977 162 24;
  • 96) 0.186 135 042 724 027 384 977 162 24 × 2 = 0 + 0.372 270 085 448 054 769 954 324 48;
  • 97) 0.372 270 085 448 054 769 954 324 48 × 2 = 0 + 0.744 540 170 896 109 539 908 648 96;
  • 98) 0.744 540 170 896 109 539 908 648 96 × 2 = 1 + 0.489 080 341 792 219 079 817 297 92;
  • 99) 0.489 080 341 792 219 079 817 297 92 × 2 = 0 + 0.978 160 683 584 438 159 634 595 84;
  • 100) 0.978 160 683 584 438 159 634 595 84 × 2 = 1 + 0.956 321 367 168 876 319 269 191 68;
  • 101) 0.956 321 367 168 876 319 269 191 68 × 2 = 1 + 0.912 642 734 337 752 638 538 383 36;
  • 102) 0.912 642 734 337 752 638 538 383 36 × 2 = 1 + 0.825 285 468 675 505 277 076 766 72;
  • 103) 0.825 285 468 675 505 277 076 766 72 × 2 = 1 + 0.650 570 937 351 010 554 153 533 44;
  • 104) 0.650 570 937 351 010 554 153 533 44 × 2 = 1 + 0.301 141 874 702 021 108 307 066 88;
  • 105) 0.301 141 874 702 021 108 307 066 88 × 2 = 0 + 0.602 283 749 404 042 216 614 133 76;
  • 106) 0.602 283 749 404 042 216 614 133 76 × 2 = 1 + 0.204 567 498 808 084 433 228 267 52;
  • 107) 0.204 567 498 808 084 433 228 267 52 × 2 = 0 + 0.409 134 997 616 168 866 456 535 04;
  • 108) 0.409 134 997 616 168 866 456 535 04 × 2 = 0 + 0.818 269 995 232 337 732 913 070 08;
  • 109) 0.818 269 995 232 337 732 913 070 08 × 2 = 1 + 0.636 539 990 464 675 465 826 140 16;
  • 110) 0.636 539 990 464 675 465 826 140 16 × 2 = 1 + 0.273 079 980 929 350 931 652 280 32;
  • 111) 0.273 079 980 929 350 931 652 280 32 × 2 = 0 + 0.546 159 961 858 701 863 304 560 64;
  • 112) 0.546 159 961 858 701 863 304 560 64 × 2 = 1 + 0.092 319 923 717 403 726 609 121 28;
  • 113) 0.092 319 923 717 403 726 609 121 28 × 2 = 0 + 0.184 639 847 434 807 453 218 242 56;
  • 114) 0.184 639 847 434 807 453 218 242 56 × 2 = 0 + 0.369 279 694 869 614 906 436 485 12;
  • 115) 0.369 279 694 869 614 906 436 485 12 × 2 = 0 + 0.738 559 389 739 229 812 872 970 24;
  • 116) 0.738 559 389 739 229 812 872 970 24 × 2 = 1 + 0.477 118 779 478 459 625 745 940 48;
  • 117) 0.477 118 779 478 459 625 745 940 48 × 2 = 0 + 0.954 237 558 956 919 251 491 880 96;
  • 118) 0.954 237 558 956 919 251 491 880 96 × 2 = 1 + 0.908 475 117 913 838 502 983 761 92;
  • 119) 0.908 475 117 913 838 502 983 761 92 × 2 = 1 + 0.816 950 235 827 677 005 967 523 84;

No fractional part equal to zero appeared. However, the list already contains a non-zero integer part (step 67) followed by 52 more bits, which is enough to fill the 52 bit mantissa => FULL STOP. The remaining bits are dropped (truncation), so the converted number is only a very close approximation of the initial one, and its last mantissa bit can differ from the result of IEEE 754's default round-to-nearest mode.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 536 68(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0011 0110 0111 1110 0101 1111 0100 1101 0001 011(2)
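Steps 3 and 4 can be sketched in Python as follows. This is only a minimal illustration, not the tool that produced the list above: the function name `fraction_bits` and its `extra_bits` parameter are assumptions made for this example, and `fractions.Fraction` is used so that each doubling is exact instead of being subject to floating point error.

```python
from fractions import Fraction


def fraction_bits(value, extra_bits=52):
    """Repeatedly double `value` (0 <= value < 1) and collect the integer parts.

    Stops when the fractional part becomes zero, or once `extra_bits` bits
    have been collected after the first 1 bit (the stop rule used above).
    """
    bits = []
    first_one = None                      # 1-based step number of the first 1 bit
    while value != 0:
        value *= 2
        bit = int(value)                  # integer part of the product: 0 or 1
        value -= bit                      # keep only the fractional part
        bits.append(str(bit))
        if bit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one >= extra_bits:
            break
    return "".join(bits)


bits = fraction_bits(Fraction("8.53668e-21"))
print(len(bits), bits.index("1") + 1)
print(bits)
```

For the number converted here, the sketch should stop after 119 doublings with the first 1 bit at step 67, in line with the list above.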

5. Positive number before normalization:

0.000 000 000 000 000 000 008 536 68(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0011 0110 0111 1110 0101 1111 0100 1101 0001 011(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 536 68(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0011 0110 0111 1110 0101 1111 0100 1101 0001 011(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0011 0110 0111 1110 0101 1111 0100 1101 0001 011(2) × 2^0 =


1.0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011(2) × 2^-67
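The shift in step 6 amounts to locating the first 1 bit of the fractional bit string. A small sketch, assuming the number has no integer part (as is the case here); `normalize_fraction_bits` is a hypothetical helper name introduced only for this illustration:

```python
def normalize_fraction_bits(bits):
    """Return (exponent, significand) for the binary fraction 0.<bits>."""
    first_one = bits.index("1")        # zero-based position of the first 1 bit
    exponent = -(first_one + 1)        # the point moves first_one + 1 places right
    digits = bits[first_one:]          # drop the leading zeros
    return exponent, digits[0] + "." + digits[1:]


# 66 zeros followed by the first few significant bits of the number above.
exponent, significand = normalize_fraction_bits("0" * 66 + "10100001")
print(exponent, significand)
```

Moving the point to the right gives a negative exponent, which is why the first 1 bit at position 67 leads to 2^-67.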


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
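Steps 8 to 10 can be reproduced with a few lines of Python. The helper name `to_binary_by_division` and its `width` parameter are assumptions made for this sketch; it follows the repeated division described above rather than calling Python's built-in `bin`.

```python
def to_binary_by_division(n, width):
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while True:
        n, remainder = divmod(n, 2)    # quotient and remainder of n / 2
        remainders.append(str(remainder))
        if n == 0:                     # stop when the quotient reaches zero
            break
    # Read the remainders from the bottom of the list, then pad on the left.
    return "".join(reversed(remainders)).zfill(width)


bias = 2 ** (11 - 1) - 1               # 1023 for an 11 bit exponent
adjusted = -67 + bias
exponent_bits = to_binary_by_division(adjusted, 11)
print(adjusted, exponent_bits)
```

The padding to 11 bits matters: the division list yields only 10 remainders for 956, and the leading 0 has to be added to fill the exponent field.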


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011 =


0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011


Decimal number 0.000 000 000 000 000 000 008 536 68 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0001 1011 0011 1111 0010 1111 1010 0110 1000 1011
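As a cross-check, the three fields can be joined into one 64 bit word and decoded back into an exact value. This is a sketch under the assumption that the bit strings above are copied correctly; the variable names are illustrative only.

```python
from fractions import Fraction

sign = "0"
exponent = "01110111100"
mantissa = "0100001010000001101100111111001011111010011010001011"

word = sign + exponent + mantissa                  # 1 + 11 + 52 = 64 bits
hex_word = format(int(word, 2), "016X")

# Decode: value = (-1)^sign * 1.mantissa * 2^(exponent - 1023)
significand = Fraction(2 ** 52 + int(mantissa, 2), 2 ** 52)
unbiased = int(exponent, 2) - 1023
decoded = (-1) ** int(sign) * significand * Fraction(2) ** unbiased

# Difference between the original decimal and the truncated binary value.
error = Fraction("8.53668e-21") - decoded
print(hex_word)
print(float(decoded), float(error))
```

Because the excess bits were dropped rather than rounded, the decoded value should be slightly smaller than the original decimal, by less than one unit in the last mantissa position.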


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
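The nine steps above can be combined into one short Python function. This is a sketch under stated assumptions, not a full IEEE 754 encoder: the name `to_double_fields` is hypothetical, excess bits are truncated exactly as in the steps above (no rounding), and zero, subnormal numbers, infinities and NaN are not handled.

```python
from fractions import Fraction


def to_double_fields(text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)                       # exact decimal value
    sign = "1" if value < 0 else "0"             # step 9
    value = abs(value)                           # step 1
    if value == 0:
        raise ValueError("zero is not covered by this sketch")

    integer = int(value)
    frac = value - integer

    int_bits = ""                                # steps 2-3: repeated division
    while integer > 0:
        integer, remainder = divmod(integer, 2)
        int_bits = str(remainder) + int_bits

    frac_bits = ""                               # steps 4-5: repeated doubling
    while frac != 0 and len((int_bits + frac_bits).lstrip("0")) < 53:
        frac *= 2
        bit = int(frac)
        frac -= bit
        frac_bits += str(bit)

    if int_bits:                                 # step 6: normalize
        exponent = len(int_bits) - 1
    else:
        exponent = -(frac_bits.index("1") + 1)
    significant = (int_bits + frac_bits).lstrip("0")

    biased = exponent + 2 ** (11 - 1) - 1        # step 7: add the bias
    if not 1 <= biased <= 2046:
        raise ValueError("outside the normal range of a double")

    # Step 8: drop the leading 1, then truncate or pad to 52 bits.
    mantissa = significant[1:53].ljust(52, "0")
    return sign, format(biased, "011b"), mantissa


print(to_double_fields("-31.640215"))
print(to_double_fields("8.53668e-21"))
```

The loop stops as soon as 53 significant bits are available, so it may run fewer doublings than a hand calculation; the bits that survive truncation are the same.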

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
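To compare the hand conversion with the bits a computer actually stores, the fields can be read with Python's standard `struct` module. This is a sketch; `machine_fields` is a hypothetical helper name. Note that the procedure above truncates the excess bits, while hardware normally rounds to nearest, so the last mantissa bit may differ by one.

```python
import struct


def machine_fields(x):
    """Split the stored 64 bit pattern of a Python float into its 3 fields."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))   # float -> 64 bit int
    bits = format(word, "064b")
    return bits[0], bits[1:12], bits[12:]                 # 1 + 11 + 52 bits


# Mantissa obtained by hand above (truncated, not rounded).
truncated = "1111101000111110010100100001010101110110100010011100"

sign, exponent, mantissa = machine_fields(-31.640215)
print(sign, exponent, mantissa)
print("difference in last place:", int(mantissa, 2) - int(truncated, 2))
```

The sign and exponent fields should agree with the result above; any difference is confined to the last position of the mantissa.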