0.000 000 000 000 000 000 008 558 3 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 558 3(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 558 3(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
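
The repeated division in steps 1 and 2 can be sketched in Python. This is only an illustration: the helper name `int_to_binary` is an assumption made here, not something defined by the steps above.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expects a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)          # quotient and remainder of n / 2
        remainders.append(str(r))
        if n == 0:                   # stop once the quotient reaches zero
            break
    # The last remainder becomes the leftmost binary digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # integer part of the number converted here
```

The same helper applies to any non-negative integer, for example an integer part such as 31 or an adjusted exponent.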


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 558 3.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 558 3 × 2 = 0 + 0.000 000 000 000 000 000 017 116 6;
  • 2) 0.000 000 000 000 000 000 017 116 6 × 2 = 0 + 0.000 000 000 000 000 000 034 233 2;
  • 3) 0.000 000 000 000 000 000 034 233 2 × 2 = 0 + 0.000 000 000 000 000 000 068 466 4;
  • 4) 0.000 000 000 000 000 000 068 466 4 × 2 = 0 + 0.000 000 000 000 000 000 136 932 8;
  • 5) 0.000 000 000 000 000 000 136 932 8 × 2 = 0 + 0.000 000 000 000 000 000 273 865 6;
  • 6) 0.000 000 000 000 000 000 273 865 6 × 2 = 0 + 0.000 000 000 000 000 000 547 731 2;
  • 7) 0.000 000 000 000 000 000 547 731 2 × 2 = 0 + 0.000 000 000 000 000 001 095 462 4;
  • 8) 0.000 000 000 000 000 001 095 462 4 × 2 = 0 + 0.000 000 000 000 000 002 190 924 8;
  • 9) 0.000 000 000 000 000 002 190 924 8 × 2 = 0 + 0.000 000 000 000 000 004 381 849 6;
  • 10) 0.000 000 000 000 000 004 381 849 6 × 2 = 0 + 0.000 000 000 000 000 008 763 699 2;
  • 11) 0.000 000 000 000 000 008 763 699 2 × 2 = 0 + 0.000 000 000 000 000 017 527 398 4;
  • 12) 0.000 000 000 000 000 017 527 398 4 × 2 = 0 + 0.000 000 000 000 000 035 054 796 8;
  • 13) 0.000 000 000 000 000 035 054 796 8 × 2 = 0 + 0.000 000 000 000 000 070 109 593 6;
  • 14) 0.000 000 000 000 000 070 109 593 6 × 2 = 0 + 0.000 000 000 000 000 140 219 187 2;
  • 15) 0.000 000 000 000 000 140 219 187 2 × 2 = 0 + 0.000 000 000 000 000 280 438 374 4;
  • 16) 0.000 000 000 000 000 280 438 374 4 × 2 = 0 + 0.000 000 000 000 000 560 876 748 8;
  • 17) 0.000 000 000 000 000 560 876 748 8 × 2 = 0 + 0.000 000 000 000 001 121 753 497 6;
  • 18) 0.000 000 000 000 001 121 753 497 6 × 2 = 0 + 0.000 000 000 000 002 243 506 995 2;
  • 19) 0.000 000 000 000 002 243 506 995 2 × 2 = 0 + 0.000 000 000 000 004 487 013 990 4;
  • 20) 0.000 000 000 000 004 487 013 990 4 × 2 = 0 + 0.000 000 000 000 008 974 027 980 8;
  • 21) 0.000 000 000 000 008 974 027 980 8 × 2 = 0 + 0.000 000 000 000 017 948 055 961 6;
  • 22) 0.000 000 000 000 017 948 055 961 6 × 2 = 0 + 0.000 000 000 000 035 896 111 923 2;
  • 23) 0.000 000 000 000 035 896 111 923 2 × 2 = 0 + 0.000 000 000 000 071 792 223 846 4;
  • 24) 0.000 000 000 000 071 792 223 846 4 × 2 = 0 + 0.000 000 000 000 143 584 447 692 8;
  • 25) 0.000 000 000 000 143 584 447 692 8 × 2 = 0 + 0.000 000 000 000 287 168 895 385 6;
  • 26) 0.000 000 000 000 287 168 895 385 6 × 2 = 0 + 0.000 000 000 000 574 337 790 771 2;
  • 27) 0.000 000 000 000 574 337 790 771 2 × 2 = 0 + 0.000 000 000 001 148 675 581 542 4;
  • 28) 0.000 000 000 001 148 675 581 542 4 × 2 = 0 + 0.000 000 000 002 297 351 163 084 8;
  • 29) 0.000 000 000 002 297 351 163 084 8 × 2 = 0 + 0.000 000 000 004 594 702 326 169 6;
  • 30) 0.000 000 000 004 594 702 326 169 6 × 2 = 0 + 0.000 000 000 009 189 404 652 339 2;
  • 31) 0.000 000 000 009 189 404 652 339 2 × 2 = 0 + 0.000 000 000 018 378 809 304 678 4;
  • 32) 0.000 000 000 018 378 809 304 678 4 × 2 = 0 + 0.000 000 000 036 757 618 609 356 8;
  • 33) 0.000 000 000 036 757 618 609 356 8 × 2 = 0 + 0.000 000 000 073 515 237 218 713 6;
  • 34) 0.000 000 000 073 515 237 218 713 6 × 2 = 0 + 0.000 000 000 147 030 474 437 427 2;
  • 35) 0.000 000 000 147 030 474 437 427 2 × 2 = 0 + 0.000 000 000 294 060 948 874 854 4;
  • 36) 0.000 000 000 294 060 948 874 854 4 × 2 = 0 + 0.000 000 000 588 121 897 749 708 8;
  • 37) 0.000 000 000 588 121 897 749 708 8 × 2 = 0 + 0.000 000 001 176 243 795 499 417 6;
  • 38) 0.000 000 001 176 243 795 499 417 6 × 2 = 0 + 0.000 000 002 352 487 590 998 835 2;
  • 39) 0.000 000 002 352 487 590 998 835 2 × 2 = 0 + 0.000 000 004 704 975 181 997 670 4;
  • 40) 0.000 000 004 704 975 181 997 670 4 × 2 = 0 + 0.000 000 009 409 950 363 995 340 8;
  • 41) 0.000 000 009 409 950 363 995 340 8 × 2 = 0 + 0.000 000 018 819 900 727 990 681 6;
  • 42) 0.000 000 018 819 900 727 990 681 6 × 2 = 0 + 0.000 000 037 639 801 455 981 363 2;
  • 43) 0.000 000 037 639 801 455 981 363 2 × 2 = 0 + 0.000 000 075 279 602 911 962 726 4;
  • 44) 0.000 000 075 279 602 911 962 726 4 × 2 = 0 + 0.000 000 150 559 205 823 925 452 8;
  • 45) 0.000 000 150 559 205 823 925 452 8 × 2 = 0 + 0.000 000 301 118 411 647 850 905 6;
  • 46) 0.000 000 301 118 411 647 850 905 6 × 2 = 0 + 0.000 000 602 236 823 295 701 811 2;
  • 47) 0.000 000 602 236 823 295 701 811 2 × 2 = 0 + 0.000 001 204 473 646 591 403 622 4;
  • 48) 0.000 001 204 473 646 591 403 622 4 × 2 = 0 + 0.000 002 408 947 293 182 807 244 8;
  • 49) 0.000 002 408 947 293 182 807 244 8 × 2 = 0 + 0.000 004 817 894 586 365 614 489 6;
  • 50) 0.000 004 817 894 586 365 614 489 6 × 2 = 0 + 0.000 009 635 789 172 731 228 979 2;
  • 51) 0.000 009 635 789 172 731 228 979 2 × 2 = 0 + 0.000 019 271 578 345 462 457 958 4;
  • 52) 0.000 019 271 578 345 462 457 958 4 × 2 = 0 + 0.000 038 543 156 690 924 915 916 8;
  • 53) 0.000 038 543 156 690 924 915 916 8 × 2 = 0 + 0.000 077 086 313 381 849 831 833 6;
  • 54) 0.000 077 086 313 381 849 831 833 6 × 2 = 0 + 0.000 154 172 626 763 699 663 667 2;
  • 55) 0.000 154 172 626 763 699 663 667 2 × 2 = 0 + 0.000 308 345 253 527 399 327 334 4;
  • 56) 0.000 308 345 253 527 399 327 334 4 × 2 = 0 + 0.000 616 690 507 054 798 654 668 8;
  • 57) 0.000 616 690 507 054 798 654 668 8 × 2 = 0 + 0.001 233 381 014 109 597 309 337 6;
  • 58) 0.001 233 381 014 109 597 309 337 6 × 2 = 0 + 0.002 466 762 028 219 194 618 675 2;
  • 59) 0.002 466 762 028 219 194 618 675 2 × 2 = 0 + 0.004 933 524 056 438 389 237 350 4;
  • 60) 0.004 933 524 056 438 389 237 350 4 × 2 = 0 + 0.009 867 048 112 876 778 474 700 8;
  • 61) 0.009 867 048 112 876 778 474 700 8 × 2 = 0 + 0.019 734 096 225 753 556 949 401 6;
  • 62) 0.019 734 096 225 753 556 949 401 6 × 2 = 0 + 0.039 468 192 451 507 113 898 803 2;
  • 63) 0.039 468 192 451 507 113 898 803 2 × 2 = 0 + 0.078 936 384 903 014 227 797 606 4;
  • 64) 0.078 936 384 903 014 227 797 606 4 × 2 = 0 + 0.157 872 769 806 028 455 595 212 8;
  • 65) 0.157 872 769 806 028 455 595 212 8 × 2 = 0 + 0.315 745 539 612 056 911 190 425 6;
  • 66) 0.315 745 539 612 056 911 190 425 6 × 2 = 0 + 0.631 491 079 224 113 822 380 851 2;
  • 67) 0.631 491 079 224 113 822 380 851 2 × 2 = 1 + 0.262 982 158 448 227 644 761 702 4;
  • 68) 0.262 982 158 448 227 644 761 702 4 × 2 = 0 + 0.525 964 316 896 455 289 523 404 8;
  • 69) 0.525 964 316 896 455 289 523 404 8 × 2 = 1 + 0.051 928 633 792 910 579 046 809 6;
  • 70) 0.051 928 633 792 910 579 046 809 6 × 2 = 0 + 0.103 857 267 585 821 158 093 619 2;
  • 71) 0.103 857 267 585 821 158 093 619 2 × 2 = 0 + 0.207 714 535 171 642 316 187 238 4;
  • 72) 0.207 714 535 171 642 316 187 238 4 × 2 = 0 + 0.415 429 070 343 284 632 374 476 8;
  • 73) 0.415 429 070 343 284 632 374 476 8 × 2 = 0 + 0.830 858 140 686 569 264 748 953 6;
  • 74) 0.830 858 140 686 569 264 748 953 6 × 2 = 1 + 0.661 716 281 373 138 529 497 907 2;
  • 75) 0.661 716 281 373 138 529 497 907 2 × 2 = 1 + 0.323 432 562 746 277 058 995 814 4;
  • 76) 0.323 432 562 746 277 058 995 814 4 × 2 = 0 + 0.646 865 125 492 554 117 991 628 8;
  • 77) 0.646 865 125 492 554 117 991 628 8 × 2 = 1 + 0.293 730 250 985 108 235 983 257 6;
  • 78) 0.293 730 250 985 108 235 983 257 6 × 2 = 0 + 0.587 460 501 970 216 471 966 515 2;
  • 79) 0.587 460 501 970 216 471 966 515 2 × 2 = 1 + 0.174 921 003 940 432 943 933 030 4;
  • 80) 0.174 921 003 940 432 943 933 030 4 × 2 = 0 + 0.349 842 007 880 865 887 866 060 8;
  • 81) 0.349 842 007 880 865 887 866 060 8 × 2 = 0 + 0.699 684 015 761 731 775 732 121 6;
  • 82) 0.699 684 015 761 731 775 732 121 6 × 2 = 1 + 0.399 368 031 523 463 551 464 243 2;
  • 83) 0.399 368 031 523 463 551 464 243 2 × 2 = 0 + 0.798 736 063 046 927 102 928 486 4;
  • 84) 0.798 736 063 046 927 102 928 486 4 × 2 = 1 + 0.597 472 126 093 854 205 856 972 8;
  • 85) 0.597 472 126 093 854 205 856 972 8 × 2 = 1 + 0.194 944 252 187 708 411 713 945 6;
  • 86) 0.194 944 252 187 708 411 713 945 6 × 2 = 0 + 0.389 888 504 375 416 823 427 891 2;
  • 87) 0.389 888 504 375 416 823 427 891 2 × 2 = 0 + 0.779 777 008 750 833 646 855 782 4;
  • 88) 0.779 777 008 750 833 646 855 782 4 × 2 = 1 + 0.559 554 017 501 667 293 711 564 8;
  • 89) 0.559 554 017 501 667 293 711 564 8 × 2 = 1 + 0.119 108 035 003 334 587 423 129 6;
  • 90) 0.119 108 035 003 334 587 423 129 6 × 2 = 0 + 0.238 216 070 006 669 174 846 259 2;
  • 91) 0.238 216 070 006 669 174 846 259 2 × 2 = 0 + 0.476 432 140 013 338 349 692 518 4;
  • 92) 0.476 432 140 013 338 349 692 518 4 × 2 = 0 + 0.952 864 280 026 676 699 385 036 8;
  • 93) 0.952 864 280 026 676 699 385 036 8 × 2 = 1 + 0.905 728 560 053 353 398 770 073 6;
  • 94) 0.905 728 560 053 353 398 770 073 6 × 2 = 1 + 0.811 457 120 106 706 797 540 147 2;
  • 95) 0.811 457 120 106 706 797 540 147 2 × 2 = 1 + 0.622 914 240 213 413 595 080 294 4;
  • 96) 0.622 914 240 213 413 595 080 294 4 × 2 = 1 + 0.245 828 480 426 827 190 160 588 8;
  • 97) 0.245 828 480 426 827 190 160 588 8 × 2 = 0 + 0.491 656 960 853 654 380 321 177 6;
  • 98) 0.491 656 960 853 654 380 321 177 6 × 2 = 0 + 0.983 313 921 707 308 760 642 355 2;
  • 99) 0.983 313 921 707 308 760 642 355 2 × 2 = 1 + 0.966 627 843 414 617 521 284 710 4;
  • 100) 0.966 627 843 414 617 521 284 710 4 × 2 = 1 + 0.933 255 686 829 235 042 569 420 8;
  • 101) 0.933 255 686 829 235 042 569 420 8 × 2 = 1 + 0.866 511 373 658 470 085 138 841 6;
  • 102) 0.866 511 373 658 470 085 138 841 6 × 2 = 1 + 0.733 022 747 316 940 170 277 683 2;
  • 103) 0.733 022 747 316 940 170 277 683 2 × 2 = 1 + 0.466 045 494 633 880 340 555 366 4;
  • 104) 0.466 045 494 633 880 340 555 366 4 × 2 = 0 + 0.932 090 989 267 760 681 110 732 8;
  • 105) 0.932 090 989 267 760 681 110 732 8 × 2 = 1 + 0.864 181 978 535 521 362 221 465 6;
  • 106) 0.864 181 978 535 521 362 221 465 6 × 2 = 1 + 0.728 363 957 071 042 724 442 931 2;
  • 107) 0.728 363 957 071 042 724 442 931 2 × 2 = 1 + 0.456 727 914 142 085 448 885 862 4;
  • 108) 0.456 727 914 142 085 448 885 862 4 × 2 = 0 + 0.913 455 828 284 170 897 771 724 8;
  • 109) 0.913 455 828 284 170 897 771 724 8 × 2 = 1 + 0.826 911 656 568 341 795 543 449 6;
  • 110) 0.826 911 656 568 341 795 543 449 6 × 2 = 1 + 0.653 823 313 136 683 591 086 899 2;
  • 111) 0.653 823 313 136 683 591 086 899 2 × 2 = 1 + 0.307 646 626 273 367 182 173 798 4;
  • 112) 0.307 646 626 273 367 182 173 798 4 × 2 = 0 + 0.615 293 252 546 734 364 347 596 8;
  • 113) 0.615 293 252 546 734 364 347 596 8 × 2 = 1 + 0.230 586 505 093 468 728 695 193 6;
  • 114) 0.230 586 505 093 468 728 695 193 6 × 2 = 0 + 0.461 173 010 186 937 457 390 387 2;
  • 115) 0.461 173 010 186 937 457 390 387 2 × 2 = 0 + 0.922 346 020 373 874 914 780 774 4;
  • 116) 0.922 346 020 373 874 914 780 774 4 × 2 = 1 + 0.844 692 040 747 749 829 561 548 8;
  • 117) 0.844 692 040 747 749 829 561 548 8 × 2 = 1 + 0.689 384 081 495 499 659 123 097 6;
  • 118) 0.689 384 081 495 499 659 123 097 6 × 2 = 1 + 0.378 768 162 990 999 318 246 195 2;
  • 119) 0.378 768 162 990 999 318 246 195 2 × 2 = 0 + 0.757 536 325 981 998 636 492 390 4;

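No fractional part became zero. However, the first non-zero integer part appeared at step 67 and 52 more bits have been collected since then, which is enough to fill the mantissa => FULL STOP. The remaining bits are simply dropped (truncation), so the converted number is only a very close approximation of the original. IEEE 754's default round-to-nearest mode would round the last kept bit up here, because the fractional part left after step 119 (0.757 536...) is greater than one half.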


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 558 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 1010 0101 1001 1000 1111 0011 1110 1110 1110 1001 110(2)
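
Steps 3 and 4 can be reproduced with exact rational arithmetic, which avoids any rounding while doubling. The sketch below assumes the stopping rule used above (stop at a zero fractional part, or once 52 bits follow the first 1 bit); the name `fraction_bits` is hypothetical.

```python
from fractions import Fraction


def fraction_bits(x: Fraction, mantissa_bits: int = 52) -> str:
    """Binary digits of 0 <= x < 1, produced by repeated doubling.

    Stops when the fractional part reaches zero, or once the first 1 bit
    plus `mantissa_bits` further bits have been produced.
    """
    bits = []
    first_one = None                 # 1-based position of the first 1 bit
    while x != 0:
        x *= 2
        bit = int(x)                 # integer part of the product: 0 or 1
        x -= bit                     # keep only the fractional part
        bits.append(str(bit))
        if bit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one >= mantissa_bits:
            break
    return "".join(bits)


# 0.000 000 000 000 000 000 008 558 3 written exactly as 85583 / 10^25
bits = fraction_bits(Fraction(85583, 10**25))
print(len(bits), bits.index("1") + 1)
```

The two printed numbers are the total count of multiplications and the position of the first 1 bit, which can be compared with the list above.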

5. Positive number before normalization:

0.000 000 000 000 000 000 008 558 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 1010 0101 1001 1000 1111 0011 1110 1110 1110 1001 110(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 558 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 1010 0101 1001 1000 1111 0011 1110 1110 1110 1001 110(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0110 1010 0101 1001 1000 1111 0011 1110 1110 1110 1001 110(2) × 2^0 =


1.0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110(2) × 2^(-67)
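
Normalization of a pure fraction amounts to locating the first 1 bit: its position is the number of places the point moves to the right, and the exponent is the negative of that count. A minimal sketch, assuming the digits from step 4 are available as a string (the function name `normalize_fraction_bits` is made up for this example):

```python
def normalize_fraction_bits(bits: str) -> tuple:
    """Normalize 0.<bits> (base 2) to 1.<rest> x 2**exponent.

    `bits` holds the digits after the binary point of a value 0 < x < 1.
    """
    shift = bits.index("1") + 1          # places the point moves to the right
    significand = "1." + bits[shift:]    # one non-zero digit left of the point
    return significand, -shift           # moving right gives a negative exponent


# Digits from step 4: 66 zeros, the first 1, then the 52 bits that follow it.
tail = "0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110"
bits = "0" * 66 + "1" + tail.replace(" ", "")

significand, exponent = normalize_fraction_bits(bits)
print(exponent)    # -67
```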


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
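
The bias arithmetic of steps 8 to 10 is short enough to check directly. In this sketch the constant and function names are assumptions; the range check reflects the fact that the all-zeros and all-ones exponent fields are reserved in IEEE 754 (zero and subnormals, infinities and NaN).

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1          # 1023 for double precision


def biased_exponent_bits(exponent: int) -> str:
    """Return the 11-bit exponent field for a normalized double."""
    biased = exponent + BIAS
    # Field values 0 and 2047 are reserved, so normal numbers use 1..2046.
    if not 0 < biased < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal double range")
    return format(biased, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(-67))    # 01110111100
```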


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1. 0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110 =


0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110


Decimal number 0.000 000 000 000 000 000 008 558 3 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110
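
As a sanity check, the three fields can be packed into one 64 bit word and compared with the pattern the platform stores for the same literal. This sketch uses Python's standard `struct` module; the variable names are illustrative. Because the steps above drop the extra bits while hardware conversion normally rounds to nearest, the two patterns are expected to agree except possibly in the last bit.

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0011 0101 0010 1100 1100 0111 1001 1111 0111 0111 0100 1110"

# Concatenate sign, exponent and mantissa into a single 64 bit integer.
word = int((sign + exponent + mantissa).replace(" ", ""), 2)
print(f"from the steps above: 0x{word:016X}")    # 0x3BC4352CC79F774E

# Bit pattern stored by the platform for the same decimal literal.
(native,) = struct.unpack(">Q", struct.pack(">d", 8.5583e-21))
print(f"native double:        0x{native:016X}")
print("difference in the last place:", native - word)
```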


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Repeatedly divide the positive integer part by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
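
The nine steps above can be condensed into one routine. The sketch below is an assumption-laden illustration, not a full IEEE 754 implementation: the name `to_double_fields` is hypothetical, it works on exact fractions instead of digit lists, it truncates the mantissa as the steps describe (no round-to-nearest), and it rejects zero and values outside the normal range.

```python
from fractions import Fraction


def to_double_fields(decimal_text: str) -> tuple:
    """Return (sign, exponent, mantissa) bit strings, using truncation."""
    value = Fraction(decimal_text)           # exact value of the decimal text
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"         # step 9
    value = abs(value)                       # step 1

    exponent = 0                             # step 6: normalize to 1.xxx
    while value < 1:
        value *= 2
        exponent -= 1
    while value >= 2:
        value /= 2
        exponent += 1

    biased = exponent + 2 ** (11 - 1) - 1    # step 7
    if not 0 < biased < 2047:
        raise ValueError("outside the normal double range")

    # Step 8: drop the leading 1, keep 52 bits, discard the rest.
    fraction = int((value - 1) * 2 ** 52)
    return sign, format(biased, "011b"), format(fraction, "052b")


print(to_double_fields("-31.640215"))
```

Working with `Fraction` keeps every intermediate value exact, so the only precision loss is the deliberate truncation to 52 bits.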

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
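
To see how close this bit pattern is to the original number, the fields can be decoded back into a value with (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). The following sketch does that for the result above and cross-checks it against the standard `struct` module; the variable names are illustrative only.

```python
import struct

fields = ("1 - 100 0000 0011 - "
          "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100")
bits = fields.replace(" ", "").replace("-", "")
assert len(bits) == 64

sign = int(bits[0])
exponent = int(bits[1:12], 2) - 1023         # remove the bias
fraction = int(bits[12:], 2)                 # the 52 stored mantissa bits

# Restore the hidden leading 1 and apply sign and exponent.
value = (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** exponent
print(value)

# Cross-check: reinterpret the same 64 bits as a native double.
(native,) = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))
print(native == value)
```

The decoded value should sit within one unit in the last place of -31.640 215, which is the precision lost by cutting the mantissa to 52 bits.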