0.000 000 000 000 000 000 008 537 206 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 537 206(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 537 206(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
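The repeated division from steps 1 and 2 can be sketched in Python as below. It is a minimal illustration that assumes a non-negative `int`. `int_to_binary` is a hypothetical helper name, not a library function (Python's built-in `bin()` gives the same digits).

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0, so the only remainder is 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(r))
    # Read the remainders starting from the bottom of the list (last one first).
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(956))  # 1110111100
```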


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 537 206.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 537 206 × 2 = 0 + 0.000 000 000 000 000 000 017 074 412;
  • 2) 0.000 000 000 000 000 000 017 074 412 × 2 = 0 + 0.000 000 000 000 000 000 034 148 824;
  • 3) 0.000 000 000 000 000 000 034 148 824 × 2 = 0 + 0.000 000 000 000 000 000 068 297 648;
  • 4) 0.000 000 000 000 000 000 068 297 648 × 2 = 0 + 0.000 000 000 000 000 000 136 595 296;
  • 5) 0.000 000 000 000 000 000 136 595 296 × 2 = 0 + 0.000 000 000 000 000 000 273 190 592;
  • 6) 0.000 000 000 000 000 000 273 190 592 × 2 = 0 + 0.000 000 000 000 000 000 546 381 184;
  • 7) 0.000 000 000 000 000 000 546 381 184 × 2 = 0 + 0.000 000 000 000 000 001 092 762 368;
  • 8) 0.000 000 000 000 000 001 092 762 368 × 2 = 0 + 0.000 000 000 000 000 002 185 524 736;
  • 9) 0.000 000 000 000 000 002 185 524 736 × 2 = 0 + 0.000 000 000 000 000 004 371 049 472;
  • 10) 0.000 000 000 000 000 004 371 049 472 × 2 = 0 + 0.000 000 000 000 000 008 742 098 944;
  • 11) 0.000 000 000 000 000 008 742 098 944 × 2 = 0 + 0.000 000 000 000 000 017 484 197 888;
  • 12) 0.000 000 000 000 000 017 484 197 888 × 2 = 0 + 0.000 000 000 000 000 034 968 395 776;
  • 13) 0.000 000 000 000 000 034 968 395 776 × 2 = 0 + 0.000 000 000 000 000 069 936 791 552;
  • 14) 0.000 000 000 000 000 069 936 791 552 × 2 = 0 + 0.000 000 000 000 000 139 873 583 104;
  • 15) 0.000 000 000 000 000 139 873 583 104 × 2 = 0 + 0.000 000 000 000 000 279 747 166 208;
  • 16) 0.000 000 000 000 000 279 747 166 208 × 2 = 0 + 0.000 000 000 000 000 559 494 332 416;
  • 17) 0.000 000 000 000 000 559 494 332 416 × 2 = 0 + 0.000 000 000 000 001 118 988 664 832;
  • 18) 0.000 000 000 000 001 118 988 664 832 × 2 = 0 + 0.000 000 000 000 002 237 977 329 664;
  • 19) 0.000 000 000 000 002 237 977 329 664 × 2 = 0 + 0.000 000 000 000 004 475 954 659 328;
  • 20) 0.000 000 000 000 004 475 954 659 328 × 2 = 0 + 0.000 000 000 000 008 951 909 318 656;
  • 21) 0.000 000 000 000 008 951 909 318 656 × 2 = 0 + 0.000 000 000 000 017 903 818 637 312;
  • 22) 0.000 000 000 000 017 903 818 637 312 × 2 = 0 + 0.000 000 000 000 035 807 637 274 624;
  • 23) 0.000 000 000 000 035 807 637 274 624 × 2 = 0 + 0.000 000 000 000 071 615 274 549 248;
  • 24) 0.000 000 000 000 071 615 274 549 248 × 2 = 0 + 0.000 000 000 000 143 230 549 098 496;
  • 25) 0.000 000 000 000 143 230 549 098 496 × 2 = 0 + 0.000 000 000 000 286 461 098 196 992;
  • 26) 0.000 000 000 000 286 461 098 196 992 × 2 = 0 + 0.000 000 000 000 572 922 196 393 984;
  • 27) 0.000 000 000 000 572 922 196 393 984 × 2 = 0 + 0.000 000 000 001 145 844 392 787 968;
  • 28) 0.000 000 000 001 145 844 392 787 968 × 2 = 0 + 0.000 000 000 002 291 688 785 575 936;
  • 29) 0.000 000 000 002 291 688 785 575 936 × 2 = 0 + 0.000 000 000 004 583 377 571 151 872;
  • 30) 0.000 000 000 004 583 377 571 151 872 × 2 = 0 + 0.000 000 000 009 166 755 142 303 744;
  • 31) 0.000 000 000 009 166 755 142 303 744 × 2 = 0 + 0.000 000 000 018 333 510 284 607 488;
  • 32) 0.000 000 000 018 333 510 284 607 488 × 2 = 0 + 0.000 000 000 036 667 020 569 214 976;
  • 33) 0.000 000 000 036 667 020 569 214 976 × 2 = 0 + 0.000 000 000 073 334 041 138 429 952;
  • 34) 0.000 000 000 073 334 041 138 429 952 × 2 = 0 + 0.000 000 000 146 668 082 276 859 904;
  • 35) 0.000 000 000 146 668 082 276 859 904 × 2 = 0 + 0.000 000 000 293 336 164 553 719 808;
  • 36) 0.000 000 000 293 336 164 553 719 808 × 2 = 0 + 0.000 000 000 586 672 329 107 439 616;
  • 37) 0.000 000 000 586 672 329 107 439 616 × 2 = 0 + 0.000 000 001 173 344 658 214 879 232;
  • 38) 0.000 000 001 173 344 658 214 879 232 × 2 = 0 + 0.000 000 002 346 689 316 429 758 464;
  • 39) 0.000 000 002 346 689 316 429 758 464 × 2 = 0 + 0.000 000 004 693 378 632 859 516 928;
  • 40) 0.000 000 004 693 378 632 859 516 928 × 2 = 0 + 0.000 000 009 386 757 265 719 033 856;
  • 41) 0.000 000 009 386 757 265 719 033 856 × 2 = 0 + 0.000 000 018 773 514 531 438 067 712;
  • 42) 0.000 000 018 773 514 531 438 067 712 × 2 = 0 + 0.000 000 037 547 029 062 876 135 424;
  • 43) 0.000 000 037 547 029 062 876 135 424 × 2 = 0 + 0.000 000 075 094 058 125 752 270 848;
  • 44) 0.000 000 075 094 058 125 752 270 848 × 2 = 0 + 0.000 000 150 188 116 251 504 541 696;
  • 45) 0.000 000 150 188 116 251 504 541 696 × 2 = 0 + 0.000 000 300 376 232 503 009 083 392;
  • 46) 0.000 000 300 376 232 503 009 083 392 × 2 = 0 + 0.000 000 600 752 465 006 018 166 784;
  • 47) 0.000 000 600 752 465 006 018 166 784 × 2 = 0 + 0.000 001 201 504 930 012 036 333 568;
  • 48) 0.000 001 201 504 930 012 036 333 568 × 2 = 0 + 0.000 002 403 009 860 024 072 667 136;
  • 49) 0.000 002 403 009 860 024 072 667 136 × 2 = 0 + 0.000 004 806 019 720 048 145 334 272;
  • 50) 0.000 004 806 019 720 048 145 334 272 × 2 = 0 + 0.000 009 612 039 440 096 290 668 544;
  • 51) 0.000 009 612 039 440 096 290 668 544 × 2 = 0 + 0.000 019 224 078 880 192 581 337 088;
  • 52) 0.000 019 224 078 880 192 581 337 088 × 2 = 0 + 0.000 038 448 157 760 385 162 674 176;
  • 53) 0.000 038 448 157 760 385 162 674 176 × 2 = 0 + 0.000 076 896 315 520 770 325 348 352;
  • 54) 0.000 076 896 315 520 770 325 348 352 × 2 = 0 + 0.000 153 792 631 041 540 650 696 704;
  • 55) 0.000 153 792 631 041 540 650 696 704 × 2 = 0 + 0.000 307 585 262 083 081 301 393 408;
  • 56) 0.000 307 585 262 083 081 301 393 408 × 2 = 0 + 0.000 615 170 524 166 162 602 786 816;
  • 57) 0.000 615 170 524 166 162 602 786 816 × 2 = 0 + 0.001 230 341 048 332 325 205 573 632;
  • 58) 0.001 230 341 048 332 325 205 573 632 × 2 = 0 + 0.002 460 682 096 664 650 411 147 264;
  • 59) 0.002 460 682 096 664 650 411 147 264 × 2 = 0 + 0.004 921 364 193 329 300 822 294 528;
  • 60) 0.004 921 364 193 329 300 822 294 528 × 2 = 0 + 0.009 842 728 386 658 601 644 589 056;
  • 61) 0.009 842 728 386 658 601 644 589 056 × 2 = 0 + 0.019 685 456 773 317 203 289 178 112;
  • 62) 0.019 685 456 773 317 203 289 178 112 × 2 = 0 + 0.039 370 913 546 634 406 578 356 224;
  • 63) 0.039 370 913 546 634 406 578 356 224 × 2 = 0 + 0.078 741 827 093 268 813 156 712 448;
  • 64) 0.078 741 827 093 268 813 156 712 448 × 2 = 0 + 0.157 483 654 186 537 626 313 424 896;
  • 65) 0.157 483 654 186 537 626 313 424 896 × 2 = 0 + 0.314 967 308 373 075 252 626 849 792;
  • 66) 0.314 967 308 373 075 252 626 849 792 × 2 = 0 + 0.629 934 616 746 150 505 253 699 584;
  • 67) 0.629 934 616 746 150 505 253 699 584 × 2 = 1 + 0.259 869 233 492 301 010 507 399 168;
  • 68) 0.259 869 233 492 301 010 507 399 168 × 2 = 0 + 0.519 738 466 984 602 021 014 798 336;
  • 69) 0.519 738 466 984 602 021 014 798 336 × 2 = 1 + 0.039 476 933 969 204 042 029 596 672;
  • 70) 0.039 476 933 969 204 042 029 596 672 × 2 = 0 + 0.078 953 867 938 408 084 059 193 344;
  • 71) 0.078 953 867 938 408 084 059 193 344 × 2 = 0 + 0.157 907 735 876 816 168 118 386 688;
  • 72) 0.157 907 735 876 816 168 118 386 688 × 2 = 0 + 0.315 815 471 753 632 336 236 773 376;
  • 73) 0.315 815 471 753 632 336 236 773 376 × 2 = 0 + 0.631 630 943 507 264 672 473 546 752;
  • 74) 0.631 630 943 507 264 672 473 546 752 × 2 = 1 + 0.263 261 887 014 529 344 947 093 504;
  • 75) 0.263 261 887 014 529 344 947 093 504 × 2 = 0 + 0.526 523 774 029 058 689 894 187 008;
  • 76) 0.526 523 774 029 058 689 894 187 008 × 2 = 1 + 0.053 047 548 058 117 379 788 374 016;
  • 77) 0.053 047 548 058 117 379 788 374 016 × 2 = 0 + 0.106 095 096 116 234 759 576 748 032;
  • 78) 0.106 095 096 116 234 759 576 748 032 × 2 = 0 + 0.212 190 192 232 469 519 153 496 064;
  • 79) 0.212 190 192 232 469 519 153 496 064 × 2 = 0 + 0.424 380 384 464 939 038 306 992 128;
  • 80) 0.424 380 384 464 939 038 306 992 128 × 2 = 0 + 0.848 760 768 929 878 076 613 984 256;
  • 81) 0.848 760 768 929 878 076 613 984 256 × 2 = 1 + 0.697 521 537 859 756 153 227 968 512;
  • 82) 0.697 521 537 859 756 153 227 968 512 × 2 = 1 + 0.395 043 075 719 512 306 455 937 024;
  • 83) 0.395 043 075 719 512 306 455 937 024 × 2 = 0 + 0.790 086 151 439 024 612 911 874 048;
  • 84) 0.790 086 151 439 024 612 911 874 048 × 2 = 1 + 0.580 172 302 878 049 225 823 748 096;
  • 85) 0.580 172 302 878 049 225 823 748 096 × 2 = 1 + 0.160 344 605 756 098 451 647 496 192;
  • 86) 0.160 344 605 756 098 451 647 496 192 × 2 = 0 + 0.320 689 211 512 196 903 294 992 384;
  • 87) 0.320 689 211 512 196 903 294 992 384 × 2 = 0 + 0.641 378 423 024 393 806 589 984 768;
  • 88) 0.641 378 423 024 393 806 589 984 768 × 2 = 1 + 0.282 756 846 048 787 613 179 969 536;
  • 89) 0.282 756 846 048 787 613 179 969 536 × 2 = 0 + 0.565 513 692 097 575 226 359 939 072;
  • 90) 0.565 513 692 097 575 226 359 939 072 × 2 = 1 + 0.131 027 384 195 150 452 719 878 144;
  • 91) 0.131 027 384 195 150 452 719 878 144 × 2 = 0 + 0.262 054 768 390 300 905 439 756 288;
  • 92) 0.262 054 768 390 300 905 439 756 288 × 2 = 0 + 0.524 109 536 780 601 810 879 512 576;
  • 93) 0.524 109 536 780 601 810 879 512 576 × 2 = 1 + 0.048 219 073 561 203 621 759 025 152;
  • 94) 0.048 219 073 561 203 621 759 025 152 × 2 = 0 + 0.096 438 147 122 407 243 518 050 304;
  • 95) 0.096 438 147 122 407 243 518 050 304 × 2 = 0 + 0.192 876 294 244 814 487 036 100 608;
  • 96) 0.192 876 294 244 814 487 036 100 608 × 2 = 0 + 0.385 752 588 489 628 974 072 201 216;
  • 97) 0.385 752 588 489 628 974 072 201 216 × 2 = 0 + 0.771 505 176 979 257 948 144 402 432;
  • 98) 0.771 505 176 979 257 948 144 402 432 × 2 = 1 + 0.543 010 353 958 515 896 288 804 864;
  • 99) 0.543 010 353 958 515 896 288 804 864 × 2 = 1 + 0.086 020 707 917 031 792 577 609 728;
  • 100) 0.086 020 707 917 031 792 577 609 728 × 2 = 0 + 0.172 041 415 834 063 585 155 219 456;
  • 101) 0.172 041 415 834 063 585 155 219 456 × 2 = 0 + 0.344 082 831 668 127 170 310 438 912;
  • 102) 0.344 082 831 668 127 170 310 438 912 × 2 = 0 + 0.688 165 663 336 254 340 620 877 824;
  • 103) 0.688 165 663 336 254 340 620 877 824 × 2 = 1 + 0.376 331 326 672 508 681 241 755 648;
  • 104) 0.376 331 326 672 508 681 241 755 648 × 2 = 0 + 0.752 662 653 345 017 362 483 511 296;
  • 105) 0.752 662 653 345 017 362 483 511 296 × 2 = 1 + 0.505 325 306 690 034 724 967 022 592;
  • 106) 0.505 325 306 690 034 724 967 022 592 × 2 = 1 + 0.010 650 613 380 069 449 934 045 184;
  • 107) 0.010 650 613 380 069 449 934 045 184 × 2 = 0 + 0.021 301 226 760 138 899 868 090 368;
  • 108) 0.021 301 226 760 138 899 868 090 368 × 2 = 0 + 0.042 602 453 520 277 799 736 180 736;
  • 109) 0.042 602 453 520 277 799 736 180 736 × 2 = 0 + 0.085 204 907 040 555 599 472 361 472;
  • 110) 0.085 204 907 040 555 599 472 361 472 × 2 = 0 + 0.170 409 814 081 111 198 944 722 944;
  • 111) 0.170 409 814 081 111 198 944 722 944 × 2 = 0 + 0.340 819 628 162 222 397 889 445 888;
  • 112) 0.340 819 628 162 222 397 889 445 888 × 2 = 0 + 0.681 639 256 324 444 795 778 891 776;
  • 113) 0.681 639 256 324 444 795 778 891 776 × 2 = 1 + 0.363 278 512 648 889 591 557 783 552;
  • 114) 0.363 278 512 648 889 591 557 783 552 × 2 = 0 + 0.726 557 025 297 779 183 115 567 104;
  • 115) 0.726 557 025 297 779 183 115 567 104 × 2 = 1 + 0.453 114 050 595 558 366 231 134 208;
  • 116) 0.453 114 050 595 558 366 231 134 208 × 2 = 0 + 0.906 228 101 191 116 732 462 268 416;
  • 117) 0.906 228 101 191 116 732 462 268 416 × 2 = 1 + 0.812 456 202 382 233 464 924 536 832;
  • 118) 0.812 456 202 382 233 464 924 536 832 × 2 = 1 + 0.624 912 404 764 466 929 849 073 664;
  • 119) 0.624 912 404 764 466 929 849 073 664 × 2 = 1 + 0.249 824 809 528 933 859 698 147 328;

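We never got a fractional part equal to zero. But we already have enough bits: the first 1 appeared at iteration 67, and iterations 68 to 119 supply the 52 bits the mantissa can hold => FULL STOP (losing precision: the converted number will be a very good approximation of the initial one, not an exact representation). The next bit, from iteration 120, would be 0 (0.249 8... × 2 < 1), so stopping here gives the same result as rounding to the nearest representable value.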


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 537 206(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 1001 0100 1000 0110 0010 1100 0000 1010 111(2)
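Steps 3 and 4 can be reproduced with exact rational arithmetic. This matters here: a Python float literal would already be a rounded binary value, while `fractions.Fraction` keeps the decimal value exact, just like the hand calculation. This is a sketch. `fraction_to_bits` is a hypothetical name, and the cap of 119 iterations is taken from the list above.

```python
from fractions import Fraction


def fraction_to_bits(frac: Fraction, max_bits: int) -> str:
    """Binary digits of a fractional part 0 <= frac < 1, found by repeated doubling.

    Stops early if the fractional part becomes exactly zero, otherwise after
    max_bits iterations (the remaining bits are simply dropped).
    """
    if not 0 <= frac < 1:
        raise ValueError("expected 0 <= frac < 1")
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2                 # multiplying = integer + fractional part
        integer_part = int(frac)  # always 0 or 1
        bits.append(str(integer_part))
        frac -= integer_part
    return "".join(bits)


# Exact value of 0.000 000 000 000 000 000 008 537 206
x = Fraction("8.537206e-21")
bits = fraction_to_bits(x, 119)
print(bits.index("1") + 1)  # 67: the first 1 comes from iteration 67
print(bits[66:])            # the leading 1 and the 52 bits after it
```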

5. Positive number before normalization:

0.000 000 000 000 000 000 008 537 206(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 1001 0100 1000 0110 0010 1100 0000 1010 111(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 537 206(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 1001 0100 1000 0110 0010 1100 0000 1010 111(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 1001 0100 1000 0110 0010 1100 0000 1010 111(2) × 2^0 =


1.0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111(2) × 2^(-67)
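Shifting the decimal mark one position to the right is the same as multiplying by 2 and lowering the exponent by 1. So step 6 can be sketched as a loop that halves or doubles the value until it lies in [1, 2). The helper name `normalize` is hypothetical, and the input is assumed to be a positive `fractions.Fraction` so that no precision is lost.

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) such that x == m * 2**e and 1 <= m < 2; x must be positive."""
    if x <= 0:
        raise ValueError("expected a positive number")
    m, e = x, 0
    while m >= 2:  # shift the mark left: halve, exponent goes up
        m /= 2
        e += 1
    while m < 1:   # shift the mark right: double, exponent goes down
        m *= 2
        e -= 1
    return m, e


m, e = normalize(Fraction("8.537206e-21"))
print(e)                                                 # -67
print(m * Fraction(2) ** e == Fraction("8.537206e-21"))  # True
```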


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
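The bias calculation in step 8 is small enough to check directly. This sketch handles normal numbers only. `biased_exponent` is a hypothetical helper, and the range check reflects the fact that IEEE 754 reserves the exponent fields 0 (zero and subnormal numbers) and 2047 (infinity and NaN).

```python
EXPONENT_BITS = 11                   # double precision
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 2^(11-1) - 1 = 1023


def biased_exponent(unadjusted: int) -> int:
    """Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1, for normal numbers."""
    adjusted = unadjusted + BIAS
    # 0 (all zeros) and 2047 (all ones) are reserved, so normal numbers use 1..2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent out of range for a normal double")
    return adjusted


print(BIAS)                  # 1023
print(biased_exponent(-67))  # 956
print(biased_exponent(4))    # 1027
```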


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above. They give only 10 bits, 11 1011 1100, so add one leading 0 to fill the 11 bit exponent field.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
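Python's `format()` with a zero-padded width performs steps 9 and 10 in one call, including the leading 0 that the repeated division does not produce. `exponent_field` is a hypothetical wrapper used only for illustration.

```python
def exponent_field(adjusted: int, width: int = 11) -> str:
    """The adjusted exponent as a binary string, zero-padded on the left to `width` bits."""
    if not 0 <= adjusted < 2 ** width:
        raise ValueError("value does not fit in the exponent field")
    return format(adjusted, f"0{width}b")


print(bin(956))             # 0b1110111100 (the 10 bits from the divisions)
print(exponent_field(956))  # 01110111100
```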


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal mark.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111 =


0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111
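Step 11 amounts to string slicing. This sketch (the name `mantissa_field` is hypothetical) assumes the significand is written as "1." followed by binary digits, possibly grouped with spaces, and, like this walkthrough, it truncates rather than rounds any bits beyond 52.

```python
def mantissa_field(significand: str, width: int = 52) -> str:
    """Drop the leading '1.' and cut or pad the remaining bits to `width` bits."""
    digits = significand.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized significand such as '1.0100...'")
    fraction_bits = digits[2:]
    # Excess bits on the right are cut off (truncation); missing ones become '0'.
    return fraction_bits[:width].ljust(width, "0")


m = mantissa_field("1.0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111")
print(len(m))  # 52
print(m)
```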


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111


Decimal number 0.000 000 000 000 000 000 008 537 206 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0110 1100 1010 0100 0011 0001 0110 0000 0101 0111
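One way to cross-check the result is to let Python produce the double and split its 64 bits with the standard `struct` module. Python's `float()` rounds the decimal text to the nearest double. For this number the first discarded bit (iteration 120) is 0, so it should agree with the truncated result above. `double_fields` is a hypothetical helper name.

```python
import struct


def double_fields(x: float) -> tuple[str, str, str]:
    """Split the IEEE 754 binary64 encoding of x into (sign, exponent, mantissa) bits."""
    # Pack as a big-endian double, then read the same 8 bytes as an unsigned 64-bit int.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(float("8.537206e-21"))
print(sign, exponent, mantissa)
```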


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
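  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have enough bits to fill the mantissa (the leading 1 of the whole number plus the 52 bits that follow it).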
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number by shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra '0' bits to the right. Removing bits this way truncates; IEEE 754's default rounding mode (round to nearest, ties to even) may instead round the last kept bit up.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
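The nine steps above can be strung together into one sketch that uses exact `fractions.Fraction` arithmetic instead of hand calculation. Assumptions: the input is a nonzero decimal string whose magnitude lies in the normal double range, and surplus mantissa bits are truncated as in these steps (no IEEE 754 rounding). The function name is hypothetical, and `format(n, "b")` stands in for the repeated division by 2.

```python
from fractions import Fraction

MANTISSA_BITS = 52
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023


def to_double_fields_truncated(decimal_text: str) -> tuple[str, str, str]:
    """Apply steps 1-9 with exact arithmetic; return (sign, exponent, mantissa) bits.

    Assumes a nonzero value in the normal double range. Surplus mantissa bits
    are truncated, as in the steps above (no IEEE 754 rounding).
    """
    x = Fraction(decimal_text)
    if x == 0:
        raise ValueError("zero has its own encoding")
    sign = "1" if x < 0 else "0"                    # step 9
    x = abs(x)                                      # step 1

    integer = x.numerator // x.denominator          # steps 2-3
    frac = x - integer
    int_bits = format(integer, "b") if integer else ""

    frac_bits = []                                  # steps 4-5
    # Stop when the fractional part is zero, or once we hold the leading 1
    # plus the 52 bits that fit in the mantissa.
    while frac and len((int_bits + "".join(frac_bits)).lstrip("0")) < MANTISSA_BITS + 1:
        frac *= 2
        bit = int(frac)
        frac_bits.append(str(bit))
        frac -= bit

    digits = int_bits + "".join(frac_bits)          # the mark sits right after int_bits
    first_one = digits.index("1")
    exponent = len(int_bits) - 1 - first_one        # step 6

    biased = exponent + BIAS                        # step 7
    if not 1 <= biased <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("outside the normal double range")
    exponent_bits = format(biased, f"0{EXPONENT_BITS}b")

    # Step 8: drop the implicit leading 1, then cut or pad to 52 bits.
    mantissa = digits[first_one + 1:][:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")
    return sign, exponent_bits, mantissa


print(" - ".join(to_double_fields_truncated("-31.640215")))
```

Called on "-31.640215", it should reproduce the truncated result of the worked example that follows.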

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We never got a fractional part equal to zero. But we already have more bits than the mantissa can store (5 integer bits plus 53 fractional bits, while only the 52 bits after the leading 1 are kept) => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

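  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). Note that this truncates. Under IEEE 754's default rounding mode (round to nearest, ties to even), the discarded bits 1010 0... are worth more than half a unit in the last place, so the last mantissa bit would round up, giving ...1000 1001 1101 instead of ...1000 1001 1100: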

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
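The conversion above truncates the mantissa, but IEEE 754's default rounding mode is round to nearest, ties to even, and Python's `float()` applies correct rounding when it parses decimal text. The sketch below (hypothetical helper `truncated_and_rounded`, exact `fractions.Fraction` arithmetic) compares the two for -31.640 215. The discarded bits 1010 0... exceed half a unit in the last place, so the correctly rounded double ends in 1101, not 1100. For 0.000 000 000 000 000 000 008 537 206 the two agree.

```python
import math
import struct
from fractions import Fraction


def truncated_and_rounded(decimal_text: str, exponent: int) -> tuple[int, int]:
    """Return the 53-bit significand of |value| (implicit leading 1 included),
    first truncated as in this walkthrough, then rounded to nearest, ties to even.

    `exponent` is the unadjusted exponent found during normalization.
    """
    # Exact significand, scaled so that its 52 fraction bits become an integer.
    scaled = abs(Fraction(decimal_text)) / Fraction(2) ** exponent * 2 ** 52
    truncated = math.floor(scaled)  # drop the surplus bits
    rounded = round(scaled)         # Fraction rounding: half to even
    return truncated, rounded


truncated, rounded = truncated_and_rounded("-31.640215", 4)
print(format(truncated, "b")[-4:], format(rounded, "b")[-4:])  # 1100 1101

# Python parses the literal with correct rounding, so the stored mantissa
# should equal the rounded significand without its leading 1.
(as_int,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = format(as_int, "064b")[12:]
print(stored == format(rounded, "b")[1:])  # True
```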