0.000 000 000 000 000 000 008 565 7 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 565 7(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 565 7(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 565 7.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 565 7 × 2 = 0 + 0.000 000 000 000 000 000 017 131 4;
  • 2) 0.000 000 000 000 000 000 017 131 4 × 2 = 0 + 0.000 000 000 000 000 000 034 262 8;
  • 3) 0.000 000 000 000 000 000 034 262 8 × 2 = 0 + 0.000 000 000 000 000 000 068 525 6;
  • 4) 0.000 000 000 000 000 000 068 525 6 × 2 = 0 + 0.000 000 000 000 000 000 137 051 2;
  • 5) 0.000 000 000 000 000 000 137 051 2 × 2 = 0 + 0.000 000 000 000 000 000 274 102 4;
  • 6) 0.000 000 000 000 000 000 274 102 4 × 2 = 0 + 0.000 000 000 000 000 000 548 204 8;
  • 7) 0.000 000 000 000 000 000 548 204 8 × 2 = 0 + 0.000 000 000 000 000 001 096 409 6;
  • 8) 0.000 000 000 000 000 001 096 409 6 × 2 = 0 + 0.000 000 000 000 000 002 192 819 2;
  • 9) 0.000 000 000 000 000 002 192 819 2 × 2 = 0 + 0.000 000 000 000 000 004 385 638 4;
  • 10) 0.000 000 000 000 000 004 385 638 4 × 2 = 0 + 0.000 000 000 000 000 008 771 276 8;
  • 11) 0.000 000 000 000 000 008 771 276 8 × 2 = 0 + 0.000 000 000 000 000 017 542 553 6;
  • 12) 0.000 000 000 000 000 017 542 553 6 × 2 = 0 + 0.000 000 000 000 000 035 085 107 2;
  • 13) 0.000 000 000 000 000 035 085 107 2 × 2 = 0 + 0.000 000 000 000 000 070 170 214 4;
  • 14) 0.000 000 000 000 000 070 170 214 4 × 2 = 0 + 0.000 000 000 000 000 140 340 428 8;
  • 15) 0.000 000 000 000 000 140 340 428 8 × 2 = 0 + 0.000 000 000 000 000 280 680 857 6;
  • 16) 0.000 000 000 000 000 280 680 857 6 × 2 = 0 + 0.000 000 000 000 000 561 361 715 2;
  • 17) 0.000 000 000 000 000 561 361 715 2 × 2 = 0 + 0.000 000 000 000 001 122 723 430 4;
  • 18) 0.000 000 000 000 001 122 723 430 4 × 2 = 0 + 0.000 000 000 000 002 245 446 860 8;
  • 19) 0.000 000 000 000 002 245 446 860 8 × 2 = 0 + 0.000 000 000 000 004 490 893 721 6;
  • 20) 0.000 000 000 000 004 490 893 721 6 × 2 = 0 + 0.000 000 000 000 008 981 787 443 2;
  • 21) 0.000 000 000 000 008 981 787 443 2 × 2 = 0 + 0.000 000 000 000 017 963 574 886 4;
  • 22) 0.000 000 000 000 017 963 574 886 4 × 2 = 0 + 0.000 000 000 000 035 927 149 772 8;
  • 23) 0.000 000 000 000 035 927 149 772 8 × 2 = 0 + 0.000 000 000 000 071 854 299 545 6;
  • 24) 0.000 000 000 000 071 854 299 545 6 × 2 = 0 + 0.000 000 000 000 143 708 599 091 2;
  • 25) 0.000 000 000 000 143 708 599 091 2 × 2 = 0 + 0.000 000 000 000 287 417 198 182 4;
  • 26) 0.000 000 000 000 287 417 198 182 4 × 2 = 0 + 0.000 000 000 000 574 834 396 364 8;
  • 27) 0.000 000 000 000 574 834 396 364 8 × 2 = 0 + 0.000 000 000 001 149 668 792 729 6;
  • 28) 0.000 000 000 001 149 668 792 729 6 × 2 = 0 + 0.000 000 000 002 299 337 585 459 2;
  • 29) 0.000 000 000 002 299 337 585 459 2 × 2 = 0 + 0.000 000 000 004 598 675 170 918 4;
  • 30) 0.000 000 000 004 598 675 170 918 4 × 2 = 0 + 0.000 000 000 009 197 350 341 836 8;
  • 31) 0.000 000 000 009 197 350 341 836 8 × 2 = 0 + 0.000 000 000 018 394 700 683 673 6;
  • 32) 0.000 000 000 018 394 700 683 673 6 × 2 = 0 + 0.000 000 000 036 789 401 367 347 2;
  • 33) 0.000 000 000 036 789 401 367 347 2 × 2 = 0 + 0.000 000 000 073 578 802 734 694 4;
  • 34) 0.000 000 000 073 578 802 734 694 4 × 2 = 0 + 0.000 000 000 147 157 605 469 388 8;
  • 35) 0.000 000 000 147 157 605 469 388 8 × 2 = 0 + 0.000 000 000 294 315 210 938 777 6;
  • 36) 0.000 000 000 294 315 210 938 777 6 × 2 = 0 + 0.000 000 000 588 630 421 877 555 2;
  • 37) 0.000 000 000 588 630 421 877 555 2 × 2 = 0 + 0.000 000 001 177 260 843 755 110 4;
  • 38) 0.000 000 001 177 260 843 755 110 4 × 2 = 0 + 0.000 000 002 354 521 687 510 220 8;
  • 39) 0.000 000 002 354 521 687 510 220 8 × 2 = 0 + 0.000 000 004 709 043 375 020 441 6;
  • 40) 0.000 000 004 709 043 375 020 441 6 × 2 = 0 + 0.000 000 009 418 086 750 040 883 2;
  • 41) 0.000 000 009 418 086 750 040 883 2 × 2 = 0 + 0.000 000 018 836 173 500 081 766 4;
  • 42) 0.000 000 018 836 173 500 081 766 4 × 2 = 0 + 0.000 000 037 672 347 000 163 532 8;
  • 43) 0.000 000 037 672 347 000 163 532 8 × 2 = 0 + 0.000 000 075 344 694 000 327 065 6;
  • 44) 0.000 000 075 344 694 000 327 065 6 × 2 = 0 + 0.000 000 150 689 388 000 654 131 2;
  • 45) 0.000 000 150 689 388 000 654 131 2 × 2 = 0 + 0.000 000 301 378 776 001 308 262 4;
  • 46) 0.000 000 301 378 776 001 308 262 4 × 2 = 0 + 0.000 000 602 757 552 002 616 524 8;
  • 47) 0.000 000 602 757 552 002 616 524 8 × 2 = 0 + 0.000 001 205 515 104 005 233 049 6;
  • 48) 0.000 001 205 515 104 005 233 049 6 × 2 = 0 + 0.000 002 411 030 208 010 466 099 2;
  • 49) 0.000 002 411 030 208 010 466 099 2 × 2 = 0 + 0.000 004 822 060 416 020 932 198 4;
  • 50) 0.000 004 822 060 416 020 932 198 4 × 2 = 0 + 0.000 009 644 120 832 041 864 396 8;
  • 51) 0.000 009 644 120 832 041 864 396 8 × 2 = 0 + 0.000 019 288 241 664 083 728 793 6;
  • 52) 0.000 019 288 241 664 083 728 793 6 × 2 = 0 + 0.000 038 576 483 328 167 457 587 2;
  • 53) 0.000 038 576 483 328 167 457 587 2 × 2 = 0 + 0.000 077 152 966 656 334 915 174 4;
  • 54) 0.000 077 152 966 656 334 915 174 4 × 2 = 0 + 0.000 154 305 933 312 669 830 348 8;
  • 55) 0.000 154 305 933 312 669 830 348 8 × 2 = 0 + 0.000 308 611 866 625 339 660 697 6;
  • 56) 0.000 308 611 866 625 339 660 697 6 × 2 = 0 + 0.000 617 223 733 250 679 321 395 2;
  • 57) 0.000 617 223 733 250 679 321 395 2 × 2 = 0 + 0.001 234 447 466 501 358 642 790 4;
  • 58) 0.001 234 447 466 501 358 642 790 4 × 2 = 0 + 0.002 468 894 933 002 717 285 580 8;
  • 59) 0.002 468 894 933 002 717 285 580 8 × 2 = 0 + 0.004 937 789 866 005 434 571 161 6;
  • 60) 0.004 937 789 866 005 434 571 161 6 × 2 = 0 + 0.009 875 579 732 010 869 142 323 2;
  • 61) 0.009 875 579 732 010 869 142 323 2 × 2 = 0 + 0.019 751 159 464 021 738 284 646 4;
  • 62) 0.019 751 159 464 021 738 284 646 4 × 2 = 0 + 0.039 502 318 928 043 476 569 292 8;
  • 63) 0.039 502 318 928 043 476 569 292 8 × 2 = 0 + 0.079 004 637 856 086 953 138 585 6;
  • 64) 0.079 004 637 856 086 953 138 585 6 × 2 = 0 + 0.158 009 275 712 173 906 277 171 2;
  • 65) 0.158 009 275 712 173 906 277 171 2 × 2 = 0 + 0.316 018 551 424 347 812 554 342 4;
  • 66) 0.316 018 551 424 347 812 554 342 4 × 2 = 0 + 0.632 037 102 848 695 625 108 684 8;
  • 67) 0.632 037 102 848 695 625 108 684 8 × 2 = 1 + 0.264 074 205 697 391 250 217 369 6;
  • 68) 0.264 074 205 697 391 250 217 369 6 × 2 = 0 + 0.528 148 411 394 782 500 434 739 2;
  • 69) 0.528 148 411 394 782 500 434 739 2 × 2 = 1 + 0.056 296 822 789 565 000 869 478 4;
  • 70) 0.056 296 822 789 565 000 869 478 4 × 2 = 0 + 0.112 593 645 579 130 001 738 956 8;
  • 71) 0.112 593 645 579 130 001 738 956 8 × 2 = 0 + 0.225 187 291 158 260 003 477 913 6;
  • 72) 0.225 187 291 158 260 003 477 913 6 × 2 = 0 + 0.450 374 582 316 520 006 955 827 2;
  • 73) 0.450 374 582 316 520 006 955 827 2 × 2 = 0 + 0.900 749 164 633 040 013 911 654 4;
  • 74) 0.900 749 164 633 040 013 911 654 4 × 2 = 1 + 0.801 498 329 266 080 027 823 308 8;
  • 75) 0.801 498 329 266 080 027 823 308 8 × 2 = 1 + 0.602 996 658 532 160 055 646 617 6;
  • 76) 0.602 996 658 532 160 055 646 617 6 × 2 = 1 + 0.205 993 317 064 320 111 293 235 2;
  • 77) 0.205 993 317 064 320 111 293 235 2 × 2 = 0 + 0.411 986 634 128 640 222 586 470 4;
  • 78) 0.411 986 634 128 640 222 586 470 4 × 2 = 0 + 0.823 973 268 257 280 445 172 940 8;
  • 79) 0.823 973 268 257 280 445 172 940 8 × 2 = 1 + 0.647 946 536 514 560 890 345 881 6;
  • 80) 0.647 946 536 514 560 890 345 881 6 × 2 = 1 + 0.295 893 073 029 121 780 691 763 2;
  • 81) 0.295 893 073 029 121 780 691 763 2 × 2 = 0 + 0.591 786 146 058 243 561 383 526 4;
  • 82) 0.591 786 146 058 243 561 383 526 4 × 2 = 1 + 0.183 572 292 116 487 122 767 052 8;
  • 83) 0.183 572 292 116 487 122 767 052 8 × 2 = 0 + 0.367 144 584 232 974 245 534 105 6;
  • 84) 0.367 144 584 232 974 245 534 105 6 × 2 = 0 + 0.734 289 168 465 948 491 068 211 2;
  • 85) 0.734 289 168 465 948 491 068 211 2 × 2 = 1 + 0.468 578 336 931 896 982 136 422 4;
  • 86) 0.468 578 336 931 896 982 136 422 4 × 2 = 0 + 0.937 156 673 863 793 964 272 844 8;
  • 87) 0.937 156 673 863 793 964 272 844 8 × 2 = 1 + 0.874 313 347 727 587 928 545 689 6;
  • 88) 0.874 313 347 727 587 928 545 689 6 × 2 = 1 + 0.748 626 695 455 175 857 091 379 2;
  • 89) 0.748 626 695 455 175 857 091 379 2 × 2 = 1 + 0.497 253 390 910 351 714 182 758 4;
  • 90) 0.497 253 390 910 351 714 182 758 4 × 2 = 0 + 0.994 506 781 820 703 428 365 516 8;
  • 91) 0.994 506 781 820 703 428 365 516 8 × 2 = 1 + 0.989 013 563 641 406 856 731 033 6;
  • 92) 0.989 013 563 641 406 856 731 033 6 × 2 = 1 + 0.978 027 127 282 813 713 462 067 2;
  • 93) 0.978 027 127 282 813 713 462 067 2 × 2 = 1 + 0.956 054 254 565 627 426 924 134 4;
  • 94) 0.956 054 254 565 627 426 924 134 4 × 2 = 1 + 0.912 108 509 131 254 853 848 268 8;
  • 95) 0.912 108 509 131 254 853 848 268 8 × 2 = 1 + 0.824 217 018 262 509 707 696 537 6;
  • 96) 0.824 217 018 262 509 707 696 537 6 × 2 = 1 + 0.648 434 036 525 019 415 393 075 2;
  • 97) 0.648 434 036 525 019 415 393 075 2 × 2 = 1 + 0.296 868 073 050 038 830 786 150 4;
  • 98) 0.296 868 073 050 038 830 786 150 4 × 2 = 0 + 0.593 736 146 100 077 661 572 300 8;
  • 99) 0.593 736 146 100 077 661 572 300 8 × 2 = 1 + 0.187 472 292 200 155 323 144 601 6;
  • 100) 0.187 472 292 200 155 323 144 601 6 × 2 = 0 + 0.374 944 584 400 310 646 289 203 2;
  • 101) 0.374 944 584 400 310 646 289 203 2 × 2 = 0 + 0.749 889 168 800 621 292 578 406 4;
  • 102) 0.749 889 168 800 621 292 578 406 4 × 2 = 1 + 0.499 778 337 601 242 585 156 812 8;
  • 103) 0.499 778 337 601 242 585 156 812 8 × 2 = 0 + 0.999 556 675 202 485 170 313 625 6;
  • 104) 0.999 556 675 202 485 170 313 625 6 × 2 = 1 + 0.999 113 350 404 970 340 627 251 2;
  • 105) 0.999 113 350 404 970 340 627 251 2 × 2 = 1 + 0.998 226 700 809 940 681 254 502 4;
  • 106) 0.998 226 700 809 940 681 254 502 4 × 2 = 1 + 0.996 453 401 619 881 362 509 004 8;
  • 107) 0.996 453 401 619 881 362 509 004 8 × 2 = 1 + 0.992 906 803 239 762 725 018 009 6;
  • 108) 0.992 906 803 239 762 725 018 009 6 × 2 = 1 + 0.985 813 606 479 525 450 036 019 2;
  • 109) 0.985 813 606 479 525 450 036 019 2 × 2 = 1 + 0.971 627 212 959 050 900 072 038 4;
  • 110) 0.971 627 212 959 050 900 072 038 4 × 2 = 1 + 0.943 254 425 918 101 800 144 076 8;
  • 111) 0.943 254 425 918 101 800 144 076 8 × 2 = 1 + 0.886 508 851 836 203 600 288 153 6;
  • 112) 0.886 508 851 836 203 600 288 153 6 × 2 = 1 + 0.773 017 703 672 407 200 576 307 2;
  • 113) 0.773 017 703 672 407 200 576 307 2 × 2 = 1 + 0.546 035 407 344 814 401 152 614 4;
  • 114) 0.546 035 407 344 814 401 152 614 4 × 2 = 1 + 0.092 070 814 689 628 802 305 228 8;
  • 115) 0.092 070 814 689 628 802 305 228 8 × 2 = 0 + 0.184 141 629 379 257 604 610 457 6;
  • 116) 0.184 141 629 379 257 604 610 457 6 × 2 = 0 + 0.368 283 258 758 515 209 220 915 2;
  • 117) 0.368 283 258 758 515 209 220 915 2 × 2 = 0 + 0.736 566 517 517 030 418 441 830 4;
  • 118) 0.736 566 517 517 030 418 441 830 4 × 2 = 1 + 0.473 133 035 034 060 836 883 660 8;
  • 119) 0.473 133 035 034 060 836 883 660 8 × 2 = 0 + 0.946 266 070 068 121 673 767 321 6;

We didn't get any fractional part that was equal to zero. But the first non-zero integer part appeared at step 67, and the 52 steps after it (68 to 119) supply all the bits the mantissa can hold => FULL STOP. The remaining bits are dropped, so the converted number is only a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 565 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0011 0100 1011 1011 1111 1010 0101 1111 1111 1100 010(2)
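
The repeated doubling of step 3 can be reproduced with exact rational arithmetic. What follows is a minimal Python sketch, not the code that produced the listing above; the helper name `fraction_bits` is made up for illustration, and `fractions.Fraction` is assumed so that no rounding error enters the intermediate products.

```python
from fractions import Fraction

def fraction_bits(value, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(count):
        value *= 2
        integer_part = int(value)   # 0 or 1: the next binary digit
        bits.append(str(integer_part))
        value -= integer_part       # keep only the fractional part
    return "".join(bits)

# 0.000 000 000 000 000 000 008 565 7 written in exponent notation
bits = fraction_bits(Fraction("8.5657e-21"), 119)

print(bits.index("1") + 1)  # step number of the first non-zero integer part
print(bits[66:])            # digits produced by steps 67 to 119
```

Stopping after 119 steps mirrors the listing: 66 leading zeros, the leading 1, and 52 further bits.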

5. Positive number before normalization:

0.000 000 000 000 000 000 008 565 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0011 0100 1011 1011 1111 1010 0101 1111 1111 1100 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 565 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0011 0100 1011 1011 1111 1010 0101 1111 1111 1100 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0011 0100 1011 1011 1111 1010 0101 1111 1111 1100 010(2) × 2^0 =


1.0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010(2) × 2^(-67)
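
The normalization can be sketched in Python as locating the leading 1 in the string of fractional bits. The function name `normalize` is hypothetical, and the sketch assumes a number strictly between 0 and 1, as in this conversion.

```python
def normalize(fraction_bits):
    """For the bits after the point of a number in (0, 1), return the
    unadjusted exponent and the bits that follow the leading 1."""
    first_one = fraction_bits.index("1")   # zero-based index of the leading 1
    exponent = -(first_one + 1)            # the point moves right past that 1
    return exponent, fraction_bits[first_one + 1:]

# Bits after the point, copied from step 4 (spaces removed)
digits = ("0000 " * 16 +
          "0010 1000 0111 0011 0100 1011 1011 1111 "
          "1010 0101 1111 1111 1100 010").replace(" ", "")

exponent, tail = normalize(digits)
print(exponent)   # unadjusted exponent
print(len(tail))  # number of bits after the leading 1
```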


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
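
Steps 8 to 10 can be checked with a few lines of Python. This is only a sketch; `to_binary` and `EXPONENT_BITS` are illustrative names, not part of any library.

```python
def to_binary(n):
    """Repeated division by 2; the remainders are read from the bottom up."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    return "".join(reversed(remainders))

EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1                # 1023
adjusted = -67 + bias                              # 956
field = to_binary(adjusted).zfill(EXPONENT_BITS)   # pad on the left to 11 bits

print(bias, adjusted, field)  # 1023 956 01110111100
```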


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010 =


0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010


Decimal number 0.000 000 000 000 000 000 008 565 7 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 1001 1010 0101 1101 1111 1101 0010 1111 1111 1110 0010
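
As a cross-check, the three fields can be joined into one 64 bit integer and compared with the pattern the platform stores for the same literal. This Python sketch assumes the interpreter parses `8.5657e-21` with correct rounding (round to nearest). Because the steps above drop the bits beyond the 52nd instead of rounding, the two patterns may differ by one unit in the last place.

```python
import struct

sign = "0"
exponent = "011 1011 1100".replace(" ", "")
mantissa = ("0100 0011 1001 1010 0101 1101 1111 1101 "
            "0010 1111 1111 1110 0010").replace(" ", "")

# Pattern obtained by the truncating procedure on this page
truncated = int(sign + exponent + mantissa, 2)
print(f"{truncated:016X}")

# Pattern the platform stores for the same decimal literal
native = struct.unpack(">Q", struct.pack(">d", 8.5657e-21))[0]
print(f"{native:016X}")

# 0 if truncation and rounding agree, 1 if the last bit was rounded up
print(native - truncated)
```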


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
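
The nine steps above can be put together in one short Python sketch. The function name `to_double_fields` is hypothetical, and the sketch assumes a non-zero input within the normal range of a double (no zero, subnormal, infinity or NaN handling). Like the steps, it truncates the excess mantissa bits rather than rounding them.

```python
from fractions import Fraction

def to_double_fields(text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"       # step 9 (sign), decided up front
    value = abs(value)                     # step 1

    integer_part = int(value)
    fraction = value - integer_part

    # Steps 2-3: integer part in base 2 (same result as repeated division)
    int_bits = bin(integer_part)[2:] if integer_part else ""

    # Steps 4-5: repeated doubling until 53 significant bits exist
    frac_bits = ""
    while fraction and len((int_bits + frac_bits).lstrip("0")) < 53:
        fraction *= 2
        bit = int(fraction)
        frac_bits += str(bit)
        fraction -= bit

    # Step 6: normalize so that a single 1 stays left of the point
    if int_bits:
        exponent = len(int_bits) - 1
        significand = int_bits + frac_bits
    else:
        first_one = frac_bits.index("1")
        exponent = -(first_one + 1)
        significand = frac_bits[first_one:]

    # Step 7: add the bias 2^(11-1) - 1 = 1023 and write 11 bits
    exponent_field = format(exponent + 1023, "011b")

    # Step 8: drop the leading 1, truncate or zero-pad to 52 bits
    mantissa = significand[1:53].ljust(52, "0")
    return sign, exponent_field, mantissa

print(to_double_fields("8.5657e-21"))
```

Using `Fraction` keeps every doubling exact, so the bits match a hand calculation digit for digit.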

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
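
To confirm that the bit pattern above really encodes a value close to -31.640 215, it can be decoded again. The Python sketch below does this twice: once by reinterpreting the 64 bits with the standard `struct` module, and once by evaluating (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023) by hand. The variable names are illustrative only.

```python
import struct

fields = ("1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 "
          "0101 0111 0110 1000 1001 1100")

# Reinterpret the 64 bits as a big-endian double
bits = fields.replace(" ", "").replace("-", "")
value = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]

# Evaluate the three fields by hand
sign_bit, exponent_bits, mantissa_bits = [
    part.replace(" ", "") for part in fields.split(" - ")
]
manual = ((-1) ** int(sign_bit)
          * (1 + int(mantissa_bits, 2) / 2 ** 52)
          * 2.0 ** (int(exponent_bits, 2) - 1023))

print(value)
print(manual == value)
```

The decoded value is not exactly -31.640 215, because the excess mantissa bits were removed in step 10.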