0.000 000 000 000 000 000 008 530 8 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 530 8(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number 0.000 000 000 000 000 000 008 530 8(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 530 8.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 530 8 × 2 = 0 + 0.000 000 000 000 000 000 017 061 6;
  • 2) 0.000 000 000 000 000 000 017 061 6 × 2 = 0 + 0.000 000 000 000 000 000 034 123 2;
  • 3) 0.000 000 000 000 000 000 034 123 2 × 2 = 0 + 0.000 000 000 000 000 000 068 246 4;
  • 4) 0.000 000 000 000 000 000 068 246 4 × 2 = 0 + 0.000 000 000 000 000 000 136 492 8;
  • 5) 0.000 000 000 000 000 000 136 492 8 × 2 = 0 + 0.000 000 000 000 000 000 272 985 6;
  • 6) 0.000 000 000 000 000 000 272 985 6 × 2 = 0 + 0.000 000 000 000 000 000 545 971 2;
  • 7) 0.000 000 000 000 000 000 545 971 2 × 2 = 0 + 0.000 000 000 000 000 001 091 942 4;
  • 8) 0.000 000 000 000 000 001 091 942 4 × 2 = 0 + 0.000 000 000 000 000 002 183 884 8;
  • 9) 0.000 000 000 000 000 002 183 884 8 × 2 = 0 + 0.000 000 000 000 000 004 367 769 6;
  • 10) 0.000 000 000 000 000 004 367 769 6 × 2 = 0 + 0.000 000 000 000 000 008 735 539 2;
  • 11) 0.000 000 000 000 000 008 735 539 2 × 2 = 0 + 0.000 000 000 000 000 017 471 078 4;
  • 12) 0.000 000 000 000 000 017 471 078 4 × 2 = 0 + 0.000 000 000 000 000 034 942 156 8;
  • 13) 0.000 000 000 000 000 034 942 156 8 × 2 = 0 + 0.000 000 000 000 000 069 884 313 6;
  • 14) 0.000 000 000 000 000 069 884 313 6 × 2 = 0 + 0.000 000 000 000 000 139 768 627 2;
  • 15) 0.000 000 000 000 000 139 768 627 2 × 2 = 0 + 0.000 000 000 000 000 279 537 254 4;
  • 16) 0.000 000 000 000 000 279 537 254 4 × 2 = 0 + 0.000 000 000 000 000 559 074 508 8;
  • 17) 0.000 000 000 000 000 559 074 508 8 × 2 = 0 + 0.000 000 000 000 001 118 149 017 6;
  • 18) 0.000 000 000 000 001 118 149 017 6 × 2 = 0 + 0.000 000 000 000 002 236 298 035 2;
  • 19) 0.000 000 000 000 002 236 298 035 2 × 2 = 0 + 0.000 000 000 000 004 472 596 070 4;
  • 20) 0.000 000 000 000 004 472 596 070 4 × 2 = 0 + 0.000 000 000 000 008 945 192 140 8;
  • 21) 0.000 000 000 000 008 945 192 140 8 × 2 = 0 + 0.000 000 000 000 017 890 384 281 6;
  • 22) 0.000 000 000 000 017 890 384 281 6 × 2 = 0 + 0.000 000 000 000 035 780 768 563 2;
  • 23) 0.000 000 000 000 035 780 768 563 2 × 2 = 0 + 0.000 000 000 000 071 561 537 126 4;
  • 24) 0.000 000 000 000 071 561 537 126 4 × 2 = 0 + 0.000 000 000 000 143 123 074 252 8;
  • 25) 0.000 000 000 000 143 123 074 252 8 × 2 = 0 + 0.000 000 000 000 286 246 148 505 6;
  • 26) 0.000 000 000 000 286 246 148 505 6 × 2 = 0 + 0.000 000 000 000 572 492 297 011 2;
  • 27) 0.000 000 000 000 572 492 297 011 2 × 2 = 0 + 0.000 000 000 001 144 984 594 022 4;
  • 28) 0.000 000 000 001 144 984 594 022 4 × 2 = 0 + 0.000 000 000 002 289 969 188 044 8;
  • 29) 0.000 000 000 002 289 969 188 044 8 × 2 = 0 + 0.000 000 000 004 579 938 376 089 6;
  • 30) 0.000 000 000 004 579 938 376 089 6 × 2 = 0 + 0.000 000 000 009 159 876 752 179 2;
  • 31) 0.000 000 000 009 159 876 752 179 2 × 2 = 0 + 0.000 000 000 018 319 753 504 358 4;
  • 32) 0.000 000 000 018 319 753 504 358 4 × 2 = 0 + 0.000 000 000 036 639 507 008 716 8;
  • 33) 0.000 000 000 036 639 507 008 716 8 × 2 = 0 + 0.000 000 000 073 279 014 017 433 6;
  • 34) 0.000 000 000 073 279 014 017 433 6 × 2 = 0 + 0.000 000 000 146 558 028 034 867 2;
  • 35) 0.000 000 000 146 558 028 034 867 2 × 2 = 0 + 0.000 000 000 293 116 056 069 734 4;
  • 36) 0.000 000 000 293 116 056 069 734 4 × 2 = 0 + 0.000 000 000 586 232 112 139 468 8;
  • 37) 0.000 000 000 586 232 112 139 468 8 × 2 = 0 + 0.000 000 001 172 464 224 278 937 6;
  • 38) 0.000 000 001 172 464 224 278 937 6 × 2 = 0 + 0.000 000 002 344 928 448 557 875 2;
  • 39) 0.000 000 002 344 928 448 557 875 2 × 2 = 0 + 0.000 000 004 689 856 897 115 750 4;
  • 40) 0.000 000 004 689 856 897 115 750 4 × 2 = 0 + 0.000 000 009 379 713 794 231 500 8;
  • 41) 0.000 000 009 379 713 794 231 500 8 × 2 = 0 + 0.000 000 018 759 427 588 463 001 6;
  • 42) 0.000 000 018 759 427 588 463 001 6 × 2 = 0 + 0.000 000 037 518 855 176 926 003 2;
  • 43) 0.000 000 037 518 855 176 926 003 2 × 2 = 0 + 0.000 000 075 037 710 353 852 006 4;
  • 44) 0.000 000 075 037 710 353 852 006 4 × 2 = 0 + 0.000 000 150 075 420 707 704 012 8;
  • 45) 0.000 000 150 075 420 707 704 012 8 × 2 = 0 + 0.000 000 300 150 841 415 408 025 6;
  • 46) 0.000 000 300 150 841 415 408 025 6 × 2 = 0 + 0.000 000 600 301 682 830 816 051 2;
  • 47) 0.000 000 600 301 682 830 816 051 2 × 2 = 0 + 0.000 001 200 603 365 661 632 102 4;
  • 48) 0.000 001 200 603 365 661 632 102 4 × 2 = 0 + 0.000 002 401 206 731 323 264 204 8;
  • 49) 0.000 002 401 206 731 323 264 204 8 × 2 = 0 + 0.000 004 802 413 462 646 528 409 6;
  • 50) 0.000 004 802 413 462 646 528 409 6 × 2 = 0 + 0.000 009 604 826 925 293 056 819 2;
  • 51) 0.000 009 604 826 925 293 056 819 2 × 2 = 0 + 0.000 019 209 653 850 586 113 638 4;
  • 52) 0.000 019 209 653 850 586 113 638 4 × 2 = 0 + 0.000 038 419 307 701 172 227 276 8;
  • 53) 0.000 038 419 307 701 172 227 276 8 × 2 = 0 + 0.000 076 838 615 402 344 454 553 6;
  • 54) 0.000 076 838 615 402 344 454 553 6 × 2 = 0 + 0.000 153 677 230 804 688 909 107 2;
  • 55) 0.000 153 677 230 804 688 909 107 2 × 2 = 0 + 0.000 307 354 461 609 377 818 214 4;
  • 56) 0.000 307 354 461 609 377 818 214 4 × 2 = 0 + 0.000 614 708 923 218 755 636 428 8;
  • 57) 0.000 614 708 923 218 755 636 428 8 × 2 = 0 + 0.001 229 417 846 437 511 272 857 6;
  • 58) 0.001 229 417 846 437 511 272 857 6 × 2 = 0 + 0.002 458 835 692 875 022 545 715 2;
  • 59) 0.002 458 835 692 875 022 545 715 2 × 2 = 0 + 0.004 917 671 385 750 045 091 430 4;
  • 60) 0.004 917 671 385 750 045 091 430 4 × 2 = 0 + 0.009 835 342 771 500 090 182 860 8;
  • 61) 0.009 835 342 771 500 090 182 860 8 × 2 = 0 + 0.019 670 685 543 000 180 365 721 6;
  • 62) 0.019 670 685 543 000 180 365 721 6 × 2 = 0 + 0.039 341 371 086 000 360 731 443 2;
  • 63) 0.039 341 371 086 000 360 731 443 2 × 2 = 0 + 0.078 682 742 172 000 721 462 886 4;
  • 64) 0.078 682 742 172 000 721 462 886 4 × 2 = 0 + 0.157 365 484 344 001 442 925 772 8;
  • 65) 0.157 365 484 344 001 442 925 772 8 × 2 = 0 + 0.314 730 968 688 002 885 851 545 6;
  • 66) 0.314 730 968 688 002 885 851 545 6 × 2 = 0 + 0.629 461 937 376 005 771 703 091 2;
  • 67) 0.629 461 937 376 005 771 703 091 2 × 2 = 1 + 0.258 923 874 752 011 543 406 182 4;
  • 68) 0.258 923 874 752 011 543 406 182 4 × 2 = 0 + 0.517 847 749 504 023 086 812 364 8;
  • 69) 0.517 847 749 504 023 086 812 364 8 × 2 = 1 + 0.035 695 499 008 046 173 624 729 6;
  • 70) 0.035 695 499 008 046 173 624 729 6 × 2 = 0 + 0.071 390 998 016 092 347 249 459 2;
  • 71) 0.071 390 998 016 092 347 249 459 2 × 2 = 0 + 0.142 781 996 032 184 694 498 918 4;
  • 72) 0.142 781 996 032 184 694 498 918 4 × 2 = 0 + 0.285 563 992 064 369 388 997 836 8;
  • 73) 0.285 563 992 064 369 388 997 836 8 × 2 = 0 + 0.571 127 984 128 738 777 995 673 6;
  • 74) 0.571 127 984 128 738 777 995 673 6 × 2 = 1 + 0.142 255 968 257 477 555 991 347 2;
  • 75) 0.142 255 968 257 477 555 991 347 2 × 2 = 0 + 0.284 511 936 514 955 111 982 694 4;
  • 76) 0.284 511 936 514 955 111 982 694 4 × 2 = 0 + 0.569 023 873 029 910 223 965 388 8;
  • 77) 0.569 023 873 029 910 223 965 388 8 × 2 = 1 + 0.138 047 746 059 820 447 930 777 6;
  • 78) 0.138 047 746 059 820 447 930 777 6 × 2 = 0 + 0.276 095 492 119 640 895 861 555 2;
  • 79) 0.276 095 492 119 640 895 861 555 2 × 2 = 0 + 0.552 190 984 239 281 791 723 110 4;
  • 80) 0.552 190 984 239 281 791 723 110 4 × 2 = 1 + 0.104 381 968 478 563 583 446 220 8;
  • 81) 0.104 381 968 478 563 583 446 220 8 × 2 = 0 + 0.208 763 936 957 127 166 892 441 6;
  • 82) 0.208 763 936 957 127 166 892 441 6 × 2 = 0 + 0.417 527 873 914 254 333 784 883 2;
  • 83) 0.417 527 873 914 254 333 784 883 2 × 2 = 0 + 0.835 055 747 828 508 667 569 766 4;
  • 84) 0.835 055 747 828 508 667 569 766 4 × 2 = 1 + 0.670 111 495 657 017 335 139 532 8;
  • 85) 0.670 111 495 657 017 335 139 532 8 × 2 = 1 + 0.340 222 991 314 034 670 279 065 6;
  • 86) 0.340 222 991 314 034 670 279 065 6 × 2 = 0 + 0.680 445 982 628 069 340 558 131 2;
  • 87) 0.680 445 982 628 069 340 558 131 2 × 2 = 1 + 0.360 891 965 256 138 681 116 262 4;
  • 88) 0.360 891 965 256 138 681 116 262 4 × 2 = 0 + 0.721 783 930 512 277 362 232 524 8;
  • 89) 0.721 783 930 512 277 362 232 524 8 × 2 = 1 + 0.443 567 861 024 554 724 465 049 6;
  • 90) 0.443 567 861 024 554 724 465 049 6 × 2 = 0 + 0.887 135 722 049 109 448 930 099 2;
  • 91) 0.887 135 722 049 109 448 930 099 2 × 2 = 1 + 0.774 271 444 098 218 897 860 198 4;
  • 92) 0.774 271 444 098 218 897 860 198 4 × 2 = 1 + 0.548 542 888 196 437 795 720 396 8;
  • 93) 0.548 542 888 196 437 795 720 396 8 × 2 = 1 + 0.097 085 776 392 875 591 440 793 6;
  • 94) 0.097 085 776 392 875 591 440 793 6 × 2 = 0 + 0.194 171 552 785 751 182 881 587 2;
  • 95) 0.194 171 552 785 751 182 881 587 2 × 2 = 0 + 0.388 343 105 571 502 365 763 174 4;
  • 96) 0.388 343 105 571 502 365 763 174 4 × 2 = 0 + 0.776 686 211 143 004 731 526 348 8;
  • 97) 0.776 686 211 143 004 731 526 348 8 × 2 = 1 + 0.553 372 422 286 009 463 052 697 6;
  • 98) 0.553 372 422 286 009 463 052 697 6 × 2 = 1 + 0.106 744 844 572 018 926 105 395 2;
  • 99) 0.106 744 844 572 018 926 105 395 2 × 2 = 0 + 0.213 489 689 144 037 852 210 790 4;
  • 100) 0.213 489 689 144 037 852 210 790 4 × 2 = 0 + 0.426 979 378 288 075 704 421 580 8;
  • 101) 0.426 979 378 288 075 704 421 580 8 × 2 = 0 + 0.853 958 756 576 151 408 843 161 6;
  • 102) 0.853 958 756 576 151 408 843 161 6 × 2 = 1 + 0.707 917 513 152 302 817 686 323 2;
  • 103) 0.707 917 513 152 302 817 686 323 2 × 2 = 1 + 0.415 835 026 304 605 635 372 646 4;
  • 104) 0.415 835 026 304 605 635 372 646 4 × 2 = 0 + 0.831 670 052 609 211 270 745 292 8;
  • 105) 0.831 670 052 609 211 270 745 292 8 × 2 = 1 + 0.663 340 105 218 422 541 490 585 6;
  • 106) 0.663 340 105 218 422 541 490 585 6 × 2 = 1 + 0.326 680 210 436 845 082 981 171 2;
  • 107) 0.326 680 210 436 845 082 981 171 2 × 2 = 0 + 0.653 360 420 873 690 165 962 342 4;
  • 108) 0.653 360 420 873 690 165 962 342 4 × 2 = 1 + 0.306 720 841 747 380 331 924 684 8;
  • 109) 0.306 720 841 747 380 331 924 684 8 × 2 = 0 + 0.613 441 683 494 760 663 849 369 6;
  • 110) 0.613 441 683 494 760 663 849 369 6 × 2 = 1 + 0.226 883 366 989 521 327 698 739 2;
  • 111) 0.226 883 366 989 521 327 698 739 2 × 2 = 0 + 0.453 766 733 979 042 655 397 478 4;
  • 112) 0.453 766 733 979 042 655 397 478 4 × 2 = 0 + 0.907 533 467 958 085 310 794 956 8;
  • 113) 0.907 533 467 958 085 310 794 956 8 × 2 = 1 + 0.815 066 935 916 170 621 589 913 6;
  • 114) 0.815 066 935 916 170 621 589 913 6 × 2 = 1 + 0.630 133 871 832 341 243 179 827 2;
  • 115) 0.630 133 871 832 341 243 179 827 2 × 2 = 1 + 0.260 267 743 664 682 486 359 654 4;
  • 116) 0.260 267 743 664 682 486 359 654 4 × 2 = 0 + 0.520 535 487 329 364 972 719 308 8;
  • 117) 0.520 535 487 329 364 972 719 308 8 × 2 = 1 + 0.041 070 974 658 729 945 438 617 6;
  • 118) 0.041 070 974 658 729 945 438 617 6 × 2 = 0 + 0.082 141 949 317 459 890 877 235 2;
  • 119) 0.082 141 949 317 459 890 877 235 2 × 2 = 0 + 0.164 283 898 634 919 781 754 470 4;

The fractional part never became zero, but we have now collected 52 bits after the first non-zero bit (the mantissa limit), so we stop here. Precision is lost: the converted number will only be a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 530 8(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1001 0001 1010 1011 1000 1100 0110 1101 0100 1110 100(2)
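The repeated doubling of steps 3 and 4 can be sketched in Python as follows. This is only an illustration: the function name `fraction_bits` and its parameters are hypothetical, and `fractions.Fraction` is used so that every doubling is exact, as in the hand calculation above.

```python
from fractions import Fraction

def fraction_bits(decimal_text, mantissa_bits=52):
    """Binary digits of a decimal fraction in [0, 1), found by repeated doubling.

    Stops when the fractional part reaches zero, or once `mantissa_bits`
    digits have been collected after the first 1 (the stopping rule used above).
    """
    frac = Fraction(decimal_text)        # exact rational value, no float rounding
    bits = []
    first_one = None                     # 1-based step number of the first 1 digit
    while frac != 0:
        frac *= 2
        digit = int(frac)                # integer part of the product: 0 or 1
        frac -= digit                    # keep only the fractional part
        bits.append(digit)
        if digit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one >= mantissa_bits:
            break
    return bits, first_one

bits, first_one = fraction_bits("8.5308e-21")
print(len(bits), first_one)              # steps performed, position of the first 1
print("".join(map(str, bits[first_one - 1:])))
```

The first number printed is the count of multiplication steps and the second is the step at which the first 1 appears, which is also the number of positions the point has to be shifted during normalization.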

5. Positive number before normalization:

0.000 000 000 000 000 000 008 530 8(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1001 0001 1010 1011 1000 1100 0110 1101 0100 1110 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 530 8(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1001 0001 1010 1011 1000 1100 0110 1101 0100 1110 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1001 0001 1010 1011 1000 1100 0110 1101 0100 1110 100(2) × 2^0 =


1.0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100(2) × 2^(-67)
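As a quick cross-check of the shift count, Python's standard `math.frexp` can be used. It returns a significand in the range [0.5, 1), so the sketch below rescales it to the 1.xxx form used in this step; the variable names are only illustrative.

```python
import math

x = 8.5308e-21
m, e = math.frexp(x)         # x == m * 2**e, with 0.5 <= m < 1
significand = m * 2          # rescale so that 1 <= significand < 2
exponent = e - 1             # compensate for the rescaling
print(significand, exponent)
```

The printed exponent is the unadjusted exponent of the normalized form, that is, minus the number of positions the point was shifted to the right.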


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
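Steps 8 to 10 can be sketched in a few lines of Python. The helper name `to_binary` is hypothetical; it follows the same repeated division by 2 and reads the remainders from the bottom of the list upwards.

```python
def to_binary(n, width):
    """Binary digits of a non-negative integer via repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)              # quotient and remainder of n / 2
        remainders.append(str(r))
    # Remainders come out least-significant first, so read them bottom-up.
    digits = "".join(reversed(remainders)) or "0"
    return digits.zfill(width)           # left-pad with zeros to the field width

EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1      # 1023
biased = -67 + bias                      # 956
print(biased, to_binary(biased, EXPONENT_BITS))   # 956 01110111100
```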


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the binary point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100 =


0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100


Decimal number 0.000 000 000 000 000 000 008 530 8 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0100 1000 1101 0101 1100 0110 0011 0110 1010 0111 0100
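The result can be checked against the bits your own machine stores for this number. The sketch below uses Python's standard `struct` module; the function name `double_fields` is hypothetical. It also decodes the three fields back into an exact fraction, which shows how close the stored value is to the original decimal.

```python
import struct
from fractions import Fraction

def double_fields(x):
    """Split a Python float into its sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # 8 bytes, big-endian
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_fields(8.5308e-21)
print(sign, exponent, mantissa)

# Decode the fields back into an exact rational value (normal numbers only):
# value = (1 + mantissa / 2^52) * 2^(exponent - 1023)
decoded = (1 + Fraction(int(mantissa, 2), 2 ** 52)) * Fraction(2) ** (int(exponent, 2) - 1023)
print(float(decoded - Fraction("8.5308e-21")))           # approximation error
```

The three printed fields should match the sign, exponent and mantissa obtained by hand, and the printed difference is the small error introduced by cutting the binary expansion to 52 bits.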


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the binary point, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by appending '0' bits on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
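The nine steps above can be combined into one short Python sketch. It is an illustration under stated assumptions, not a reference implementation: the function name `to_ieee754_double` is hypothetical, it handles only non-zero numbers in the normal range, it normalizes the exact value first instead of converting the integer and fractional parts separately (the resulting bits are the same), and it truncates the mantissa as described in step 8 rather than rounding.

```python
from fractions import Fraction

def to_ieee754_double(decimal_text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string.

    Assumes a non-zero value in the normal double range; excess mantissa
    bits are truncated, as in the steps above.
    """
    value = Fraction(decimal_text)           # exact value of the decimal text
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Normalize: find the exponent such that 1 <= value / 2**exponent < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1
    if not -1022 <= exponent <= 1023:
        raise ValueError("outside the normal range of a double")

    # Drop the leading 1 and collect 52 bits by repeated doubling.
    frac = value - 1
    mantissa = []
    for _ in range(52):
        frac *= 2
        bit = int(frac)
        frac -= bit
        mantissa.append(str(bit))

    biased = exponent + 2 ** (11 - 1) - 1    # add the bias, 1023
    return sign, format(biased, "011b"), "".join(mantissa)

print(" - ".join(to_ieee754_double("8.5308e-21")))
print(" - ".join(to_ieee754_double("-31.640215")))
```

Working with exact fractions avoids the rounding that an ordinary float would introduce before the conversion even starts.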

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

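  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the binary point), and adjust its length to 52 bits by removing the excess bits from the right. This truncation loses precision; IEEE 754's default round-to-nearest mode would instead round the last kept bit up here, because the dropped bits (1010 0) amount to more than half a unit in the last place: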

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
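
The result above is obtained by cutting the mantissa off after 52 bits. Hardware and most programming languages use IEEE 754's default round-to-nearest mode instead, so the stored bits can differ in the last place. A minimal Python sketch of the comparison, using the standard `struct` module (the variable names are only illustrative):

```python
import struct

# Bit pattern obtained above by truncating the mantissa to 52 bits.
truncated = (
    "1"
    + "100 0000 0011"
    + "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
).replace(" ", "")

# Bit pattern actually stored for the same decimal literal.
(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = format(raw, "064b")

print(truncated)
print(stored)
print(int(stored, 2) - int(truncated, 2))   # difference in units of the last mantissa bit
```

Sign and exponent agree; only the final mantissa bit differs, because the bits dropped by truncation are worth more than half of the last kept position.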