0.000 000 000 000 000 000 008 536 98 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 536 98(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 536 98(10) to the 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
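The repeated division in steps 1 and 2 can be sketched in Python. This is a minimal illustration, not the page's own tool; `int_to_binary` is a hypothetical helper name. Like the list above, it performs the division at least once, so 0 yields the single digit 0.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)        # division = quotient + remainder
        remainders.append(str(r))
        if n == 0:                 # stop at the first zero quotient
            break
    # The last remainder becomes the leftmost digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
```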


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 536 98.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero, or once we have collected enough significant bits to fill the mantissa.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 536 98 × 2 = 0 + 0.000 000 000 000 000 000 017 073 96;
  • 2) 0.000 000 000 000 000 000 017 073 96 × 2 = 0 + 0.000 000 000 000 000 000 034 147 92;
  • 3) 0.000 000 000 000 000 000 034 147 92 × 2 = 0 + 0.000 000 000 000 000 000 068 295 84;
  • 4) 0.000 000 000 000 000 000 068 295 84 × 2 = 0 + 0.000 000 000 000 000 000 136 591 68;
  • 5) 0.000 000 000 000 000 000 136 591 68 × 2 = 0 + 0.000 000 000 000 000 000 273 183 36;
  • 6) 0.000 000 000 000 000 000 273 183 36 × 2 = 0 + 0.000 000 000 000 000 000 546 366 72;
  • 7) 0.000 000 000 000 000 000 546 366 72 × 2 = 0 + 0.000 000 000 000 000 001 092 733 44;
  • 8) 0.000 000 000 000 000 001 092 733 44 × 2 = 0 + 0.000 000 000 000 000 002 185 466 88;
  • 9) 0.000 000 000 000 000 002 185 466 88 × 2 = 0 + 0.000 000 000 000 000 004 370 933 76;
  • 10) 0.000 000 000 000 000 004 370 933 76 × 2 = 0 + 0.000 000 000 000 000 008 741 867 52;
  • 11) 0.000 000 000 000 000 008 741 867 52 × 2 = 0 + 0.000 000 000 000 000 017 483 735 04;
  • 12) 0.000 000 000 000 000 017 483 735 04 × 2 = 0 + 0.000 000 000 000 000 034 967 470 08;
  • 13) 0.000 000 000 000 000 034 967 470 08 × 2 = 0 + 0.000 000 000 000 000 069 934 940 16;
  • 14) 0.000 000 000 000 000 069 934 940 16 × 2 = 0 + 0.000 000 000 000 000 139 869 880 32;
  • 15) 0.000 000 000 000 000 139 869 880 32 × 2 = 0 + 0.000 000 000 000 000 279 739 760 64;
  • 16) 0.000 000 000 000 000 279 739 760 64 × 2 = 0 + 0.000 000 000 000 000 559 479 521 28;
  • 17) 0.000 000 000 000 000 559 479 521 28 × 2 = 0 + 0.000 000 000 000 001 118 959 042 56;
  • 18) 0.000 000 000 000 001 118 959 042 56 × 2 = 0 + 0.000 000 000 000 002 237 918 085 12;
  • 19) 0.000 000 000 000 002 237 918 085 12 × 2 = 0 + 0.000 000 000 000 004 475 836 170 24;
  • 20) 0.000 000 000 000 004 475 836 170 24 × 2 = 0 + 0.000 000 000 000 008 951 672 340 48;
  • 21) 0.000 000 000 000 008 951 672 340 48 × 2 = 0 + 0.000 000 000 000 017 903 344 680 96;
  • 22) 0.000 000 000 000 017 903 344 680 96 × 2 = 0 + 0.000 000 000 000 035 806 689 361 92;
  • 23) 0.000 000 000 000 035 806 689 361 92 × 2 = 0 + 0.000 000 000 000 071 613 378 723 84;
  • 24) 0.000 000 000 000 071 613 378 723 84 × 2 = 0 + 0.000 000 000 000 143 226 757 447 68;
  • 25) 0.000 000 000 000 143 226 757 447 68 × 2 = 0 + 0.000 000 000 000 286 453 514 895 36;
  • 26) 0.000 000 000 000 286 453 514 895 36 × 2 = 0 + 0.000 000 000 000 572 907 029 790 72;
  • 27) 0.000 000 000 000 572 907 029 790 72 × 2 = 0 + 0.000 000 000 001 145 814 059 581 44;
  • 28) 0.000 000 000 001 145 814 059 581 44 × 2 = 0 + 0.000 000 000 002 291 628 119 162 88;
  • 29) 0.000 000 000 002 291 628 119 162 88 × 2 = 0 + 0.000 000 000 004 583 256 238 325 76;
  • 30) 0.000 000 000 004 583 256 238 325 76 × 2 = 0 + 0.000 000 000 009 166 512 476 651 52;
  • 31) 0.000 000 000 009 166 512 476 651 52 × 2 = 0 + 0.000 000 000 018 333 024 953 303 04;
  • 32) 0.000 000 000 018 333 024 953 303 04 × 2 = 0 + 0.000 000 000 036 666 049 906 606 08;
  • 33) 0.000 000 000 036 666 049 906 606 08 × 2 = 0 + 0.000 000 000 073 332 099 813 212 16;
  • 34) 0.000 000 000 073 332 099 813 212 16 × 2 = 0 + 0.000 000 000 146 664 199 626 424 32;
  • 35) 0.000 000 000 146 664 199 626 424 32 × 2 = 0 + 0.000 000 000 293 328 399 252 848 64;
  • 36) 0.000 000 000 293 328 399 252 848 64 × 2 = 0 + 0.000 000 000 586 656 798 505 697 28;
  • 37) 0.000 000 000 586 656 798 505 697 28 × 2 = 0 + 0.000 000 001 173 313 597 011 394 56;
  • 38) 0.000 000 001 173 313 597 011 394 56 × 2 = 0 + 0.000 000 002 346 627 194 022 789 12;
  • 39) 0.000 000 002 346 627 194 022 789 12 × 2 = 0 + 0.000 000 004 693 254 388 045 578 24;
  • 40) 0.000 000 004 693 254 388 045 578 24 × 2 = 0 + 0.000 000 009 386 508 776 091 156 48;
  • 41) 0.000 000 009 386 508 776 091 156 48 × 2 = 0 + 0.000 000 018 773 017 552 182 312 96;
  • 42) 0.000 000 018 773 017 552 182 312 96 × 2 = 0 + 0.000 000 037 546 035 104 364 625 92;
  • 43) 0.000 000 037 546 035 104 364 625 92 × 2 = 0 + 0.000 000 075 092 070 208 729 251 84;
  • 44) 0.000 000 075 092 070 208 729 251 84 × 2 = 0 + 0.000 000 150 184 140 417 458 503 68;
  • 45) 0.000 000 150 184 140 417 458 503 68 × 2 = 0 + 0.000 000 300 368 280 834 917 007 36;
  • 46) 0.000 000 300 368 280 834 917 007 36 × 2 = 0 + 0.000 000 600 736 561 669 834 014 72;
  • 47) 0.000 000 600 736 561 669 834 014 72 × 2 = 0 + 0.000 001 201 473 123 339 668 029 44;
  • 48) 0.000 001 201 473 123 339 668 029 44 × 2 = 0 + 0.000 002 402 946 246 679 336 058 88;
  • 49) 0.000 002 402 946 246 679 336 058 88 × 2 = 0 + 0.000 004 805 892 493 358 672 117 76;
  • 50) 0.000 004 805 892 493 358 672 117 76 × 2 = 0 + 0.000 009 611 784 986 717 344 235 52;
  • 51) 0.000 009 611 784 986 717 344 235 52 × 2 = 0 + 0.000 019 223 569 973 434 688 471 04;
  • 52) 0.000 019 223 569 973 434 688 471 04 × 2 = 0 + 0.000 038 447 139 946 869 376 942 08;
  • 53) 0.000 038 447 139 946 869 376 942 08 × 2 = 0 + 0.000 076 894 279 893 738 753 884 16;
  • 54) 0.000 076 894 279 893 738 753 884 16 × 2 = 0 + 0.000 153 788 559 787 477 507 768 32;
  • 55) 0.000 153 788 559 787 477 507 768 32 × 2 = 0 + 0.000 307 577 119 574 955 015 536 64;
  • 56) 0.000 307 577 119 574 955 015 536 64 × 2 = 0 + 0.000 615 154 239 149 910 031 073 28;
  • 57) 0.000 615 154 239 149 910 031 073 28 × 2 = 0 + 0.001 230 308 478 299 820 062 146 56;
  • 58) 0.001 230 308 478 299 820 062 146 56 × 2 = 0 + 0.002 460 616 956 599 640 124 293 12;
  • 59) 0.002 460 616 956 599 640 124 293 12 × 2 = 0 + 0.004 921 233 913 199 280 248 586 24;
  • 60) 0.004 921 233 913 199 280 248 586 24 × 2 = 0 + 0.009 842 467 826 398 560 497 172 48;
  • 61) 0.009 842 467 826 398 560 497 172 48 × 2 = 0 + 0.019 684 935 652 797 120 994 344 96;
  • 62) 0.019 684 935 652 797 120 994 344 96 × 2 = 0 + 0.039 369 871 305 594 241 988 689 92;
  • 63) 0.039 369 871 305 594 241 988 689 92 × 2 = 0 + 0.078 739 742 611 188 483 977 379 84;
  • 64) 0.078 739 742 611 188 483 977 379 84 × 2 = 0 + 0.157 479 485 222 376 967 954 759 68;
  • 65) 0.157 479 485 222 376 967 954 759 68 × 2 = 0 + 0.314 958 970 444 753 935 909 519 36;
  • 66) 0.314 958 970 444 753 935 909 519 36 × 2 = 0 + 0.629 917 940 889 507 871 819 038 72;
  • 67) 0.629 917 940 889 507 871 819 038 72 × 2 = 1 + 0.259 835 881 779 015 743 638 077 44;
  • 68) 0.259 835 881 779 015 743 638 077 44 × 2 = 0 + 0.519 671 763 558 031 487 276 154 88;
  • 69) 0.519 671 763 558 031 487 276 154 88 × 2 = 1 + 0.039 343 527 116 062 974 552 309 76;
  • 70) 0.039 343 527 116 062 974 552 309 76 × 2 = 0 + 0.078 687 054 232 125 949 104 619 52;
  • 71) 0.078 687 054 232 125 949 104 619 52 × 2 = 0 + 0.157 374 108 464 251 898 209 239 04;
  • 72) 0.157 374 108 464 251 898 209 239 04 × 2 = 0 + 0.314 748 216 928 503 796 418 478 08;
  • 73) 0.314 748 216 928 503 796 418 478 08 × 2 = 0 + 0.629 496 433 857 007 592 836 956 16;
  • 74) 0.629 496 433 857 007 592 836 956 16 × 2 = 1 + 0.258 992 867 714 015 185 673 912 32;
  • 75) 0.258 992 867 714 015 185 673 912 32 × 2 = 0 + 0.517 985 735 428 030 371 347 824 64;
  • 76) 0.517 985 735 428 030 371 347 824 64 × 2 = 1 + 0.035 971 470 856 060 742 695 649 28;
  • 77) 0.035 971 470 856 060 742 695 649 28 × 2 = 0 + 0.071 942 941 712 121 485 391 298 56;
  • 78) 0.071 942 941 712 121 485 391 298 56 × 2 = 0 + 0.143 885 883 424 242 970 782 597 12;
  • 79) 0.143 885 883 424 242 970 782 597 12 × 2 = 0 + 0.287 771 766 848 485 941 565 194 24;
  • 80) 0.287 771 766 848 485 941 565 194 24 × 2 = 0 + 0.575 543 533 696 971 883 130 388 48;
  • 81) 0.575 543 533 696 971 883 130 388 48 × 2 = 1 + 0.151 087 067 393 943 766 260 776 96;
  • 82) 0.151 087 067 393 943 766 260 776 96 × 2 = 0 + 0.302 174 134 787 887 532 521 553 92;
  • 83) 0.302 174 134 787 887 532 521 553 92 × 2 = 0 + 0.604 348 269 575 775 065 043 107 84;
  • 84) 0.604 348 269 575 775 065 043 107 84 × 2 = 1 + 0.208 696 539 151 550 130 086 215 68;
  • 85) 0.208 696 539 151 550 130 086 215 68 × 2 = 0 + 0.417 393 078 303 100 260 172 431 36;
  • 86) 0.417 393 078 303 100 260 172 431 36 × 2 = 0 + 0.834 786 156 606 200 520 344 862 72;
  • 87) 0.834 786 156 606 200 520 344 862 72 × 2 = 1 + 0.669 572 313 212 401 040 689 725 44;
  • 88) 0.669 572 313 212 401 040 689 725 44 × 2 = 1 + 0.339 144 626 424 802 081 379 450 88;
  • 89) 0.339 144 626 424 802 081 379 450 88 × 2 = 0 + 0.678 289 252 849 604 162 758 901 76;
  • 90) 0.678 289 252 849 604 162 758 901 76 × 2 = 1 + 0.356 578 505 699 208 325 517 803 52;
  • 91) 0.356 578 505 699 208 325 517 803 52 × 2 = 0 + 0.713 157 011 398 416 651 035 607 04;
  • 92) 0.713 157 011 398 416 651 035 607 04 × 2 = 1 + 0.426 314 022 796 833 302 071 214 08;
  • 93) 0.426 314 022 796 833 302 071 214 08 × 2 = 0 + 0.852 628 045 593 666 604 142 428 16;
  • 94) 0.852 628 045 593 666 604 142 428 16 × 2 = 1 + 0.705 256 091 187 333 208 284 856 32;
  • 95) 0.705 256 091 187 333 208 284 856 32 × 2 = 1 + 0.410 512 182 374 666 416 569 712 64;
  • 96) 0.410 512 182 374 666 416 569 712 64 × 2 = 0 + 0.821 024 364 749 332 833 139 425 28;
  • 97) 0.821 024 364 749 332 833 139 425 28 × 2 = 1 + 0.642 048 729 498 665 666 278 850 56;
  • 98) 0.642 048 729 498 665 666 278 850 56 × 2 = 1 + 0.284 097 458 997 331 332 557 701 12;
  • 99) 0.284 097 458 997 331 332 557 701 12 × 2 = 0 + 0.568 194 917 994 662 665 115 402 24;
  • 100) 0.568 194 917 994 662 665 115 402 24 × 2 = 1 + 0.136 389 835 989 325 330 230 804 48;
  • 101) 0.136 389 835 989 325 330 230 804 48 × 2 = 0 + 0.272 779 671 978 650 660 461 608 96;
  • 102) 0.272 779 671 978 650 660 461 608 96 × 2 = 0 + 0.545 559 343 957 301 320 923 217 92;
  • 103) 0.545 559 343 957 301 320 923 217 92 × 2 = 1 + 0.091 118 687 914 602 641 846 435 84;
  • 104) 0.091 118 687 914 602 641 846 435 84 × 2 = 0 + 0.182 237 375 829 205 283 692 871 68;
  • 105) 0.182 237 375 829 205 283 692 871 68 × 2 = 0 + 0.364 474 751 658 410 567 385 743 36;
  • 106) 0.364 474 751 658 410 567 385 743 36 × 2 = 0 + 0.728 949 503 316 821 134 771 486 72;
  • 107) 0.728 949 503 316 821 134 771 486 72 × 2 = 1 + 0.457 899 006 633 642 269 542 973 44;
  • 108) 0.457 899 006 633 642 269 542 973 44 × 2 = 0 + 0.915 798 013 267 284 539 085 946 88;
  • 109) 0.915 798 013 267 284 539 085 946 88 × 2 = 1 + 0.831 596 026 534 569 078 171 893 76;
  • 110) 0.831 596 026 534 569 078 171 893 76 × 2 = 1 + 0.663 192 053 069 138 156 343 787 52;
  • 111) 0.663 192 053 069 138 156 343 787 52 × 2 = 1 + 0.326 384 106 138 276 312 687 575 04;
  • 112) 0.326 384 106 138 276 312 687 575 04 × 2 = 0 + 0.652 768 212 276 552 625 375 150 08;
  • 113) 0.652 768 212 276 552 625 375 150 08 × 2 = 1 + 0.305 536 424 553 105 250 750 300 16;
  • 114) 0.305 536 424 553 105 250 750 300 16 × 2 = 0 + 0.611 072 849 106 210 501 500 600 32;
  • 115) 0.611 072 849 106 210 501 500 600 32 × 2 = 1 + 0.222 145 698 212 421 003 001 200 64;
  • 116) 0.222 145 698 212 421 003 001 200 64 × 2 = 0 + 0.444 291 396 424 842 006 002 401 28;
  • 117) 0.444 291 396 424 842 006 002 401 28 × 2 = 0 + 0.888 582 792 849 684 012 004 802 56;
  • 118) 0.888 582 792 849 684 012 004 802 56 × 2 = 1 + 0.777 165 585 699 368 024 009 605 12;
  • 119) 0.777 165 585 699 368 024 009 605 12 × 2 = 1 + 0.554 331 171 398 736 048 019 210 24;

We didn't get any fractional part equal to zero. However, counting from the first integer part equal to 1 (step 67), we already have 53 bits (steps 67 to 119): the leading 1 plus the 52 bits that fit in the mantissa => FULL STOP (losing precision: the bits after step 119 are dropped, so the converted number we get in the end will be only a very close approximation of the initial one).
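The repeated doubling can be sketched in Python as well. This is an illustration under assumptions, not the page's own method: it uses the standard `fractions.Fraction` type so that every product stays exact (a plain float cannot even store the input exactly), and `fraction_bits` is a hypothetical helper. The stop rule mirrors the one above: a zero fractional part, or 53 bits counted from the first 1.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, significant: int = 53, limit: int = 1100):
    """Multiply a fractional part (0 <= frac < 1) repeatedly by 2.

    Returns the list of integer parts and the fractional part left over.
    Stops when the fractional part becomes zero, or once `significant` bits
    have been collected counting from the first 1 (1 leading bit + 52).
    `limit` is only a safety cap on the number of steps.
    """
    bits = []
    first_one = None               # 1-based step number of the first 1
    while frac != 0 and len(bits) < limit:
        frac *= 2
        bit = int(frac)            # integer part of the product: 0 or 1
        frac -= bit                # keep only the fractional part
        bits.append(bit)
        if first_one is None and bit == 1:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one + 1 == significant:
            break
    return bits, frac


bits, remainder = fraction_bits(Fraction("0.00000000000000000000853698"))
print(len(bits), bits.index(1) + 1)   # 119 67
print(float(remainder))               # about 0.554, what step 119 leaves over
```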


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 536 98(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 0011 0101 0110 1101 0010 0010 1110 1010 011(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 536 98(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 0011 0101 0110 1101 0010 0010 1110 1010 011(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to its left (the first 1 is the 67th digit after the decimal mark):


0.000 000 000 000 000 000 008 536 98(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 0011 0101 0110 1101 0010 0010 1110 1010 011(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1001 0011 0101 0110 1101 0010 0010 1110 1010 011(2) × 2^0 =


1.0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011(2) × 2^(-67)
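Finding the shift amount is the same as finding the exponent e for which 1 ≤ x / 2^e < 2. A minimal sketch, again assuming `fractions.Fraction` for exact arithmetic; `normalize` is a hypothetical helper name:

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) such that x == m * 2**e and 1 <= m < 2 (x must be positive)."""
    if x <= 0:
        raise ValueError("expected a positive number")
    m, e = x, 0
    while m >= 2:      # shift the decimal mark to the left
        m /= 2
        e += 1
    while m < 1:       # shift the decimal mark to the right
        m *= 2
        e -= 1
    return m, e


m, e = normalize(Fraction("0.00000000000000000000853698"))
print(e)   # -67
```

A negative e means the mark moved to the right, as it does here; numbers of 2 or more give a positive e.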


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011
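To see how close these three elements come to the original number, the value they describe, (-1)^sign × mantissa × 2^exponent, can be rebuilt exactly. A sketch assuming `fractions.Fraction`; `value_of` is a hypothetical helper:

```python
from fractions import Fraction


def value_of(sign: int, exponent: int, significand: str) -> Fraction:
    """Exact value of (-1)**sign * significand * 2**exponent.

    `significand` is a binary string such as "1.0100 0010 ..."; spaces are ignored.
    """
    whole, _, frac_digits = significand.replace(" ", "").partition(".")
    m = Fraction(int(whole + frac_digits, 2), 2 ** len(frac_digits))
    return (-1) ** sign * m * Fraction(2) ** exponent


x = Fraction("0.00000000000000000000853698")
approx = value_of(0, -67, "1.0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011")
print(float((x - approx) / x))   # relative error, roughly 1e-16
```

The difference is positive because the extra bits are simply cut off, and it is smaller than 2^(-119), the weight of the last bit kept (step 119).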


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
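The bias step as code, a sketch for normal numbers only. `adjust_exponent` is a hypothetical name, and the range check for the reserved stored exponents 0 and 2047 (zero/subnormals and infinities/NaN) is an addition not spelled out in these steps:

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for the 64 bit format


def adjust_exponent(unadjusted: int) -> int:
    """Add the excess/bias to the unadjusted exponent (normal numbers only)."""
    adjusted = unadjusted + BIAS
    # Stored exponents 0 and 2047 are reserved, so a normal number
    # must land in 1..2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent outside the normal double range")
    return adjusted


print(adjust_exponent(-67))   # 956
print(adjust_exponent(4))     # 1027
```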


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
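Steps 9 and 10 can be sketched as a fixed-width variant of the repeated division: dividing exactly 11 times also produces the leading 0 that the list above leaves implicit. `to_11_bit_binary` is a hypothetical helper:

```python
def to_11_bit_binary(n: int, width: int = 11) -> str:
    """Divide by 2 exactly `width` times, so short values get leading zeros."""
    remainders = []
    for _ in range(width):
        n, r = divmod(n, 2)        # division = quotient + remainder
        remainders.append(str(r))
    if n != 0:                     # something is left over: too wide
        raise ValueError(f"value does not fit in {width} bits")
    return "".join(reversed(remainders))


print(to_11_bit_binary(956))   # 01110111100
```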


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since in a normalized number it is always 1, and remove the decimal mark as well, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here: exactly 52 bits follow the leading 1).


Mantissa (normalized) =


1.0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011 =


0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011
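A sketch of step 11 working directly on the bit strings. `normalize_mantissa` is a hypothetical helper; like these steps, it truncates excess bits rather than rounding them:

```python
def normalize_mantissa(significand: str, length: int = 52) -> str:
    """Drop the leading "1." and truncate or zero-pad the rest to `length` bits."""
    digits = significand.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized significand such as 1.0100...")
    fraction = digits[2:]
    # Cut excess bits on the right (no rounding), or pad with '0's.
    return fraction[:length].ljust(length, "0")


print(normalize_mantissa("1.0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011"))
# 0100001010000100100110101011011010010001011101010011
```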


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011


Decimal number 0.000 000 000 000 000 000 008 536 98 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0100 1001 1010 1011 0110 1001 0001 0111 0101 0011
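The three fields can be packed into one 64 bit pattern and read back as a double with Python's standard `struct` module. A sketch, assuming Python's `float` is an IEEE 754 binary64 double (true on common platforms); `fields_to_float` is a hypothetical helper:

```python
import struct

# The three fields obtained above.
SIGN = "0"
EXPONENT = "01110111100"
MANTISSA = "0100001010000100100110101011011010010001011101010011"


def fields_to_float(sign: str, exponent: str, mantissa: str) -> float:
    """Concatenate the 1 + 11 + 52 bits and reinterpret them as a double."""
    bits = sign + exponent + mantissa
    if len(bits) != 64:
        raise ValueError("expected 1 + 11 + 52 = 64 bits")
    return struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]


truncated = fields_to_float(SIGN, EXPONENT, MANTISSA)
native_raw = struct.unpack(">Q", struct.pack(">d", 8.53698e-21))[0]
print(hex(int(SIGN + EXPONENT + MANTISSA, 2)))   # 0x3bc42849ab691753
print(hex(native_raw))                           # 0x3bc42849ab691754
print(truncated)
```

The two patterns differ in the last bit. Python converts the literal `8.53698e-21` with IEEE 754's default rounding (round to nearest) instead of truncating, and since the fractional part left after step 119 is 0.554... (more than one half), the last mantissa bit is rounded up from ...0011 to ...0100.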


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have enough significant bits to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (truncation, losing precision; IEEE 754's default rounding mode, round to nearest, may instead round the last kept bit up) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
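The whole procedure can be sketched as one Python function. Assumptions: `decimal_to_double_bits` is a hypothetical name, `fractions.Fraction` is used for exact arithmetic, only non-zero normal numbers are handled, and the number is normalized first, before the mantissa bits are generated. That gives the same bits as converting the integer and fractional parts separately and then shifting. Like the steps, it truncates instead of rounding.

```python
from fractions import Fraction

BIAS = 2 ** (11 - 1) - 1   # 1023


def decimal_to_double_bits(text: str) -> str:
    """Return sign + exponent + mantissa bits (1 + 11 + 52) for a decimal string.

    Follows the steps above with truncation; zero, subnormals, infinities
    and NaN are outside the scope of this sketch.
    """
    x = Fraction(text)                     # exact value of the decimal text
    if x == 0:
        raise ValueError("zero is not covered by these steps")
    sign = "1" if x < 0 else "0"           # step 9
    x = abs(x)                             # step 1

    # Step 6: find the exponent so that 1 <= m < 2 and x == m * 2**exponent.
    m, exponent = x, 0
    while m >= 2:
        m /= 2
        exponent += 1
    while m < 1:
        m *= 2
        exponent -= 1

    # Step 7: biased exponent, written on 11 bits.
    biased = exponent + BIAS
    if not 1 <= biased <= 2046:
        raise ValueError("outside the range of normal doubles")
    exponent_bits = format(biased, "011b")

    # Steps 4, 5 and 8: drop the leading 1 and generate 52 bits by doubling;
    # bits beyond the 52nd are never produced (truncation).
    frac = m - 1
    mantissa_bits = []
    for _ in range(52):
        frac *= 2
        bit = int(frac)
        mantissa_bits.append(str(bit))
        frac -= bit

    return sign + exponent_bits + "".join(mantissa_bits)


bits = decimal_to_double_bits("-31.640215")
print(bits[0], bits[1:12], bits[12:])
# 1 10000000011 1111101000111110010100100001010101110110100010011100
```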

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
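As a cross-check, the bits Python actually stores for -31.640215 can be read with the standard `struct` module. This assumes an IEEE 754 binary64 `float`, as on common platforms; `double_bit_fields` is a hypothetical helper:

```python
import struct


def double_bit_fields(value: float) -> tuple[str, str, str]:
    """Split the 64 bit pattern of a Python float into sign, exponent, mantissa."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_bit_fields(-31.640215)
print(sign, exponent, mantissa)
# 1 10000000011 1111101000111110010100100001010101110110100010011101
```

The last mantissa bit comes out as 1 (...1001 1101) instead of 0. The first fractional bit dropped (step 49) is 1, and a later one (step 51) is also 1, so the discarded part is more than half of the last kept bit. IEEE 754's default round to nearest therefore rounds up where the steps above truncate.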