0.000 000 000 000 000 000 008 536 55 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 536 55(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 536 55(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
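
The repeated division used in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the helper name `int_to_binary` is an assumption and does not come from the procedure itself.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"          # 0 / 2 = 0 remainder 0, as in step 1
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # quotient and remainder of the division by 2
        remainders.append(str(r))
    # Read the remainders from the bottom of the list (last one first).
    return "".join(reversed(remainders))

print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
```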


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 536 55.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 536 55 × 2 = 0 + 0.000 000 000 000 000 000 017 073 1;
  • 2) 0.000 000 000 000 000 000 017 073 1 × 2 = 0 + 0.000 000 000 000 000 000 034 146 2;
  • 3) 0.000 000 000 000 000 000 034 146 2 × 2 = 0 + 0.000 000 000 000 000 000 068 292 4;
  • 4) 0.000 000 000 000 000 000 068 292 4 × 2 = 0 + 0.000 000 000 000 000 000 136 584 8;
  • 5) 0.000 000 000 000 000 000 136 584 8 × 2 = 0 + 0.000 000 000 000 000 000 273 169 6;
  • 6) 0.000 000 000 000 000 000 273 169 6 × 2 = 0 + 0.000 000 000 000 000 000 546 339 2;
  • 7) 0.000 000 000 000 000 000 546 339 2 × 2 = 0 + 0.000 000 000 000 000 001 092 678 4;
  • 8) 0.000 000 000 000 000 001 092 678 4 × 2 = 0 + 0.000 000 000 000 000 002 185 356 8;
  • 9) 0.000 000 000 000 000 002 185 356 8 × 2 = 0 + 0.000 000 000 000 000 004 370 713 6;
  • 10) 0.000 000 000 000 000 004 370 713 6 × 2 = 0 + 0.000 000 000 000 000 008 741 427 2;
  • 11) 0.000 000 000 000 000 008 741 427 2 × 2 = 0 + 0.000 000 000 000 000 017 482 854 4;
  • 12) 0.000 000 000 000 000 017 482 854 4 × 2 = 0 + 0.000 000 000 000 000 034 965 708 8;
  • 13) 0.000 000 000 000 000 034 965 708 8 × 2 = 0 + 0.000 000 000 000 000 069 931 417 6;
  • 14) 0.000 000 000 000 000 069 931 417 6 × 2 = 0 + 0.000 000 000 000 000 139 862 835 2;
  • 15) 0.000 000 000 000 000 139 862 835 2 × 2 = 0 + 0.000 000 000 000 000 279 725 670 4;
  • 16) 0.000 000 000 000 000 279 725 670 4 × 2 = 0 + 0.000 000 000 000 000 559 451 340 8;
  • 17) 0.000 000 000 000 000 559 451 340 8 × 2 = 0 + 0.000 000 000 000 001 118 902 681 6;
  • 18) 0.000 000 000 000 001 118 902 681 6 × 2 = 0 + 0.000 000 000 000 002 237 805 363 2;
  • 19) 0.000 000 000 000 002 237 805 363 2 × 2 = 0 + 0.000 000 000 000 004 475 610 726 4;
  • 20) 0.000 000 000 000 004 475 610 726 4 × 2 = 0 + 0.000 000 000 000 008 951 221 452 8;
  • 21) 0.000 000 000 000 008 951 221 452 8 × 2 = 0 + 0.000 000 000 000 017 902 442 905 6;
  • 22) 0.000 000 000 000 017 902 442 905 6 × 2 = 0 + 0.000 000 000 000 035 804 885 811 2;
  • 23) 0.000 000 000 000 035 804 885 811 2 × 2 = 0 + 0.000 000 000 000 071 609 771 622 4;
  • 24) 0.000 000 000 000 071 609 771 622 4 × 2 = 0 + 0.000 000 000 000 143 219 543 244 8;
  • 25) 0.000 000 000 000 143 219 543 244 8 × 2 = 0 + 0.000 000 000 000 286 439 086 489 6;
  • 26) 0.000 000 000 000 286 439 086 489 6 × 2 = 0 + 0.000 000 000 000 572 878 172 979 2;
  • 27) 0.000 000 000 000 572 878 172 979 2 × 2 = 0 + 0.000 000 000 001 145 756 345 958 4;
  • 28) 0.000 000 000 001 145 756 345 958 4 × 2 = 0 + 0.000 000 000 002 291 512 691 916 8;
  • 29) 0.000 000 000 002 291 512 691 916 8 × 2 = 0 + 0.000 000 000 004 583 025 383 833 6;
  • 30) 0.000 000 000 004 583 025 383 833 6 × 2 = 0 + 0.000 000 000 009 166 050 767 667 2;
  • 31) 0.000 000 000 009 166 050 767 667 2 × 2 = 0 + 0.000 000 000 018 332 101 535 334 4;
  • 32) 0.000 000 000 018 332 101 535 334 4 × 2 = 0 + 0.000 000 000 036 664 203 070 668 8;
  • 33) 0.000 000 000 036 664 203 070 668 8 × 2 = 0 + 0.000 000 000 073 328 406 141 337 6;
  • 34) 0.000 000 000 073 328 406 141 337 6 × 2 = 0 + 0.000 000 000 146 656 812 282 675 2;
  • 35) 0.000 000 000 146 656 812 282 675 2 × 2 = 0 + 0.000 000 000 293 313 624 565 350 4;
  • 36) 0.000 000 000 293 313 624 565 350 4 × 2 = 0 + 0.000 000 000 586 627 249 130 700 8;
  • 37) 0.000 000 000 586 627 249 130 700 8 × 2 = 0 + 0.000 000 001 173 254 498 261 401 6;
  • 38) 0.000 000 001 173 254 498 261 401 6 × 2 = 0 + 0.000 000 002 346 508 996 522 803 2;
  • 39) 0.000 000 002 346 508 996 522 803 2 × 2 = 0 + 0.000 000 004 693 017 993 045 606 4;
  • 40) 0.000 000 004 693 017 993 045 606 4 × 2 = 0 + 0.000 000 009 386 035 986 091 212 8;
  • 41) 0.000 000 009 386 035 986 091 212 8 × 2 = 0 + 0.000 000 018 772 071 972 182 425 6;
  • 42) 0.000 000 018 772 071 972 182 425 6 × 2 = 0 + 0.000 000 037 544 143 944 364 851 2;
  • 43) 0.000 000 037 544 143 944 364 851 2 × 2 = 0 + 0.000 000 075 088 287 888 729 702 4;
  • 44) 0.000 000 075 088 287 888 729 702 4 × 2 = 0 + 0.000 000 150 176 575 777 459 404 8;
  • 45) 0.000 000 150 176 575 777 459 404 8 × 2 = 0 + 0.000 000 300 353 151 554 918 809 6;
  • 46) 0.000 000 300 353 151 554 918 809 6 × 2 = 0 + 0.000 000 600 706 303 109 837 619 2;
  • 47) 0.000 000 600 706 303 109 837 619 2 × 2 = 0 + 0.000 001 201 412 606 219 675 238 4;
  • 48) 0.000 001 201 412 606 219 675 238 4 × 2 = 0 + 0.000 002 402 825 212 439 350 476 8;
  • 49) 0.000 002 402 825 212 439 350 476 8 × 2 = 0 + 0.000 004 805 650 424 878 700 953 6;
  • 50) 0.000 004 805 650 424 878 700 953 6 × 2 = 0 + 0.000 009 611 300 849 757 401 907 2;
  • 51) 0.000 009 611 300 849 757 401 907 2 × 2 = 0 + 0.000 019 222 601 699 514 803 814 4;
  • 52) 0.000 019 222 601 699 514 803 814 4 × 2 = 0 + 0.000 038 445 203 399 029 607 628 8;
  • 53) 0.000 038 445 203 399 029 607 628 8 × 2 = 0 + 0.000 076 890 406 798 059 215 257 6;
  • 54) 0.000 076 890 406 798 059 215 257 6 × 2 = 0 + 0.000 153 780 813 596 118 430 515 2;
  • 55) 0.000 153 780 813 596 118 430 515 2 × 2 = 0 + 0.000 307 561 627 192 236 861 030 4;
  • 56) 0.000 307 561 627 192 236 861 030 4 × 2 = 0 + 0.000 615 123 254 384 473 722 060 8;
  • 57) 0.000 615 123 254 384 473 722 060 8 × 2 = 0 + 0.001 230 246 508 768 947 444 121 6;
  • 58) 0.001 230 246 508 768 947 444 121 6 × 2 = 0 + 0.002 460 493 017 537 894 888 243 2;
  • 59) 0.002 460 493 017 537 894 888 243 2 × 2 = 0 + 0.004 920 986 035 075 789 776 486 4;
  • 60) 0.004 920 986 035 075 789 776 486 4 × 2 = 0 + 0.009 841 972 070 151 579 552 972 8;
  • 61) 0.009 841 972 070 151 579 552 972 8 × 2 = 0 + 0.019 683 944 140 303 159 105 945 6;
  • 62) 0.019 683 944 140 303 159 105 945 6 × 2 = 0 + 0.039 367 888 280 606 318 211 891 2;
  • 63) 0.039 367 888 280 606 318 211 891 2 × 2 = 0 + 0.078 735 776 561 212 636 423 782 4;
  • 64) 0.078 735 776 561 212 636 423 782 4 × 2 = 0 + 0.157 471 553 122 425 272 847 564 8;
  • 65) 0.157 471 553 122 425 272 847 564 8 × 2 = 0 + 0.314 943 106 244 850 545 695 129 6;
  • 66) 0.314 943 106 244 850 545 695 129 6 × 2 = 0 + 0.629 886 212 489 701 091 390 259 2;
  • 67) 0.629 886 212 489 701 091 390 259 2 × 2 = 1 + 0.259 772 424 979 402 182 780 518 4;
  • 68) 0.259 772 424 979 402 182 780 518 4 × 2 = 0 + 0.519 544 849 958 804 365 561 036 8;
  • 69) 0.519 544 849 958 804 365 561 036 8 × 2 = 1 + 0.039 089 699 917 608 731 122 073 6;
  • 70) 0.039 089 699 917 608 731 122 073 6 × 2 = 0 + 0.078 179 399 835 217 462 244 147 2;
  • 71) 0.078 179 399 835 217 462 244 147 2 × 2 = 0 + 0.156 358 799 670 434 924 488 294 4;
  • 72) 0.156 358 799 670 434 924 488 294 4 × 2 = 0 + 0.312 717 599 340 869 848 976 588 8;
  • 73) 0.312 717 599 340 869 848 976 588 8 × 2 = 0 + 0.625 435 198 681 739 697 953 177 6;
  • 74) 0.625 435 198 681 739 697 953 177 6 × 2 = 1 + 0.250 870 397 363 479 395 906 355 2;
  • 75) 0.250 870 397 363 479 395 906 355 2 × 2 = 0 + 0.501 740 794 726 958 791 812 710 4;
  • 76) 0.501 740 794 726 958 791 812 710 4 × 2 = 1 + 0.003 481 589 453 917 583 625 420 8;
  • 77) 0.003 481 589 453 917 583 625 420 8 × 2 = 0 + 0.006 963 178 907 835 167 250 841 6;
  • 78) 0.006 963 178 907 835 167 250 841 6 × 2 = 0 + 0.013 926 357 815 670 334 501 683 2;
  • 79) 0.013 926 357 815 670 334 501 683 2 × 2 = 0 + 0.027 852 715 631 340 669 003 366 4;
  • 80) 0.027 852 715 631 340 669 003 366 4 × 2 = 0 + 0.055 705 431 262 681 338 006 732 8;
  • 81) 0.055 705 431 262 681 338 006 732 8 × 2 = 0 + 0.111 410 862 525 362 676 013 465 6;
  • 82) 0.111 410 862 525 362 676 013 465 6 × 2 = 0 + 0.222 821 725 050 725 352 026 931 2;
  • 83) 0.222 821 725 050 725 352 026 931 2 × 2 = 0 + 0.445 643 450 101 450 704 053 862 4;
  • 84) 0.445 643 450 101 450 704 053 862 4 × 2 = 0 + 0.891 286 900 202 901 408 107 724 8;
  • 85) 0.891 286 900 202 901 408 107 724 8 × 2 = 1 + 0.782 573 800 405 802 816 215 449 6;
  • 86) 0.782 573 800 405 802 816 215 449 6 × 2 = 1 + 0.565 147 600 811 605 632 430 899 2;
  • 87) 0.565 147 600 811 605 632 430 899 2 × 2 = 1 + 0.130 295 201 623 211 264 861 798 4;
  • 88) 0.130 295 201 623 211 264 861 798 4 × 2 = 0 + 0.260 590 403 246 422 529 723 596 8;
  • 89) 0.260 590 403 246 422 529 723 596 8 × 2 = 0 + 0.521 180 806 492 845 059 447 193 6;
  • 90) 0.521 180 806 492 845 059 447 193 6 × 2 = 1 + 0.042 361 612 985 690 118 894 387 2;
  • 91) 0.042 361 612 985 690 118 894 387 2 × 2 = 0 + 0.084 723 225 971 380 237 788 774 4;
  • 92) 0.084 723 225 971 380 237 788 774 4 × 2 = 0 + 0.169 446 451 942 760 475 577 548 8;
  • 93) 0.169 446 451 942 760 475 577 548 8 × 2 = 0 + 0.338 892 903 885 520 951 155 097 6;
  • 94) 0.338 892 903 885 520 951 155 097 6 × 2 = 0 + 0.677 785 807 771 041 902 310 195 2;
  • 95) 0.677 785 807 771 041 902 310 195 2 × 2 = 1 + 0.355 571 615 542 083 804 620 390 4;
  • 96) 0.355 571 615 542 083 804 620 390 4 × 2 = 0 + 0.711 143 231 084 167 609 240 780 8;
  • 97) 0.711 143 231 084 167 609 240 780 8 × 2 = 1 + 0.422 286 462 168 335 218 481 561 6;
  • 98) 0.422 286 462 168 335 218 481 561 6 × 2 = 0 + 0.844 572 924 336 670 436 963 123 2;
  • 99) 0.844 572 924 336 670 436 963 123 2 × 2 = 1 + 0.689 145 848 673 340 873 926 246 4;
  • 100) 0.689 145 848 673 340 873 926 246 4 × 2 = 1 + 0.378 291 697 346 681 747 852 492 8;
  • 101) 0.378 291 697 346 681 747 852 492 8 × 2 = 0 + 0.756 583 394 693 363 495 704 985 6;
  • 102) 0.756 583 394 693 363 495 704 985 6 × 2 = 1 + 0.513 166 789 386 726 991 409 971 2;
  • 103) 0.513 166 789 386 726 991 409 971 2 × 2 = 1 + 0.026 333 578 773 453 982 819 942 4;
  • 104) 0.026 333 578 773 453 982 819 942 4 × 2 = 0 + 0.052 667 157 546 907 965 639 884 8;
  • 105) 0.052 667 157 546 907 965 639 884 8 × 2 = 0 + 0.105 334 315 093 815 931 279 769 6;
  • 106) 0.105 334 315 093 815 931 279 769 6 × 2 = 0 + 0.210 668 630 187 631 862 559 539 2;
  • 107) 0.210 668 630 187 631 862 559 539 2 × 2 = 0 + 0.421 337 260 375 263 725 119 078 4;
  • 108) 0.421 337 260 375 263 725 119 078 4 × 2 = 0 + 0.842 674 520 750 527 450 238 156 8;
  • 109) 0.842 674 520 750 527 450 238 156 8 × 2 = 1 + 0.685 349 041 501 054 900 476 313 6;
  • 110) 0.685 349 041 501 054 900 476 313 6 × 2 = 1 + 0.370 698 083 002 109 800 952 627 2;
  • 111) 0.370 698 083 002 109 800 952 627 2 × 2 = 0 + 0.741 396 166 004 219 601 905 254 4;
  • 112) 0.741 396 166 004 219 601 905 254 4 × 2 = 1 + 0.482 792 332 008 439 203 810 508 8;
  • 113) 0.482 792 332 008 439 203 810 508 8 × 2 = 0 + 0.965 584 664 016 878 407 621 017 6;
  • 114) 0.965 584 664 016 878 407 621 017 6 × 2 = 1 + 0.931 169 328 033 756 815 242 035 2;
  • 115) 0.931 169 328 033 756 815 242 035 2 × 2 = 1 + 0.862 338 656 067 513 630 484 070 4;
  • 116) 0.862 338 656 067 513 630 484 070 4 × 2 = 1 + 0.724 677 312 135 027 260 968 140 8;
  • 117) 0.724 677 312 135 027 260 968 140 8 × 2 = 1 + 0.449 354 624 270 054 521 936 281 6;
  • 118) 0.449 354 624 270 054 521 936 281 6 × 2 = 0 + 0.898 709 248 540 109 043 872 563 2;
  • 119) 0.898 709 248 540 109 043 872 563 2 × 2 = 1 + 0.797 418 497 080 218 087 745 126 4;

No fractional part came out equal to zero. However, steps 67 to 119 already produced 53 significant bits (the leading 1 plus the 52 bits of the mantissa), so we stop here => FULL STOP. Precision is lost: the remaining fractional part (about 0.797) is discarded, so the converted number is only a very close approximation of the initial one.
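
The multiplication table above can be reproduced with exact rational arithmetic. The sketch below assumes Python's `fractions.Fraction` to avoid any binary rounding while doubling; the function name `fraction_bits` is hypothetical.

```python
from fractions import Fraction

def fraction_bits(x, count):
    """First `count` binary digits of a fraction 0 <= x < 1, by repeated doubling."""
    bits = []
    for _ in range(count):
        x *= 2
        if x >= 1:           # the integer part of the product is 1
            bits.append("1")
            x -= 1           # keep only the fractional part
        else:
            bits.append("0")
    return "".join(bits)

value = Fraction("8.53655e-21")   # exact decimal value, no float involved
bits = fraction_bits(value, 119)  # 119 iterations, as in the list above
print(bits.index("1") + 1)        # 67: the first step that yields a 1
print(bits[66:])                  # the 53 bits produced by steps 67 to 119
```

Working with `Fraction` instead of `float` matters here: a `float` input would already be rounded to binary before the first multiplication.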


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 536 55(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0000 1110 0100 0010 1011 0110 0000 1101 0111 101(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 536 55(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0000 1110 0100 0010 1011 0110 0000 1101 0111 101(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 536 55(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0000 1110 0100 0010 1011 0110 0000 1101 0111 101(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 0000 1110 0100 0010 1011 0110 0000 1101 0111 101(2) × 2^0 =


1.0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101(2) × 2^-67
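
Normalization amounts to locating the first 1 bit and counting how far the point has to move. A minimal sketch, assuming the digits on each side of the point are available as strings and the value is non-zero (the helper `normalize` is hypothetical):

```python
def normalize(int_bits: str, frac_bits: str):
    """Return (significand_bits, exponent) with the leading 1 first.

    The value equals 1.<rest of significand_bits> x 2**exponent.
    Assumes a non-zero value; not a general-purpose routine.
    """
    int_bits = int_bits.lstrip("0")
    if int_bits:
        # Point moves left: one position per digit after the leading 1.
        return int_bits + frac_bits, len(int_bits) - 1
    first_one = frac_bits.index("1")
    # Point moves right, just past the first 1: negative exponent.
    return frac_bits[first_one:], -(first_one + 1)

print(normalize("0", "0" * 66 + "101"))   # ('101', -67)
print(normalize("11111", "1010"))         # ('111111010', 4)
```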


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above. The ten divisions give 10 bits, so pad with one leading 0 to fill the 11 bit exponent field.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
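
The bias arithmetic and the 11 bit padding can be checked with a short sketch. The constant and function names are assumptions made for illustration.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision

def biased_exponent_bits(exponent: int) -> str:
    """Add the bias and write the result as an 11 bit binary string."""
    biased = exponent + BIAS
    # "011b" pads on the left with zeros up to 11 binary digits.
    return format(biased, "011b")

print(BIAS)                        # 1023
print(biased_exponent_bits(-67))   # 01110111100
print(biased_exponent_bits(4))     # 10000000011
```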


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101 =


0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101
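
Step 11 can be expressed as a small helper. This is a sketch under the same assumption as the manual procedure: excess bits are cut off (truncated), whereas floating point hardware normally rounds to nearest. The name `mantissa_field` is hypothetical.

```python
MANTISSA_BITS = 52

def mantissa_field(significand_bits: str) -> str:
    """Drop the implicit leading 1, then truncate or zero-pad to 52 bits."""
    if not significand_bits.startswith("1"):
        raise ValueError("expected a normalized significand starting with 1")
    stored = significand_bits[1:]
    # Truncation mirrors the manual steps; it is not IEEE 754 rounding.
    return stored[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")

significand = "1" + (
    "0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101"
).replace(" ", "")
print(len(mantissa_field(significand)))   # 52
print(mantissa_field("11"))               # a 1 followed by 51 zeros
```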


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101


Decimal number 0.000 000 000 000 000 000 008 536 55 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101
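
As a cross-check, the three fields can be packed into one 64 bit word and compared with what the machine stores for the same literal. The sketch assumes Python's `struct` module and a platform whose `float` is IEEE 754 binary64. Because the manual method truncates while hardware typically rounds to nearest, and the fractional part discarded after step 119 is about 0.797, a difference of one unit in the last place is to be expected.

```python
import struct

sign = "0"
exponent = "011 1011 1100".replace(" ", "")
mantissa = (
    "0100 0010 1000 0000 0111 0010 0001 0101 1011 0000 0110 1011 1101"
).replace(" ", "")

# Concatenate the fields into one 64 bit integer.
word = int(sign + exponent + mantissa, 2)
print(f"{word:#018x}")   # 0x3bc42807215b06bd

# Reinterpret the bit pattern as a double.
decoded = struct.unpack(">d", word.to_bytes(8, "big"))[0]
print(decoded)

# Bit pattern the platform itself stores for the same decimal literal.
hardware = int.from_bytes(struct.pack(">d", 8.53655e-21), "big")
print(hardware - word)   # gap in units in the last place
```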


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
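
The nine steps above can be strung together as follows. This is a minimal sketch, not a complete converter: the function name `to_double_bits` is hypothetical, the input must be a non-zero decimal string, excess bits are truncated as in the manual method, and zero, subnormal numbers, infinities and NaN are not handled.

```python
from fractions import Fraction

def to_double_bits(text: str) -> str:
    """Return 'sign exponent mantissa' for a non-zero decimal string."""
    value = Fraction(text)                 # exact decimal value
    sign = "1" if value < 0 else "0"       # step 9
    value = abs(value)                     # step 1

    # Steps 2-3: integer part (format() stands in for repeated division).
    integer, frac = divmod(value, 1)
    int_bits = format(integer, "b") if integer else ""

    # Steps 4-5: double the fraction until it is zero or
    # 53 significant bits (1 implicit + 52 stored) are available.
    frac_bits = ""
    while frac and len((int_bits + frac_bits).lstrip("0")) < 53:
        bit, frac = divmod(frac * 2, 1)
        frac_bits += str(bit)

    # Step 6: normalize so that a single 1 sits left of the point.
    if int_bits:
        exponent = len(int_bits) - 1
        significand = int_bits + frac_bits
    else:
        first_one = frac_bits.index("1")
        exponent = -(first_one + 1)
        significand = frac_bits[first_one:]

    exponent_bits = format(exponent + 2 ** (11 - 1) - 1, "011b")  # step 7
    mantissa = significand[1:53].ljust(52, "0")                   # step 8
    return f"{sign} {exponent_bits} {mantissa}"

print(to_double_bits("8.53655e-21"))
print(to_double_bits("-31.640215"))
```

Using `Fraction` keeps every doubling exact, so the generated bits depend only on the decimal input and not on the machine's own rounding.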

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
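
To compare this result with what a machine actually stores, a float can be split back into its three fields. The sketch below assumes Python's `struct` module and a platform where `float` is IEEE 754 binary64; the helper `fields_of` is hypothetical. Since the bits cut off in step 10 (1010 0...) amount to more than half a unit in the last place, a round-to-nearest implementation may store a mantissa one unit larger than the truncated one shown above.

```python
import struct

def fields_of(x: float):
    """Split a double into (sign, biased exponent, mantissa) integers."""
    word = int.from_bytes(struct.pack(">d", x), "big")
    sign = word >> 63                    # top bit
    exponent = (word >> 52) & 0x7FF      # next 11 bits
    mantissa = word & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa

sign, exponent, mantissa = fields_of(-31.640215)
print(sign, exponent)                    # 1 1027
print(format(mantissa, "052b"))

# Mantissa obtained by the manual, truncating procedure.
manual = int(
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
    .replace(" ", ""),
    2,
)
print(mantissa - manual)                 # gap in units in the last place
```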