0.000 000 000 000 000 000 008 537 47 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 537 47(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 537 47(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 537 47.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 537 47 × 2 = 0 + 0.000 000 000 000 000 000 017 074 94;
  • 2) 0.000 000 000 000 000 000 017 074 94 × 2 = 0 + 0.000 000 000 000 000 000 034 149 88;
  • 3) 0.000 000 000 000 000 000 034 149 88 × 2 = 0 + 0.000 000 000 000 000 000 068 299 76;
  • 4) 0.000 000 000 000 000 000 068 299 76 × 2 = 0 + 0.000 000 000 000 000 000 136 599 52;
  • 5) 0.000 000 000 000 000 000 136 599 52 × 2 = 0 + 0.000 000 000 000 000 000 273 199 04;
  • 6) 0.000 000 000 000 000 000 273 199 04 × 2 = 0 + 0.000 000 000 000 000 000 546 398 08;
  • 7) 0.000 000 000 000 000 000 546 398 08 × 2 = 0 + 0.000 000 000 000 000 001 092 796 16;
  • 8) 0.000 000 000 000 000 001 092 796 16 × 2 = 0 + 0.000 000 000 000 000 002 185 592 32;
  • 9) 0.000 000 000 000 000 002 185 592 32 × 2 = 0 + 0.000 000 000 000 000 004 371 184 64;
  • 10) 0.000 000 000 000 000 004 371 184 64 × 2 = 0 + 0.000 000 000 000 000 008 742 369 28;
  • 11) 0.000 000 000 000 000 008 742 369 28 × 2 = 0 + 0.000 000 000 000 000 017 484 738 56;
  • 12) 0.000 000 000 000 000 017 484 738 56 × 2 = 0 + 0.000 000 000 000 000 034 969 477 12;
  • 13) 0.000 000 000 000 000 034 969 477 12 × 2 = 0 + 0.000 000 000 000 000 069 938 954 24;
  • 14) 0.000 000 000 000 000 069 938 954 24 × 2 = 0 + 0.000 000 000 000 000 139 877 908 48;
  • 15) 0.000 000 000 000 000 139 877 908 48 × 2 = 0 + 0.000 000 000 000 000 279 755 816 96;
  • 16) 0.000 000 000 000 000 279 755 816 96 × 2 = 0 + 0.000 000 000 000 000 559 511 633 92;
  • 17) 0.000 000 000 000 000 559 511 633 92 × 2 = 0 + 0.000 000 000 000 001 119 023 267 84;
  • 18) 0.000 000 000 000 001 119 023 267 84 × 2 = 0 + 0.000 000 000 000 002 238 046 535 68;
  • 19) 0.000 000 000 000 002 238 046 535 68 × 2 = 0 + 0.000 000 000 000 004 476 093 071 36;
  • 20) 0.000 000 000 000 004 476 093 071 36 × 2 = 0 + 0.000 000 000 000 008 952 186 142 72;
  • 21) 0.000 000 000 000 008 952 186 142 72 × 2 = 0 + 0.000 000 000 000 017 904 372 285 44;
  • 22) 0.000 000 000 000 017 904 372 285 44 × 2 = 0 + 0.000 000 000 000 035 808 744 570 88;
  • 23) 0.000 000 000 000 035 808 744 570 88 × 2 = 0 + 0.000 000 000 000 071 617 489 141 76;
  • 24) 0.000 000 000 000 071 617 489 141 76 × 2 = 0 + 0.000 000 000 000 143 234 978 283 52;
  • 25) 0.000 000 000 000 143 234 978 283 52 × 2 = 0 + 0.000 000 000 000 286 469 956 567 04;
  • 26) 0.000 000 000 000 286 469 956 567 04 × 2 = 0 + 0.000 000 000 000 572 939 913 134 08;
  • 27) 0.000 000 000 000 572 939 913 134 08 × 2 = 0 + 0.000 000 000 001 145 879 826 268 16;
  • 28) 0.000 000 000 001 145 879 826 268 16 × 2 = 0 + 0.000 000 000 002 291 759 652 536 32;
  • 29) 0.000 000 000 002 291 759 652 536 32 × 2 = 0 + 0.000 000 000 004 583 519 305 072 64;
  • 30) 0.000 000 000 004 583 519 305 072 64 × 2 = 0 + 0.000 000 000 009 167 038 610 145 28;
  • 31) 0.000 000 000 009 167 038 610 145 28 × 2 = 0 + 0.000 000 000 018 334 077 220 290 56;
  • 32) 0.000 000 000 018 334 077 220 290 56 × 2 = 0 + 0.000 000 000 036 668 154 440 581 12;
  • 33) 0.000 000 000 036 668 154 440 581 12 × 2 = 0 + 0.000 000 000 073 336 308 881 162 24;
  • 34) 0.000 000 000 073 336 308 881 162 24 × 2 = 0 + 0.000 000 000 146 672 617 762 324 48;
  • 35) 0.000 000 000 146 672 617 762 324 48 × 2 = 0 + 0.000 000 000 293 345 235 524 648 96;
  • 36) 0.000 000 000 293 345 235 524 648 96 × 2 = 0 + 0.000 000 000 586 690 471 049 297 92;
  • 37) 0.000 000 000 586 690 471 049 297 92 × 2 = 0 + 0.000 000 001 173 380 942 098 595 84;
  • 38) 0.000 000 001 173 380 942 098 595 84 × 2 = 0 + 0.000 000 002 346 761 884 197 191 68;
  • 39) 0.000 000 002 346 761 884 197 191 68 × 2 = 0 + 0.000 000 004 693 523 768 394 383 36;
  • 40) 0.000 000 004 693 523 768 394 383 36 × 2 = 0 + 0.000 000 009 387 047 536 788 766 72;
  • 41) 0.000 000 009 387 047 536 788 766 72 × 2 = 0 + 0.000 000 018 774 095 073 577 533 44;
  • 42) 0.000 000 018 774 095 073 577 533 44 × 2 = 0 + 0.000 000 037 548 190 147 155 066 88;
  • 43) 0.000 000 037 548 190 147 155 066 88 × 2 = 0 + 0.000 000 075 096 380 294 310 133 76;
  • 44) 0.000 000 075 096 380 294 310 133 76 × 2 = 0 + 0.000 000 150 192 760 588 620 267 52;
  • 45) 0.000 000 150 192 760 588 620 267 52 × 2 = 0 + 0.000 000 300 385 521 177 240 535 04;
  • 46) 0.000 000 300 385 521 177 240 535 04 × 2 = 0 + 0.000 000 600 771 042 354 481 070 08;
  • 47) 0.000 000 600 771 042 354 481 070 08 × 2 = 0 + 0.000 001 201 542 084 708 962 140 16;
  • 48) 0.000 001 201 542 084 708 962 140 16 × 2 = 0 + 0.000 002 403 084 169 417 924 280 32;
  • 49) 0.000 002 403 084 169 417 924 280 32 × 2 = 0 + 0.000 004 806 168 338 835 848 560 64;
  • 50) 0.000 004 806 168 338 835 848 560 64 × 2 = 0 + 0.000 009 612 336 677 671 697 121 28;
  • 51) 0.000 009 612 336 677 671 697 121 28 × 2 = 0 + 0.000 019 224 673 355 343 394 242 56;
  • 52) 0.000 019 224 673 355 343 394 242 56 × 2 = 0 + 0.000 038 449 346 710 686 788 485 12;
  • 53) 0.000 038 449 346 710 686 788 485 12 × 2 = 0 + 0.000 076 898 693 421 373 576 970 24;
  • 54) 0.000 076 898 693 421 373 576 970 24 × 2 = 0 + 0.000 153 797 386 842 747 153 940 48;
  • 55) 0.000 153 797 386 842 747 153 940 48 × 2 = 0 + 0.000 307 594 773 685 494 307 880 96;
  • 56) 0.000 307 594 773 685 494 307 880 96 × 2 = 0 + 0.000 615 189 547 370 988 615 761 92;
  • 57) 0.000 615 189 547 370 988 615 761 92 × 2 = 0 + 0.001 230 379 094 741 977 231 523 84;
  • 58) 0.001 230 379 094 741 977 231 523 84 × 2 = 0 + 0.002 460 758 189 483 954 463 047 68;
  • 59) 0.002 460 758 189 483 954 463 047 68 × 2 = 0 + 0.004 921 516 378 967 908 926 095 36;
  • 60) 0.004 921 516 378 967 908 926 095 36 × 2 = 0 + 0.009 843 032 757 935 817 852 190 72;
  • 61) 0.009 843 032 757 935 817 852 190 72 × 2 = 0 + 0.019 686 065 515 871 635 704 381 44;
  • 62) 0.019 686 065 515 871 635 704 381 44 × 2 = 0 + 0.039 372 131 031 743 271 408 762 88;
  • 63) 0.039 372 131 031 743 271 408 762 88 × 2 = 0 + 0.078 744 262 063 486 542 817 525 76;
  • 64) 0.078 744 262 063 486 542 817 525 76 × 2 = 0 + 0.157 488 524 126 973 085 635 051 52;
  • 65) 0.157 488 524 126 973 085 635 051 52 × 2 = 0 + 0.314 977 048 253 946 171 270 103 04;
  • 66) 0.314 977 048 253 946 171 270 103 04 × 2 = 0 + 0.629 954 096 507 892 342 540 206 08;
  • 67) 0.629 954 096 507 892 342 540 206 08 × 2 = 1 + 0.259 908 193 015 784 685 080 412 16;
  • 68) 0.259 908 193 015 784 685 080 412 16 × 2 = 0 + 0.519 816 386 031 569 370 160 824 32;
  • 69) 0.519 816 386 031 569 370 160 824 32 × 2 = 1 + 0.039 632 772 063 138 740 321 648 64;
  • 70) 0.039 632 772 063 138 740 321 648 64 × 2 = 0 + 0.079 265 544 126 277 480 643 297 28;
  • 71) 0.079 265 544 126 277 480 643 297 28 × 2 = 0 + 0.158 531 088 252 554 961 286 594 56;
  • 72) 0.158 531 088 252 554 961 286 594 56 × 2 = 0 + 0.317 062 176 505 109 922 573 189 12;
  • 73) 0.317 062 176 505 109 922 573 189 12 × 2 = 0 + 0.634 124 353 010 219 845 146 378 24;
  • 74) 0.634 124 353 010 219 845 146 378 24 × 2 = 1 + 0.268 248 706 020 439 690 292 756 48;
  • 75) 0.268 248 706 020 439 690 292 756 48 × 2 = 0 + 0.536 497 412 040 879 380 585 512 96;
  • 76) 0.536 497 412 040 879 380 585 512 96 × 2 = 1 + 0.072 994 824 081 758 761 171 025 92;
  • 77) 0.072 994 824 081 758 761 171 025 92 × 2 = 0 + 0.145 989 648 163 517 522 342 051 84;
  • 78) 0.145 989 648 163 517 522 342 051 84 × 2 = 0 + 0.291 979 296 327 035 044 684 103 68;
  • 79) 0.291 979 296 327 035 044 684 103 68 × 2 = 0 + 0.583 958 592 654 070 089 368 207 36;
  • 80) 0.583 958 592 654 070 089 368 207 36 × 2 = 1 + 0.167 917 185 308 140 178 736 414 72;
  • 81) 0.167 917 185 308 140 178 736 414 72 × 2 = 0 + 0.335 834 370 616 280 357 472 829 44;
  • 82) 0.335 834 370 616 280 357 472 829 44 × 2 = 0 + 0.671 668 741 232 560 714 945 658 88;
  • 83) 0.671 668 741 232 560 714 945 658 88 × 2 = 1 + 0.343 337 482 465 121 429 891 317 76;
  • 84) 0.343 337 482 465 121 429 891 317 76 × 2 = 0 + 0.686 674 964 930 242 859 782 635 52;
  • 85) 0.686 674 964 930 242 859 782 635 52 × 2 = 1 + 0.373 349 929 860 485 719 565 271 04;
  • 86) 0.373 349 929 860 485 719 565 271 04 × 2 = 0 + 0.746 699 859 720 971 439 130 542 08;
  • 87) 0.746 699 859 720 971 439 130 542 08 × 2 = 1 + 0.493 399 719 441 942 878 261 084 16;
  • 88) 0.493 399 719 441 942 878 261 084 16 × 2 = 0 + 0.986 799 438 883 885 756 522 168 32;
  • 89) 0.986 799 438 883 885 756 522 168 32 × 2 = 1 + 0.973 598 877 767 771 513 044 336 64;
  • 90) 0.973 598 877 767 771 513 044 336 64 × 2 = 1 + 0.947 197 755 535 543 026 088 673 28;
  • 91) 0.947 197 755 535 543 026 088 673 28 × 2 = 1 + 0.894 395 511 071 086 052 177 346 56;
  • 92) 0.894 395 511 071 086 052 177 346 56 × 2 = 1 + 0.788 791 022 142 172 104 354 693 12;
  • 93) 0.788 791 022 142 172 104 354 693 12 × 2 = 1 + 0.577 582 044 284 344 208 709 386 24;
  • 94) 0.577 582 044 284 344 208 709 386 24 × 2 = 1 + 0.155 164 088 568 688 417 418 772 48;
  • 95) 0.155 164 088 568 688 417 418 772 48 × 2 = 0 + 0.310 328 177 137 376 834 837 544 96;
  • 96) 0.310 328 177 137 376 834 837 544 96 × 2 = 0 + 0.620 656 354 274 753 669 675 089 92;
  • 97) 0.620 656 354 274 753 669 675 089 92 × 2 = 1 + 0.241 312 708 549 507 339 350 179 84;
  • 98) 0.241 312 708 549 507 339 350 179 84 × 2 = 0 + 0.482 625 417 099 014 678 700 359 68;
  • 99) 0.482 625 417 099 014 678 700 359 68 × 2 = 0 + 0.965 250 834 198 029 357 400 719 36;
  • 100) 0.965 250 834 198 029 357 400 719 36 × 2 = 1 + 0.930 501 668 396 058 714 801 438 72;
  • 101) 0.930 501 668 396 058 714 801 438 72 × 2 = 1 + 0.861 003 336 792 117 429 602 877 44;
  • 102) 0.861 003 336 792 117 429 602 877 44 × 2 = 1 + 0.722 006 673 584 234 859 205 754 88;
  • 103) 0.722 006 673 584 234 859 205 754 88 × 2 = 1 + 0.444 013 347 168 469 718 411 509 76;
  • 104) 0.444 013 347 168 469 718 411 509 76 × 2 = 0 + 0.888 026 694 336 939 436 823 019 52;
  • 105) 0.888 026 694 336 939 436 823 019 52 × 2 = 1 + 0.776 053 388 673 878 873 646 039 04;
  • 106) 0.776 053 388 673 878 873 646 039 04 × 2 = 1 + 0.552 106 777 347 757 747 292 078 08;
  • 107) 0.552 106 777 347 757 747 292 078 08 × 2 = 1 + 0.104 213 554 695 515 494 584 156 16;
  • 108) 0.104 213 554 695 515 494 584 156 16 × 2 = 0 + 0.208 427 109 391 030 989 168 312 32;
  • 109) 0.208 427 109 391 030 989 168 312 32 × 2 = 0 + 0.416 854 218 782 061 978 336 624 64;
  • 110) 0.416 854 218 782 061 978 336 624 64 × 2 = 0 + 0.833 708 437 564 123 956 673 249 28;
  • 111) 0.833 708 437 564 123 956 673 249 28 × 2 = 1 + 0.667 416 875 128 247 913 346 498 56;
  • 112) 0.667 416 875 128 247 913 346 498 56 × 2 = 1 + 0.334 833 750 256 495 826 692 997 12;
  • 113) 0.334 833 750 256 495 826 692 997 12 × 2 = 0 + 0.669 667 500 512 991 653 385 994 24;
  • 114) 0.669 667 500 512 991 653 385 994 24 × 2 = 1 + 0.339 335 001 025 983 306 771 988 48;
  • 115) 0.339 335 001 025 983 306 771 988 48 × 2 = 0 + 0.678 670 002 051 966 613 543 976 96;
  • 116) 0.678 670 002 051 966 613 543 976 96 × 2 = 1 + 0.357 340 004 103 933 227 087 953 92;
  • 117) 0.357 340 004 103 933 227 087 953 92 × 2 = 0 + 0.714 680 008 207 866 454 175 907 84;
  • 118) 0.714 680 008 207 866 454 175 907 84 × 2 = 1 + 0.429 360 016 415 732 908 351 815 68;
  • 119) 0.429 360 016 415 732 908 351 815 68 × 2 = 0 + 0.858 720 032 831 465 816 703 631 36;

No fractional part came out equal to zero. However, we already have the 52 bits needed after the first 1 (steps 68 to 119), which is the mantissa limit, so we stop here. The remaining fractional part is discarded, so precision is lost: the converted number will be only a very close approximation of the original.
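The repeated doubling above can be sketched in Python. This is a minimal illustration under stated assumptions, not the converter's own code: the helper name `fraction_bits` and its `mantissa_bits` parameter are hypothetical, the stopping rule (52 digits collected after the first 1) is chosen to mirror the listing, and `fractions.Fraction` is used so that every doubling is exact.

```python
from fractions import Fraction

def fraction_bits(text, mantissa_bits=52):
    """Return the binary digits of a decimal fraction in [0, 1) as a list.

    Hypothetical helper: doubles the value exactly and records each integer
    part, stopping when the fractional part reaches zero or once
    `mantissa_bits` digits have been collected after the first 1.
    """
    value = Fraction(text)
    if not 0 <= value < 1:
        raise ValueError("expected a fraction in [0, 1)")
    bits = []
    after_first_one = None  # digits recorded after the first 1, once seen
    while value != 0:
        value *= 2
        bit = int(value)    # integer part of the product: 0 or 1
        value -= bit        # keep only the fractional part
        bits.append(bit)
        if after_first_one is not None:
            after_first_one += 1
            if after_first_one == mantissa_bits:
                break
        elif bit == 1:
            after_first_one = 0
    return bits

bits = fraction_bits("8.53747e-21")
print(len(bits))          # 119 doubling steps
print(bits.index(1) + 1)  # 67: the step where the first 1 appears
```

Exact rational arithmetic is used here because doubling a binary float would already start from a rounded value.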


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 537 47(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0010 1010 1111 1100 1001 1110 1110 0011 0101 010(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 537 47(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0010 1010 1111 1100 1001 1110 1110 0011 0101 010(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 537 47(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0010 1010 1111 1100 1001 1110 1110 0011 0101 010(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0001 0010 1010 1111 1100 1001 1110 1110 0011 0101 010(2) × 2^0 =


1.0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010(2) × 2^-67
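The unadjusted exponent found by normalization can be cross-checked with a short Python sketch. It assumes the platform's `float` is an IEEE 754 double, which holds for mainstream Python builds, and it uses the standard `math.frexp` function; the variable names are illustrative only.

```python
import math

x = 8.53747e-21

# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
# The normalized form used above is 1.f * 2**(e - 1), so we shift by one.
m, e = math.frexp(x)
significand = m * 2   # now in the range [1, 2)
exponent = e - 1

print(exponent)               # -67
print(1 <= significand < 2)   # True
```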


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
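The bias adjustment and the repeated division by 2 can be sketched in Python as follows. The helper name `to_binary` and the constant `EXPONENT_BITS` are illustrative assumptions; the arithmetic is the one shown in steps 8 to 10.

```python
def to_binary(n, width):
    """Convert a non-negative integer to binary by repeated division by 2.

    The remainders are collected in order and then read from the bottom
    of the list upwards; the result is left-padded with zeros to `width`.
    """
    remainders = []
    while True:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
        if n == 0:          # a quotient of zero ends the process
            break
    digits = "".join(reversed(remainders))
    return digits.rjust(width, "0")

EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1          # 1023
adjusted = -67 + bias                        # 956
print(to_binary(adjusted, EXPONENT_BITS))    # 01110111100
```

The padding step matters here: 956 needs only 10 binary digits, so one leading zero is added to fill the 11 bit field.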


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, along with the decimal mark.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010 =


0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010


Decimal number 0.000 000 000 000 000 000 008 537 47 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010
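As a sanity check, the three fields can be joined into one 64 bit word and read back as a double. The sketch below is only an illustration: it copies the bit strings from the result above, uses the standard `struct` module, and assumes the usual big-endian field order (sign, exponent, mantissa).

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0010 1000 1001 0101 0111 1110 0100 1111 0111 0001 1010 1010"

# Join the fields and drop the grouping spaces: 1 + 11 + 52 = 64 bits.
bits = (sign + exponent + mantissa).replace(" ", "")
assert len(bits) == 64

word = int(bits, 2)
print(f"{word:016x}")   # 3bc428957e4f71aa

# Reinterpret the 64 bit pattern as a double to see the value it encodes.
decoded = struct.unpack(">d", word.to_bytes(8, "big"))[0]
print(decoded)
```

The decoded value should come out very close to, but not necessarily identical with, 8.53747e-21, because the mantissa above was cut off after 52 bits rather than rounded.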


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by appending bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
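The steps above can be put together in one short Python sketch. It is a minimal illustration under stated assumptions, not a full IEEE 754 implementation: the function name `double_fields_truncating` is hypothetical, excess bits are simply dropped as in step 8 (no rounding), and zero, subnormal numbers, infinities and NaN are not handled.

```python
from fractions import Fraction

def double_fields_truncating(text):
    """Return (sign, biased_exponent, mantissa) for a non-zero decimal string.

    Hypothetical helper that follows the listed steps with exact rational
    arithmetic and truncates the mantissa to 52 bits.
    """
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero has a special encoding and is not handled here")
    sign = 1 if value < 0 else 0
    value = abs(value)

    # Normalize to 1 <= value < 2, counting how far the mark was shifted.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    biased = exponent + 2 ** (11 - 1) - 1
    if not 1 <= biased <= 2046:
        raise ValueError("outside the range of normal doubles")

    # Drop the leading 1, then take the next 52 binary digits.
    fraction = value - 1
    mantissa = 0
    for _ in range(52):
        fraction *= 2
        bit = int(fraction)
        fraction -= bit
        mantissa = (mantissa << 1) | bit
    return sign, biased, mantissa

sign, biased, mantissa = double_fields_truncating("-31.640215")
print(sign, f"{biased:011b}", f"{mantissa:052b}")
```

Normalizing first and extracting the 52 digits afterwards gives the same bits as converting the integer and fractional parts separately, while keeping the loop short.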

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), then adjust its length to 52 bits by removing the excess bits from the right (losing precision):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
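A result like this one can be compared with the bit pattern a programming language stores for the same number. The Python sketch below is an illustration only: the helper name `ieee754_fields` is hypothetical, and it assumes that Python's `float` is an IEEE 754 double, as it is on mainstream platforms. Python rounds to the nearest representable value instead of cutting off the excess bits, so the last mantissa bit may differ from a result obtained by truncation.

```python
import struct

def ieee754_fields(x):
    """Split a Python float into sign, exponent and mantissa bit strings."""
    # Pack as a big-endian double, then reread the same 8 bytes as an integer.
    word = struct.unpack(">Q", struct.pack(">d", x))[0]
    bits = f"{word:064b}"
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = ieee754_fields(-31.640215)
print(sign, "-", exponent, "-", mantissa)
```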