0.000 000 000 000 000 000 008 537 33 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 537 33(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 537 33(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
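The repeated division in steps 1 and 2 can be sketched in Python as follows. The function name `integer_to_binary` is an illustrative choice and not part of the procedure itself; the sketch only handles non-negative integers.

```python
def integer_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)   # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
        if n == 0:                    # stop when the quotient is zero
            break
    # The last remainder is the leftmost binary digit, so read bottom-up.
    return "".join(reversed(remainders))


print(integer_to_binary(0))    # 0
print(integer_to_binary(31))   # 11111
```

For the integer part 0 the loop runs once and yields the single digit 0, matching the result above.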


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 537 33.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 537 33 × 2 = 0 + 0.000 000 000 000 000 000 017 074 66;
  • 2) 0.000 000 000 000 000 000 017 074 66 × 2 = 0 + 0.000 000 000 000 000 000 034 149 32;
  • 3) 0.000 000 000 000 000 000 034 149 32 × 2 = 0 + 0.000 000 000 000 000 000 068 298 64;
  • 4) 0.000 000 000 000 000 000 068 298 64 × 2 = 0 + 0.000 000 000 000 000 000 136 597 28;
  • 5) 0.000 000 000 000 000 000 136 597 28 × 2 = 0 + 0.000 000 000 000 000 000 273 194 56;
  • 6) 0.000 000 000 000 000 000 273 194 56 × 2 = 0 + 0.000 000 000 000 000 000 546 389 12;
  • 7) 0.000 000 000 000 000 000 546 389 12 × 2 = 0 + 0.000 000 000 000 000 001 092 778 24;
  • 8) 0.000 000 000 000 000 001 092 778 24 × 2 = 0 + 0.000 000 000 000 000 002 185 556 48;
  • 9) 0.000 000 000 000 000 002 185 556 48 × 2 = 0 + 0.000 000 000 000 000 004 371 112 96;
  • 10) 0.000 000 000 000 000 004 371 112 96 × 2 = 0 + 0.000 000 000 000 000 008 742 225 92;
  • 11) 0.000 000 000 000 000 008 742 225 92 × 2 = 0 + 0.000 000 000 000 000 017 484 451 84;
  • 12) 0.000 000 000 000 000 017 484 451 84 × 2 = 0 + 0.000 000 000 000 000 034 968 903 68;
  • 13) 0.000 000 000 000 000 034 968 903 68 × 2 = 0 + 0.000 000 000 000 000 069 937 807 36;
  • 14) 0.000 000 000 000 000 069 937 807 36 × 2 = 0 + 0.000 000 000 000 000 139 875 614 72;
  • 15) 0.000 000 000 000 000 139 875 614 72 × 2 = 0 + 0.000 000 000 000 000 279 751 229 44;
  • 16) 0.000 000 000 000 000 279 751 229 44 × 2 = 0 + 0.000 000 000 000 000 559 502 458 88;
  • 17) 0.000 000 000 000 000 559 502 458 88 × 2 = 0 + 0.000 000 000 000 001 119 004 917 76;
  • 18) 0.000 000 000 000 001 119 004 917 76 × 2 = 0 + 0.000 000 000 000 002 238 009 835 52;
  • 19) 0.000 000 000 000 002 238 009 835 52 × 2 = 0 + 0.000 000 000 000 004 476 019 671 04;
  • 20) 0.000 000 000 000 004 476 019 671 04 × 2 = 0 + 0.000 000 000 000 008 952 039 342 08;
  • 21) 0.000 000 000 000 008 952 039 342 08 × 2 = 0 + 0.000 000 000 000 017 904 078 684 16;
  • 22) 0.000 000 000 000 017 904 078 684 16 × 2 = 0 + 0.000 000 000 000 035 808 157 368 32;
  • 23) 0.000 000 000 000 035 808 157 368 32 × 2 = 0 + 0.000 000 000 000 071 616 314 736 64;
  • 24) 0.000 000 000 000 071 616 314 736 64 × 2 = 0 + 0.000 000 000 000 143 232 629 473 28;
  • 25) 0.000 000 000 000 143 232 629 473 28 × 2 = 0 + 0.000 000 000 000 286 465 258 946 56;
  • 26) 0.000 000 000 000 286 465 258 946 56 × 2 = 0 + 0.000 000 000 000 572 930 517 893 12;
  • 27) 0.000 000 000 000 572 930 517 893 12 × 2 = 0 + 0.000 000 000 001 145 861 035 786 24;
  • 28) 0.000 000 000 001 145 861 035 786 24 × 2 = 0 + 0.000 000 000 002 291 722 071 572 48;
  • 29) 0.000 000 000 002 291 722 071 572 48 × 2 = 0 + 0.000 000 000 004 583 444 143 144 96;
  • 30) 0.000 000 000 004 583 444 143 144 96 × 2 = 0 + 0.000 000 000 009 166 888 286 289 92;
  • 31) 0.000 000 000 009 166 888 286 289 92 × 2 = 0 + 0.000 000 000 018 333 776 572 579 84;
  • 32) 0.000 000 000 018 333 776 572 579 84 × 2 = 0 + 0.000 000 000 036 667 553 145 159 68;
  • 33) 0.000 000 000 036 667 553 145 159 68 × 2 = 0 + 0.000 000 000 073 335 106 290 319 36;
  • 34) 0.000 000 000 073 335 106 290 319 36 × 2 = 0 + 0.000 000 000 146 670 212 580 638 72;
  • 35) 0.000 000 000 146 670 212 580 638 72 × 2 = 0 + 0.000 000 000 293 340 425 161 277 44;
  • 36) 0.000 000 000 293 340 425 161 277 44 × 2 = 0 + 0.000 000 000 586 680 850 322 554 88;
  • 37) 0.000 000 000 586 680 850 322 554 88 × 2 = 0 + 0.000 000 001 173 361 700 645 109 76;
  • 38) 0.000 000 001 173 361 700 645 109 76 × 2 = 0 + 0.000 000 002 346 723 401 290 219 52;
  • 39) 0.000 000 002 346 723 401 290 219 52 × 2 = 0 + 0.000 000 004 693 446 802 580 439 04;
  • 40) 0.000 000 004 693 446 802 580 439 04 × 2 = 0 + 0.000 000 009 386 893 605 160 878 08;
  • 41) 0.000 000 009 386 893 605 160 878 08 × 2 = 0 + 0.000 000 018 773 787 210 321 756 16;
  • 42) 0.000 000 018 773 787 210 321 756 16 × 2 = 0 + 0.000 000 037 547 574 420 643 512 32;
  • 43) 0.000 000 037 547 574 420 643 512 32 × 2 = 0 + 0.000 000 075 095 148 841 287 024 64;
  • 44) 0.000 000 075 095 148 841 287 024 64 × 2 = 0 + 0.000 000 150 190 297 682 574 049 28;
  • 45) 0.000 000 150 190 297 682 574 049 28 × 2 = 0 + 0.000 000 300 380 595 365 148 098 56;
  • 46) 0.000 000 300 380 595 365 148 098 56 × 2 = 0 + 0.000 000 600 761 190 730 296 197 12;
  • 47) 0.000 000 600 761 190 730 296 197 12 × 2 = 0 + 0.000 001 201 522 381 460 592 394 24;
  • 48) 0.000 001 201 522 381 460 592 394 24 × 2 = 0 + 0.000 002 403 044 762 921 184 788 48;
  • 49) 0.000 002 403 044 762 921 184 788 48 × 2 = 0 + 0.000 004 806 089 525 842 369 576 96;
  • 50) 0.000 004 806 089 525 842 369 576 96 × 2 = 0 + 0.000 009 612 179 051 684 739 153 92;
  • 51) 0.000 009 612 179 051 684 739 153 92 × 2 = 0 + 0.000 019 224 358 103 369 478 307 84;
  • 52) 0.000 019 224 358 103 369 478 307 84 × 2 = 0 + 0.000 038 448 716 206 738 956 615 68;
  • 53) 0.000 038 448 716 206 738 956 615 68 × 2 = 0 + 0.000 076 897 432 413 477 913 231 36;
  • 54) 0.000 076 897 432 413 477 913 231 36 × 2 = 0 + 0.000 153 794 864 826 955 826 462 72;
  • 55) 0.000 153 794 864 826 955 826 462 72 × 2 = 0 + 0.000 307 589 729 653 911 652 925 44;
  • 56) 0.000 307 589 729 653 911 652 925 44 × 2 = 0 + 0.000 615 179 459 307 823 305 850 88;
  • 57) 0.000 615 179 459 307 823 305 850 88 × 2 = 0 + 0.001 230 358 918 615 646 611 701 76;
  • 58) 0.001 230 358 918 615 646 611 701 76 × 2 = 0 + 0.002 460 717 837 231 293 223 403 52;
  • 59) 0.002 460 717 837 231 293 223 403 52 × 2 = 0 + 0.004 921 435 674 462 586 446 807 04;
  • 60) 0.004 921 435 674 462 586 446 807 04 × 2 = 0 + 0.009 842 871 348 925 172 893 614 08;
  • 61) 0.009 842 871 348 925 172 893 614 08 × 2 = 0 + 0.019 685 742 697 850 345 787 228 16;
  • 62) 0.019 685 742 697 850 345 787 228 16 × 2 = 0 + 0.039 371 485 395 700 691 574 456 32;
  • 63) 0.039 371 485 395 700 691 574 456 32 × 2 = 0 + 0.078 742 970 791 401 383 148 912 64;
  • 64) 0.078 742 970 791 401 383 148 912 64 × 2 = 0 + 0.157 485 941 582 802 766 297 825 28;
  • 65) 0.157 485 941 582 802 766 297 825 28 × 2 = 0 + 0.314 971 883 165 605 532 595 650 56;
  • 66) 0.314 971 883 165 605 532 595 650 56 × 2 = 0 + 0.629 943 766 331 211 065 191 301 12;
  • 67) 0.629 943 766 331 211 065 191 301 12 × 2 = 1 + 0.259 887 532 662 422 130 382 602 24;
  • 68) 0.259 887 532 662 422 130 382 602 24 × 2 = 0 + 0.519 775 065 324 844 260 765 204 48;
  • 69) 0.519 775 065 324 844 260 765 204 48 × 2 = 1 + 0.039 550 130 649 688 521 530 408 96;
  • 70) 0.039 550 130 649 688 521 530 408 96 × 2 = 0 + 0.079 100 261 299 377 043 060 817 92;
  • 71) 0.079 100 261 299 377 043 060 817 92 × 2 = 0 + 0.158 200 522 598 754 086 121 635 84;
  • 72) 0.158 200 522 598 754 086 121 635 84 × 2 = 0 + 0.316 401 045 197 508 172 243 271 68;
  • 73) 0.316 401 045 197 508 172 243 271 68 × 2 = 0 + 0.632 802 090 395 016 344 486 543 36;
  • 74) 0.632 802 090 395 016 344 486 543 36 × 2 = 1 + 0.265 604 180 790 032 688 973 086 72;
  • 75) 0.265 604 180 790 032 688 973 086 72 × 2 = 0 + 0.531 208 361 580 065 377 946 173 44;
  • 76) 0.531 208 361 580 065 377 946 173 44 × 2 = 1 + 0.062 416 723 160 130 755 892 346 88;
  • 77) 0.062 416 723 160 130 755 892 346 88 × 2 = 0 + 0.124 833 446 320 261 511 784 693 76;
  • 78) 0.124 833 446 320 261 511 784 693 76 × 2 = 0 + 0.249 666 892 640 523 023 569 387 52;
  • 79) 0.249 666 892 640 523 023 569 387 52 × 2 = 0 + 0.499 333 785 281 046 047 138 775 04;
  • 80) 0.499 333 785 281 046 047 138 775 04 × 2 = 0 + 0.998 667 570 562 092 094 277 550 08;
  • 81) 0.998 667 570 562 092 094 277 550 08 × 2 = 1 + 0.997 335 141 124 184 188 555 100 16;
  • 82) 0.997 335 141 124 184 188 555 100 16 × 2 = 1 + 0.994 670 282 248 368 377 110 200 32;
  • 83) 0.994 670 282 248 368 377 110 200 32 × 2 = 1 + 0.989 340 564 496 736 754 220 400 64;
  • 84) 0.989 340 564 496 736 754 220 400 64 × 2 = 1 + 0.978 681 128 993 473 508 440 801 28;
  • 85) 0.978 681 128 993 473 508 440 801 28 × 2 = 1 + 0.957 362 257 986 947 016 881 602 56;
  • 86) 0.957 362 257 986 947 016 881 602 56 × 2 = 1 + 0.914 724 515 973 894 033 763 205 12;
  • 87) 0.914 724 515 973 894 033 763 205 12 × 2 = 1 + 0.829 449 031 947 788 067 526 410 24;
  • 88) 0.829 449 031 947 788 067 526 410 24 × 2 = 1 + 0.658 898 063 895 576 135 052 820 48;
  • 89) 0.658 898 063 895 576 135 052 820 48 × 2 = 1 + 0.317 796 127 791 152 270 105 640 96;
  • 90) 0.317 796 127 791 152 270 105 640 96 × 2 = 0 + 0.635 592 255 582 304 540 211 281 92;
  • 91) 0.635 592 255 582 304 540 211 281 92 × 2 = 1 + 0.271 184 511 164 609 080 422 563 84;
  • 92) 0.271 184 511 164 609 080 422 563 84 × 2 = 0 + 0.542 369 022 329 218 160 845 127 68;
  • 93) 0.542 369 022 329 218 160 845 127 68 × 2 = 1 + 0.084 738 044 658 436 321 690 255 36;
  • 94) 0.084 738 044 658 436 321 690 255 36 × 2 = 0 + 0.169 476 089 316 872 643 380 510 72;
  • 95) 0.169 476 089 316 872 643 380 510 72 × 2 = 0 + 0.338 952 178 633 745 286 761 021 44;
  • 96) 0.338 952 178 633 745 286 761 021 44 × 2 = 0 + 0.677 904 357 267 490 573 522 042 88;
  • 97) 0.677 904 357 267 490 573 522 042 88 × 2 = 1 + 0.355 808 714 534 981 147 044 085 76;
  • 98) 0.355 808 714 534 981 147 044 085 76 × 2 = 0 + 0.711 617 429 069 962 294 088 171 52;
  • 99) 0.711 617 429 069 962 294 088 171 52 × 2 = 1 + 0.423 234 858 139 924 588 176 343 04;
  • 100) 0.423 234 858 139 924 588 176 343 04 × 2 = 0 + 0.846 469 716 279 849 176 352 686 08;
  • 101) 0.846 469 716 279 849 176 352 686 08 × 2 = 1 + 0.692 939 432 559 698 352 705 372 16;
  • 102) 0.692 939 432 559 698 352 705 372 16 × 2 = 1 + 0.385 878 865 119 396 705 410 744 32;
  • 103) 0.385 878 865 119 396 705 410 744 32 × 2 = 0 + 0.771 757 730 238 793 410 821 488 64;
  • 104) 0.771 757 730 238 793 410 821 488 64 × 2 = 1 + 0.543 515 460 477 586 821 642 977 28;
  • 105) 0.543 515 460 477 586 821 642 977 28 × 2 = 1 + 0.087 030 920 955 173 643 285 954 56;
  • 106) 0.087 030 920 955 173 643 285 954 56 × 2 = 0 + 0.174 061 841 910 347 286 571 909 12;
  • 107) 0.174 061 841 910 347 286 571 909 12 × 2 = 0 + 0.348 123 683 820 694 573 143 818 24;
  • 108) 0.348 123 683 820 694 573 143 818 24 × 2 = 0 + 0.696 247 367 641 389 146 287 636 48;
  • 109) 0.696 247 367 641 389 146 287 636 48 × 2 = 1 + 0.392 494 735 282 778 292 575 272 96;
  • 110) 0.392 494 735 282 778 292 575 272 96 × 2 = 0 + 0.784 989 470 565 556 585 150 545 92;
  • 111) 0.784 989 470 565 556 585 150 545 92 × 2 = 1 + 0.569 978 941 131 113 170 301 091 84;
  • 112) 0.569 978 941 131 113 170 301 091 84 × 2 = 1 + 0.139 957 882 262 226 340 602 183 68;
  • 113) 0.139 957 882 262 226 340 602 183 68 × 2 = 0 + 0.279 915 764 524 452 681 204 367 36;
  • 114) 0.279 915 764 524 452 681 204 367 36 × 2 = 0 + 0.559 831 529 048 905 362 408 734 72;
  • 115) 0.559 831 529 048 905 362 408 734 72 × 2 = 1 + 0.119 663 058 097 810 724 817 469 44;
  • 116) 0.119 663 058 097 810 724 817 469 44 × 2 = 0 + 0.239 326 116 195 621 449 634 938 88;
  • 117) 0.239 326 116 195 621 449 634 938 88 × 2 = 0 + 0.478 652 232 391 242 899 269 877 76;
  • 118) 0.478 652 232 391 242 899 269 877 76 × 2 = 0 + 0.957 304 464 782 485 798 539 755 52;
  • 119) 0.957 304 464 782 485 798 539 755 52 × 2 = 1 + 0.914 608 929 564 971 597 079 511 04;

We did not reach a fractional part equal to zero. However, we have produced the first non-zero digit (at step 67) plus the 52 digits that follow it, which is all that the 52 bit mantissa can hold => FULL STOP. The remaining digits are discarded, so the converted number will be a very close approximation of the initial one, not an exact copy.
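The repeated multiplication in step 3 can be reproduced with exact rational arithmetic, which avoids any rounding during the doubling itself. This is a minimal sketch: the name `fraction_bits` and its stopping rule (52 digits after the first 1) are assumptions chosen to mirror the list above.

```python
from fractions import Fraction


def fraction_bits(value: Fraction, mantissa_bits: int = 52):
    """Binary digits of 0 < value < 1 by repeated doubling.

    Stops when the fractional part reaches zero, or once `mantissa_bits`
    digits have been produced after the first 1 (further digits are dropped).
    Returns the digits and the fractional part left over at the stop.
    """
    bits = []
    first_one = None                      # step number of the first 1 digit
    while value != 0:
        value *= 2
        digit = int(value)                # integer part: 0 or 1
        value -= digit                    # keep only the fractional part
        bits.append(digit)
        if digit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) - first_one >= mantissa_bits:
            break
    return bits, value


bits, leftover = fraction_bits(Fraction("0.00000000000000000000853733"))
print(len(bits), bits.index(1) + 1)       # 119 67
```

The leftover fraction returned alongside the digits shows how much of the value was dropped when the loop stopped.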


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 537 33(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1111 1111 1010 1000 1010 1101 1000 1011 0010 001(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 537 33(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1111 1111 1010 1000 1010 1101 1000 1011 0010 001(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 537 33(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1111 1111 1010 1000 1010 1101 1000 1011 0010 001(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1111 1111 1010 1000 1010 1101 1000 1011 0010 001(2) × 2^0 =


1.0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001(2) × 2^-67
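The exponent found by normalization can be cross-checked with the standard library. The sketch below uses `math.frexp` on the Python float closest to the decimal value; the variable names are illustrative.

```python
import math

x = 8.53733e-21

# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
# Scaling m into the range [1, 2) lowers the exponent by one.
m, e = math.frexp(x)
significand, exponent = m * 2, e - 1

print(exponent)       # -67
print(significand)    # about 1.2599, i.e. 1.0100 0010 ...(2)
```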


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
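Steps 8 to 10 amount to adding the bias and writing the result in 11 binary digits. A short sketch, with the constant names chosen here for illustration:

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1      # 1023

unadjusted = -67
adjusted = unadjusted + BIAS             # 956

# '011b' pads the binary digits with leading zeros to a width of 11.
exponent_field = format(adjusted, "011b")
print(adjusted, exponent_field)          # 956 01110111100
```

The built-in `format` performs the same conversion as the repeated division, and the zero padding supplies the leading 0 that the division list does not produce on its own.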


11. Normalize the mantissa.

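a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here: the multiplications in step 3 were stopped exactly 52 bits after the leading 1, so any further bits were dropped rather than rounded).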


Mantissa (normalized) =


1.0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001 =


0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001
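Step 11 can be expressed as a small string operation. In this sketch the helper `mantissa_field` is a hypothetical name; it truncates excess bits and pads short inputs with zeros, as described in a) and b).

```python
def mantissa_field(significand: str, width: int = 52) -> str:
    """Turn a normalized significand such as '1.0100 0010' into the stored field.

    Drops the implicit leading 1 and the point, then truncates or pads
    with zeros on the right to exactly `width` bits.
    """
    digits = significand.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized significand starting with '1.'")
    fraction = digits[2:]
    return fraction[:width].ljust(width, "0")


field = mantissa_field(
    "1.0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001"
)
print(len(field))     # 52
```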


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001


Decimal number 0.000 000 000 000 000 000 008 537 33 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001
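As a sanity check, the three fields can be joined into one 64 bit pattern and decoded back into a number with Python's `struct` module. This sketch assumes that Python floats are IEEE 754 doubles, which holds on common platforms. Because the steps above truncate instead of rounding, the pattern may sit one unit in the last place below the one Python stores for the same literal; the last line prints that distance.

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0010 1000 0111 1111 1101 0100 0101 0110 1100 0101 1001 0001"

bits = (sign + exponent + mantissa).replace(" ", "")
assert len(bits) == 64
pattern = int(bits, 2)

# Reinterpret the 64 bit pattern as a big-endian double.
decoded = struct.unpack(">d", pattern.to_bytes(8, "big"))[0]

# Bit pattern Python itself stores for the same decimal literal.
native = struct.unpack(">Q", struct.pack(">d", 8.53733e-21))[0]

print(f"{pattern:016X}")          # 3BC4287FD456C591
print(decoded)
print(native - pattern)           # distance in units of the last mantissa bit
```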


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have more bits than the 52 bit mantissa can hold.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
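The nine steps above can be combined into one function. The following is a minimal sketch under stated assumptions, not a full IEEE 754 encoder: the name `to_double_fields` is hypothetical, the input must be nonzero and within the normal range of a double, and excess mantissa bits are truncated as in step 8 rather than rounded. It normalizes with exact rational arithmetic before extracting the bits, which yields the same digits as steps 2 to 8 but in a different order of operations.

```python
from fractions import Fraction


def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string.

    Assumptions of this sketch: the value is nonzero and in the normal
    range of a double, and excess mantissa bits are truncated, not rounded.
    """
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Normalize: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1
    if not -1022 <= exponent <= 1023:
        raise ValueError("outside the normal range of a double")

    # Biased exponent in 11 bits.
    exponent_field = format(exponent + 2 ** (11 - 1) - 1, "011b")

    # Drop the leading 1, then take 52 bits by repeated doubling.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        digit = int(value)
        value -= digit
        mantissa.append(str(digit))
    return sign, exponent_field, "".join(mantissa)


print(" - ".join(to_double_fields("-31.640215")))
```

Using `Fraction` keeps every intermediate value exact, so the only precision loss is the deliberate truncation to 52 bits.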

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We did not reach a fractional part equal to zero. However, together with the integer part we already have more bits than the 52 bit mantissa can hold => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
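To compare this result with what a Python interpreter actually stores, the fields can be read back out of a float. This sketch assumes Python floats are IEEE 754 doubles (true on common platforms); `double_fields` is an illustrative helper name. Python rounds to the nearest representable value, while the example above truncates, so the two mantissas may differ by one unit in the last place.

```python
import struct


def double_fields(x: float):
    """Split a Python float into its sign, exponent and mantissa bit strings."""
    (pattern,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(pattern, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(-31.640215)
print(sign, exponent, mantissa, sep=" - ")

# Mantissa obtained by truncation in the worked example above.
truncated = "1111101000111110010100100001010101110110100010011100"
print(int(mantissa, 2) - int(truncated, 2))   # 0 or 1 unit in the last place
```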