0.000 000 000 000 000 000 008 537 19 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 537 19(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number
0.000 000 000 000 000 000 008 537 19(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
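
The repeated-division procedure from steps 1 and 2 can be sketched in a few lines of Python. The function name `int_to_binary` is an illustrative choice, not something the conversion itself prescribes.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to a base 2 string by repeated division."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of n / 2
        remainders.append(str(r))
    # The last remainder is the leftmost bit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
```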


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 537 19.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 537 19 × 2 = 0 + 0.000 000 000 000 000 000 017 074 38;
  • 2) 0.000 000 000 000 000 000 017 074 38 × 2 = 0 + 0.000 000 000 000 000 000 034 148 76;
  • 3) 0.000 000 000 000 000 000 034 148 76 × 2 = 0 + 0.000 000 000 000 000 000 068 297 52;
  • 4) 0.000 000 000 000 000 000 068 297 52 × 2 = 0 + 0.000 000 000 000 000 000 136 595 04;
  • 5) 0.000 000 000 000 000 000 136 595 04 × 2 = 0 + 0.000 000 000 000 000 000 273 190 08;
  • 6) 0.000 000 000 000 000 000 273 190 08 × 2 = 0 + 0.000 000 000 000 000 000 546 380 16;
  • 7) 0.000 000 000 000 000 000 546 380 16 × 2 = 0 + 0.000 000 000 000 000 001 092 760 32;
  • 8) 0.000 000 000 000 000 001 092 760 32 × 2 = 0 + 0.000 000 000 000 000 002 185 520 64;
  • 9) 0.000 000 000 000 000 002 185 520 64 × 2 = 0 + 0.000 000 000 000 000 004 371 041 28;
  • 10) 0.000 000 000 000 000 004 371 041 28 × 2 = 0 + 0.000 000 000 000 000 008 742 082 56;
  • 11) 0.000 000 000 000 000 008 742 082 56 × 2 = 0 + 0.000 000 000 000 000 017 484 165 12;
  • 12) 0.000 000 000 000 000 017 484 165 12 × 2 = 0 + 0.000 000 000 000 000 034 968 330 24;
  • 13) 0.000 000 000 000 000 034 968 330 24 × 2 = 0 + 0.000 000 000 000 000 069 936 660 48;
  • 14) 0.000 000 000 000 000 069 936 660 48 × 2 = 0 + 0.000 000 000 000 000 139 873 320 96;
  • 15) 0.000 000 000 000 000 139 873 320 96 × 2 = 0 + 0.000 000 000 000 000 279 746 641 92;
  • 16) 0.000 000 000 000 000 279 746 641 92 × 2 = 0 + 0.000 000 000 000 000 559 493 283 84;
  • 17) 0.000 000 000 000 000 559 493 283 84 × 2 = 0 + 0.000 000 000 000 001 118 986 567 68;
  • 18) 0.000 000 000 000 001 118 986 567 68 × 2 = 0 + 0.000 000 000 000 002 237 973 135 36;
  • 19) 0.000 000 000 000 002 237 973 135 36 × 2 = 0 + 0.000 000 000 000 004 475 946 270 72;
  • 20) 0.000 000 000 000 004 475 946 270 72 × 2 = 0 + 0.000 000 000 000 008 951 892 541 44;
  • 21) 0.000 000 000 000 008 951 892 541 44 × 2 = 0 + 0.000 000 000 000 017 903 785 082 88;
  • 22) 0.000 000 000 000 017 903 785 082 88 × 2 = 0 + 0.000 000 000 000 035 807 570 165 76;
  • 23) 0.000 000 000 000 035 807 570 165 76 × 2 = 0 + 0.000 000 000 000 071 615 140 331 52;
  • 24) 0.000 000 000 000 071 615 140 331 52 × 2 = 0 + 0.000 000 000 000 143 230 280 663 04;
  • 25) 0.000 000 000 000 143 230 280 663 04 × 2 = 0 + 0.000 000 000 000 286 460 561 326 08;
  • 26) 0.000 000 000 000 286 460 561 326 08 × 2 = 0 + 0.000 000 000 000 572 921 122 652 16;
  • 27) 0.000 000 000 000 572 921 122 652 16 × 2 = 0 + 0.000 000 000 001 145 842 245 304 32;
  • 28) 0.000 000 000 001 145 842 245 304 32 × 2 = 0 + 0.000 000 000 002 291 684 490 608 64;
  • 29) 0.000 000 000 002 291 684 490 608 64 × 2 = 0 + 0.000 000 000 004 583 368 981 217 28;
  • 30) 0.000 000 000 004 583 368 981 217 28 × 2 = 0 + 0.000 000 000 009 166 737 962 434 56;
  • 31) 0.000 000 000 009 166 737 962 434 56 × 2 = 0 + 0.000 000 000 018 333 475 924 869 12;
  • 32) 0.000 000 000 018 333 475 924 869 12 × 2 = 0 + 0.000 000 000 036 666 951 849 738 24;
  • 33) 0.000 000 000 036 666 951 849 738 24 × 2 = 0 + 0.000 000 000 073 333 903 699 476 48;
  • 34) 0.000 000 000 073 333 903 699 476 48 × 2 = 0 + 0.000 000 000 146 667 807 398 952 96;
  • 35) 0.000 000 000 146 667 807 398 952 96 × 2 = 0 + 0.000 000 000 293 335 614 797 905 92;
  • 36) 0.000 000 000 293 335 614 797 905 92 × 2 = 0 + 0.000 000 000 586 671 229 595 811 84;
  • 37) 0.000 000 000 586 671 229 595 811 84 × 2 = 0 + 0.000 000 001 173 342 459 191 623 68;
  • 38) 0.000 000 001 173 342 459 191 623 68 × 2 = 0 + 0.000 000 002 346 684 918 383 247 36;
  • 39) 0.000 000 002 346 684 918 383 247 36 × 2 = 0 + 0.000 000 004 693 369 836 766 494 72;
  • 40) 0.000 000 004 693 369 836 766 494 72 × 2 = 0 + 0.000 000 009 386 739 673 532 989 44;
  • 41) 0.000 000 009 386 739 673 532 989 44 × 2 = 0 + 0.000 000 018 773 479 347 065 978 88;
  • 42) 0.000 000 018 773 479 347 065 978 88 × 2 = 0 + 0.000 000 037 546 958 694 131 957 76;
  • 43) 0.000 000 037 546 958 694 131 957 76 × 2 = 0 + 0.000 000 075 093 917 388 263 915 52;
  • 44) 0.000 000 075 093 917 388 263 915 52 × 2 = 0 + 0.000 000 150 187 834 776 527 831 04;
  • 45) 0.000 000 150 187 834 776 527 831 04 × 2 = 0 + 0.000 000 300 375 669 553 055 662 08;
  • 46) 0.000 000 300 375 669 553 055 662 08 × 2 = 0 + 0.000 000 600 751 339 106 111 324 16;
  • 47) 0.000 000 600 751 339 106 111 324 16 × 2 = 0 + 0.000 001 201 502 678 212 222 648 32;
  • 48) 0.000 001 201 502 678 212 222 648 32 × 2 = 0 + 0.000 002 403 005 356 424 445 296 64;
  • 49) 0.000 002 403 005 356 424 445 296 64 × 2 = 0 + 0.000 004 806 010 712 848 890 593 28;
  • 50) 0.000 004 806 010 712 848 890 593 28 × 2 = 0 + 0.000 009 612 021 425 697 781 186 56;
  • 51) 0.000 009 612 021 425 697 781 186 56 × 2 = 0 + 0.000 019 224 042 851 395 562 373 12;
  • 52) 0.000 019 224 042 851 395 562 373 12 × 2 = 0 + 0.000 038 448 085 702 791 124 746 24;
  • 53) 0.000 038 448 085 702 791 124 746 24 × 2 = 0 + 0.000 076 896 171 405 582 249 492 48;
  • 54) 0.000 076 896 171 405 582 249 492 48 × 2 = 0 + 0.000 153 792 342 811 164 498 984 96;
  • 55) 0.000 153 792 342 811 164 498 984 96 × 2 = 0 + 0.000 307 584 685 622 328 997 969 92;
  • 56) 0.000 307 584 685 622 328 997 969 92 × 2 = 0 + 0.000 615 169 371 244 657 995 939 84;
  • 57) 0.000 615 169 371 244 657 995 939 84 × 2 = 0 + 0.001 230 338 742 489 315 991 879 68;
  • 58) 0.001 230 338 742 489 315 991 879 68 × 2 = 0 + 0.002 460 677 484 978 631 983 759 36;
  • 59) 0.002 460 677 484 978 631 983 759 36 × 2 = 0 + 0.004 921 354 969 957 263 967 518 72;
  • 60) 0.004 921 354 969 957 263 967 518 72 × 2 = 0 + 0.009 842 709 939 914 527 935 037 44;
  • 61) 0.009 842 709 939 914 527 935 037 44 × 2 = 0 + 0.019 685 419 879 829 055 870 074 88;
  • 62) 0.019 685 419 879 829 055 870 074 88 × 2 = 0 + 0.039 370 839 759 658 111 740 149 76;
  • 63) 0.039 370 839 759 658 111 740 149 76 × 2 = 0 + 0.078 741 679 519 316 223 480 299 52;
  • 64) 0.078 741 679 519 316 223 480 299 52 × 2 = 0 + 0.157 483 359 038 632 446 960 599 04;
  • 65) 0.157 483 359 038 632 446 960 599 04 × 2 = 0 + 0.314 966 718 077 264 893 921 198 08;
  • 66) 0.314 966 718 077 264 893 921 198 08 × 2 = 0 + 0.629 933 436 154 529 787 842 396 16;
  • 67) 0.629 933 436 154 529 787 842 396 16 × 2 = 1 + 0.259 866 872 309 059 575 684 792 32;
  • 68) 0.259 866 872 309 059 575 684 792 32 × 2 = 0 + 0.519 733 744 618 119 151 369 584 64;
  • 69) 0.519 733 744 618 119 151 369 584 64 × 2 = 1 + 0.039 467 489 236 238 302 739 169 28;
  • 70) 0.039 467 489 236 238 302 739 169 28 × 2 = 0 + 0.078 934 978 472 476 605 478 338 56;
  • 71) 0.078 934 978 472 476 605 478 338 56 × 2 = 0 + 0.157 869 956 944 953 210 956 677 12;
  • 72) 0.157 869 956 944 953 210 956 677 12 × 2 = 0 + 0.315 739 913 889 906 421 913 354 24;
  • 73) 0.315 739 913 889 906 421 913 354 24 × 2 = 0 + 0.631 479 827 779 812 843 826 708 48;
  • 74) 0.631 479 827 779 812 843 826 708 48 × 2 = 1 + 0.262 959 655 559 625 687 653 416 96;
  • 75) 0.262 959 655 559 625 687 653 416 96 × 2 = 0 + 0.525 919 311 119 251 375 306 833 92;
  • 76) 0.525 919 311 119 251 375 306 833 92 × 2 = 1 + 0.051 838 622 238 502 750 613 667 84;
  • 77) 0.051 838 622 238 502 750 613 667 84 × 2 = 0 + 0.103 677 244 477 005 501 227 335 68;
  • 78) 0.103 677 244 477 005 501 227 335 68 × 2 = 0 + 0.207 354 488 954 011 002 454 671 36;
  • 79) 0.207 354 488 954 011 002 454 671 36 × 2 = 0 + 0.414 708 977 908 022 004 909 342 72;
  • 80) 0.414 708 977 908 022 004 909 342 72 × 2 = 0 + 0.829 417 955 816 044 009 818 685 44;
  • 81) 0.829 417 955 816 044 009 818 685 44 × 2 = 1 + 0.658 835 911 632 088 019 637 370 88;
  • 82) 0.658 835 911 632 088 019 637 370 88 × 2 = 1 + 0.317 671 823 264 176 039 274 741 76;
  • 83) 0.317 671 823 264 176 039 274 741 76 × 2 = 0 + 0.635 343 646 528 352 078 549 483 52;
  • 84) 0.635 343 646 528 352 078 549 483 52 × 2 = 1 + 0.270 687 293 056 704 157 098 967 04;
  • 85) 0.270 687 293 056 704 157 098 967 04 × 2 = 0 + 0.541 374 586 113 408 314 197 934 08;
  • 86) 0.541 374 586 113 408 314 197 934 08 × 2 = 1 + 0.082 749 172 226 816 628 395 868 16;
  • 87) 0.082 749 172 226 816 628 395 868 16 × 2 = 0 + 0.165 498 344 453 633 256 791 736 32;
  • 88) 0.165 498 344 453 633 256 791 736 32 × 2 = 0 + 0.330 996 688 907 266 513 583 472 64;
  • 89) 0.330 996 688 907 266 513 583 472 64 × 2 = 0 + 0.661 993 377 814 533 027 166 945 28;
  • 90) 0.661 993 377 814 533 027 166 945 28 × 2 = 1 + 0.323 986 755 629 066 054 333 890 56;
  • 91) 0.323 986 755 629 066 054 333 890 56 × 2 = 0 + 0.647 973 511 258 132 108 667 781 12;
  • 92) 0.647 973 511 258 132 108 667 781 12 × 2 = 1 + 0.295 947 022 516 264 217 335 562 24;
  • 93) 0.295 947 022 516 264 217 335 562 24 × 2 = 0 + 0.591 894 045 032 528 434 671 124 48;
  • 94) 0.591 894 045 032 528 434 671 124 48 × 2 = 1 + 0.183 788 090 065 056 869 342 248 96;
  • 95) 0.183 788 090 065 056 869 342 248 96 × 2 = 0 + 0.367 576 180 130 113 738 684 497 92;
  • 96) 0.367 576 180 130 113 738 684 497 92 × 2 = 0 + 0.735 152 360 260 227 477 368 995 84;
  • 97) 0.735 152 360 260 227 477 368 995 84 × 2 = 1 + 0.470 304 720 520 454 954 737 991 68;
  • 98) 0.470 304 720 520 454 954 737 991 68 × 2 = 0 + 0.940 609 441 040 909 909 475 983 36;
  • 99) 0.940 609 441 040 909 909 475 983 36 × 2 = 1 + 0.881 218 882 081 819 818 951 966 72;
  • 100) 0.881 218 882 081 819 818 951 966 72 × 2 = 1 + 0.762 437 764 163 639 637 903 933 44;
  • 101) 0.762 437 764 163 639 637 903 933 44 × 2 = 1 + 0.524 875 528 327 279 275 807 866 88;
  • 102) 0.524 875 528 327 279 275 807 866 88 × 2 = 1 + 0.049 751 056 654 558 551 615 733 76;
  • 103) 0.049 751 056 654 558 551 615 733 76 × 2 = 0 + 0.099 502 113 309 117 103 231 467 52;
  • 104) 0.099 502 113 309 117 103 231 467 52 × 2 = 0 + 0.199 004 226 618 234 206 462 935 04;
  • 105) 0.199 004 226 618 234 206 462 935 04 × 2 = 0 + 0.398 008 453 236 468 412 925 870 08;
  • 106) 0.398 008 453 236 468 412 925 870 08 × 2 = 0 + 0.796 016 906 472 936 825 851 740 16;
  • 107) 0.796 016 906 472 936 825 851 740 16 × 2 = 1 + 0.592 033 812 945 873 651 703 480 32;
  • 108) 0.592 033 812 945 873 651 703 480 32 × 2 = 1 + 0.184 067 625 891 747 303 406 960 64;
  • 109) 0.184 067 625 891 747 303 406 960 64 × 2 = 0 + 0.368 135 251 783 494 606 813 921 28;
  • 110) 0.368 135 251 783 494 606 813 921 28 × 2 = 0 + 0.736 270 503 566 989 213 627 842 56;
  • 111) 0.736 270 503 566 989 213 627 842 56 × 2 = 1 + 0.472 541 007 133 978 427 255 685 12;
  • 112) 0.472 541 007 133 978 427 255 685 12 × 2 = 0 + 0.945 082 014 267 956 854 511 370 24;
  • 113) 0.945 082 014 267 956 854 511 370 24 × 2 = 1 + 0.890 164 028 535 913 709 022 740 48;
  • 114) 0.890 164 028 535 913 709 022 740 48 × 2 = 1 + 0.780 328 057 071 827 418 045 480 96;
  • 115) 0.780 328 057 071 827 418 045 480 96 × 2 = 1 + 0.560 656 114 143 654 836 090 961 92;
  • 116) 0.560 656 114 143 654 836 090 961 92 × 2 = 1 + 0.121 312 228 287 309 672 181 923 84;
  • 117) 0.121 312 228 287 309 672 181 923 84 × 2 = 0 + 0.242 624 456 574 619 344 363 847 68;
  • 118) 0.242 624 456 574 619 344 363 847 68 × 2 = 0 + 0.485 248 913 149 238 688 727 695 36;
  • 119) 0.485 248 913 149 238 688 727 695 36 × 2 = 0 + 0.970 497 826 298 477 377 455 390 72;

We did not reach a fractional part equal to zero. However, we have already computed 52 bits after the first non-zero bit (the mantissa limit), so we stop here. Precision is lost: the converted number will be only a very close approximation of the original.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 537 19(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 0100 0101 0100 1011 1100 0011 0010 1111 000(2)
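
The repeated-doubling procedure from steps 3 and 4 can be reproduced with exact rational arithmetic. This is a minimal sketch: the name `fraction_bits` and the parameter `extra_bits` are assumptions made for illustration, and the stop rule (52 bits after the first 1, truncating the rest) mirrors the one used in the list above.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, extra_bits: int = 52) -> str:
    """Bits of 0 <= frac < 1, obtained by repeated doubling.

    Stops when the fractional part reaches zero, or once `extra_bits`
    bits have been produced after the first 1 bit (truncation, no rounding).
    """
    bits = []
    after_first_one = None           # None until the first 1 bit appears
    while frac != 0:
        frac *= 2
        bit = int(frac)              # integer part of the product: 0 or 1
        frac -= bit                  # keep only the fractional part
        bits.append(str(bit))
        if after_first_one is not None:
            after_first_one += 1
            if after_first_one == extra_bits:
                break
        elif bit == 1:
            after_first_one = 0
    return "".join(bits)


bits = fraction_bits(Fraction("8.53719e-21"))
print(len(bits), bits.index("1") + 1)   # 119 67
```

Using `Fraction` rather than `float` keeps every doubling exact, which matches the pencil-and-paper arithmetic.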

5. Positive number before normalization:

0.000 000 000 000 000 000 008 537 19(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 0100 0101 0100 1011 1100 0011 0010 1111 000(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 537 19(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 0100 0101 0100 1011 1100 0011 0010 1111 000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0101 0000 1101 0100 0101 0100 1011 1100 0011 0010 1111 000(2) × 2^0 =


1.0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000(2) × 2^(-67)
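
Normalization amounts to locating the first 1 bit and counting how far it sits from the units position. The sketch below assumes the integer and fractional bits are held as two strings; `normalize` is a hypothetical helper name.

```python
def normalize(int_bits: str, frac_bits: str):
    """Return (digits, exponent) so that the value is 1.<digits[1:]> x 2**exponent.

    Assumes the number is non-zero, i.e. at least one bit is 1.
    """
    all_bits = int_bits + frac_bits
    first_one = all_bits.index("1")
    # Distance of the leading 1 from the units position:
    # positive for a left shift, negative for a right shift.
    exponent = len(int_bits) - 1 - first_one
    return all_bits[first_one:], exponent


# A short stand-in for the 119-bit fraction: 66 zeros, then the bits 101.
print(normalize("0", "0" * 66 + "101"))   # ('101', -67)
print(normalize("11111", "101"))          # ('11111101', 4)
```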


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
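
The bias adjustment and the 11 bit encoding from steps 8 to 10 can be checked with a short sketch. The constant and function names are illustrative, and the range check covers only normal numbers (biased exponents 1 to 2046), which is an assumption of this example.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1      # 1023 for double precision


def biased_exponent_bits(exponent: int) -> str:
    """Add the bias and encode the result as an 11 bit binary string."""
    biased = exponent + BIAS
    if not 0 < biased < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(biased, "011b")        # zero-padded to 11 binary digits


print(biased_exponent_bits(-67))   # 01110111100
print(biased_exponent_bits(4))     # 10000000011
```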


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000 =


0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000
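
Step 11 can be expressed as a small helper that drops the implicit leading bit and then truncates or zero-pads to 52 bits. The name `mantissa_field` is hypothetical, and truncation (rather than rounding) is used here because that is what the manual method does.

```python
def mantissa_field(digits: str, width: int = 52) -> str:
    """Drop the implicit leading 1, then truncate or zero-pad to `width` bits."""
    if not digits.startswith("1"):
        raise ValueError("normalized digits must start with 1")
    fraction = digits[1:]
    return fraction[:width].ljust(width, "0")


print(mantissa_field("1101"))   # 101 followed by 49 zeros
```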


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000


Decimal number 0.000 000 000 000 000 000 008 537 19 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000
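
As a sanity check, the three fields can be decoded back into an exact value and compared with the original decimal. This is a sketch for normal numbers only; `decode_double_fields` is an illustrative name. Because the manual method truncates, the decoded value should be slightly below the original, by less than one unit in the last place (2^(-119) here).

```python
from fractions import Fraction


def decode_double_fields(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Exact value of a normal double given its three fields as bit strings."""
    e = int(exponent, 2) - 1023                        # remove the bias
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)
    value = significand * Fraction(2) ** e
    return -value if sign == "1" else value


mantissa = "0100 0010 1000 0110 1010 0010 1010 0101 1110 0001 1001 0111 1000"
decoded = decode_double_fields("0", "01110111100", mantissa.replace(" ", ""))
original = Fraction("8.53719e-21")
error = original - decoded
print(error >= 0, error < Fraction(1, 2 ** 119))   # True True
```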


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero (or until enough bits have been computed to fill the mantissa).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
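
The nine steps above can be combined into one function. The following is a minimal sketch under stated assumptions, not a full IEEE 754 implementation: the name `to_double_fields` is hypothetical, the input is a decimal string, excess bits are truncated rather than rounded, and zero, subnormal numbers, infinities and NaN are not handled.

```python
from fractions import Fraction


def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"          # step 9: sign bit
    value = abs(value)                        # step 1: positive version
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Steps 2-3: integer part by repeated division by 2.
    integer = int(value)
    frac = value - integer
    int_bits = ""
    while integer > 0:
        integer, r = divmod(integer, 2)
        int_bits = str(r) + int_bits

    # Steps 4-5: fractional part by repeated doubling, until it is
    # exhausted or 53 significant bits (1 implicit + 52 stored) exist.
    frac_bits = ""
    while frac != 0 and len((int_bits + frac_bits).lstrip("0")) < 53:
        frac *= 2
        bit = int(frac)
        frac -= bit
        frac_bits += str(bit)

    # Step 6: normalize around the first 1 bit.
    all_bits = int_bits + frac_bits
    first_one = all_bits.index("1")
    exponent = len(int_bits) - 1 - first_one

    # Step 7: biased exponent as 11 bits.
    biased = exponent + 2 ** (11 - 1) - 1
    if not 0 < biased < 2047:
        raise ValueError("outside the normal range of a double")
    exponent_bits = format(biased, "011b")

    # Step 8: drop the leading 1, truncate or pad to 52 bits.
    mantissa = all_bits[first_one + 1:][:52].ljust(52, "0")
    return sign, exponent_bits, mantissa


print(to_double_fields("1.5"))
```

For `"1.5"` this gives sign `0`, exponent `01111111111` and a mantissa of `1` followed by 51 zeros.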

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
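
The manual method removes excess bits by truncation, whereas the default IEEE 754 rounding mode is round to nearest (ties to even). The bit pattern stored by a real machine may therefore differ from the result above by one unit in the last mantissa bit. The sketch below reads the stored pattern through Python's `struct` module; `hardware_fields` is an illustrative name.

```python
import struct


def hardware_fields(x: float):
    """Sign, exponent and mantissa bit strings of x packed as IEEE 754 binary64."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


manual_mantissa = (
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
).replace(" ", "")

sign, exponent, mantissa = hardware_fields(-31.640215)
print(sign, exponent)
# 0 if the two results are identical, 1 if the stored value was rounded up.
print(int(mantissa, 2) - int(manual_mantissa, 2))
```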