0.000 000 000 000 000 000 008 533 81 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 533 81(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 533 81(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 533 81.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero, or once enough bits have been produced to fill the mantissa.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 533 81 × 2 = 0 + 0.000 000 000 000 000 000 017 067 62;
  • 2) 0.000 000 000 000 000 000 017 067 62 × 2 = 0 + 0.000 000 000 000 000 000 034 135 24;
  • 3) 0.000 000 000 000 000 000 034 135 24 × 2 = 0 + 0.000 000 000 000 000 000 068 270 48;
  • 4) 0.000 000 000 000 000 000 068 270 48 × 2 = 0 + 0.000 000 000 000 000 000 136 540 96;
  • 5) 0.000 000 000 000 000 000 136 540 96 × 2 = 0 + 0.000 000 000 000 000 000 273 081 92;
  • 6) 0.000 000 000 000 000 000 273 081 92 × 2 = 0 + 0.000 000 000 000 000 000 546 163 84;
  • 7) 0.000 000 000 000 000 000 546 163 84 × 2 = 0 + 0.000 000 000 000 000 001 092 327 68;
  • 8) 0.000 000 000 000 000 001 092 327 68 × 2 = 0 + 0.000 000 000 000 000 002 184 655 36;
  • 9) 0.000 000 000 000 000 002 184 655 36 × 2 = 0 + 0.000 000 000 000 000 004 369 310 72;
  • 10) 0.000 000 000 000 000 004 369 310 72 × 2 = 0 + 0.000 000 000 000 000 008 738 621 44;
  • 11) 0.000 000 000 000 000 008 738 621 44 × 2 = 0 + 0.000 000 000 000 000 017 477 242 88;
  • 12) 0.000 000 000 000 000 017 477 242 88 × 2 = 0 + 0.000 000 000 000 000 034 954 485 76;
  • 13) 0.000 000 000 000 000 034 954 485 76 × 2 = 0 + 0.000 000 000 000 000 069 908 971 52;
  • 14) 0.000 000 000 000 000 069 908 971 52 × 2 = 0 + 0.000 000 000 000 000 139 817 943 04;
  • 15) 0.000 000 000 000 000 139 817 943 04 × 2 = 0 + 0.000 000 000 000 000 279 635 886 08;
  • 16) 0.000 000 000 000 000 279 635 886 08 × 2 = 0 + 0.000 000 000 000 000 559 271 772 16;
  • 17) 0.000 000 000 000 000 559 271 772 16 × 2 = 0 + 0.000 000 000 000 001 118 543 544 32;
  • 18) 0.000 000 000 000 001 118 543 544 32 × 2 = 0 + 0.000 000 000 000 002 237 087 088 64;
  • 19) 0.000 000 000 000 002 237 087 088 64 × 2 = 0 + 0.000 000 000 000 004 474 174 177 28;
  • 20) 0.000 000 000 000 004 474 174 177 28 × 2 = 0 + 0.000 000 000 000 008 948 348 354 56;
  • 21) 0.000 000 000 000 008 948 348 354 56 × 2 = 0 + 0.000 000 000 000 017 896 696 709 12;
  • 22) 0.000 000 000 000 017 896 696 709 12 × 2 = 0 + 0.000 000 000 000 035 793 393 418 24;
  • 23) 0.000 000 000 000 035 793 393 418 24 × 2 = 0 + 0.000 000 000 000 071 586 786 836 48;
  • 24) 0.000 000 000 000 071 586 786 836 48 × 2 = 0 + 0.000 000 000 000 143 173 573 672 96;
  • 25) 0.000 000 000 000 143 173 573 672 96 × 2 = 0 + 0.000 000 000 000 286 347 147 345 92;
  • 26) 0.000 000 000 000 286 347 147 345 92 × 2 = 0 + 0.000 000 000 000 572 694 294 691 84;
  • 27) 0.000 000 000 000 572 694 294 691 84 × 2 = 0 + 0.000 000 000 001 145 388 589 383 68;
  • 28) 0.000 000 000 001 145 388 589 383 68 × 2 = 0 + 0.000 000 000 002 290 777 178 767 36;
  • 29) 0.000 000 000 002 290 777 178 767 36 × 2 = 0 + 0.000 000 000 004 581 554 357 534 72;
  • 30) 0.000 000 000 004 581 554 357 534 72 × 2 = 0 + 0.000 000 000 009 163 108 715 069 44;
  • 31) 0.000 000 000 009 163 108 715 069 44 × 2 = 0 + 0.000 000 000 018 326 217 430 138 88;
  • 32) 0.000 000 000 018 326 217 430 138 88 × 2 = 0 + 0.000 000 000 036 652 434 860 277 76;
  • 33) 0.000 000 000 036 652 434 860 277 76 × 2 = 0 + 0.000 000 000 073 304 869 720 555 52;
  • 34) 0.000 000 000 073 304 869 720 555 52 × 2 = 0 + 0.000 000 000 146 609 739 441 111 04;
  • 35) 0.000 000 000 146 609 739 441 111 04 × 2 = 0 + 0.000 000 000 293 219 478 882 222 08;
  • 36) 0.000 000 000 293 219 478 882 222 08 × 2 = 0 + 0.000 000 000 586 438 957 764 444 16;
  • 37) 0.000 000 000 586 438 957 764 444 16 × 2 = 0 + 0.000 000 001 172 877 915 528 888 32;
  • 38) 0.000 000 001 172 877 915 528 888 32 × 2 = 0 + 0.000 000 002 345 755 831 057 776 64;
  • 39) 0.000 000 002 345 755 831 057 776 64 × 2 = 0 + 0.000 000 004 691 511 662 115 553 28;
  • 40) 0.000 000 004 691 511 662 115 553 28 × 2 = 0 + 0.000 000 009 383 023 324 231 106 56;
  • 41) 0.000 000 009 383 023 324 231 106 56 × 2 = 0 + 0.000 000 018 766 046 648 462 213 12;
  • 42) 0.000 000 018 766 046 648 462 213 12 × 2 = 0 + 0.000 000 037 532 093 296 924 426 24;
  • 43) 0.000 000 037 532 093 296 924 426 24 × 2 = 0 + 0.000 000 075 064 186 593 848 852 48;
  • 44) 0.000 000 075 064 186 593 848 852 48 × 2 = 0 + 0.000 000 150 128 373 187 697 704 96;
  • 45) 0.000 000 150 128 373 187 697 704 96 × 2 = 0 + 0.000 000 300 256 746 375 395 409 92;
  • 46) 0.000 000 300 256 746 375 395 409 92 × 2 = 0 + 0.000 000 600 513 492 750 790 819 84;
  • 47) 0.000 000 600 513 492 750 790 819 84 × 2 = 0 + 0.000 001 201 026 985 501 581 639 68;
  • 48) 0.000 001 201 026 985 501 581 639 68 × 2 = 0 + 0.000 002 402 053 971 003 163 279 36;
  • 49) 0.000 002 402 053 971 003 163 279 36 × 2 = 0 + 0.000 004 804 107 942 006 326 558 72;
  • 50) 0.000 004 804 107 942 006 326 558 72 × 2 = 0 + 0.000 009 608 215 884 012 653 117 44;
  • 51) 0.000 009 608 215 884 012 653 117 44 × 2 = 0 + 0.000 019 216 431 768 025 306 234 88;
  • 52) 0.000 019 216 431 768 025 306 234 88 × 2 = 0 + 0.000 038 432 863 536 050 612 469 76;
  • 53) 0.000 038 432 863 536 050 612 469 76 × 2 = 0 + 0.000 076 865 727 072 101 224 939 52;
  • 54) 0.000 076 865 727 072 101 224 939 52 × 2 = 0 + 0.000 153 731 454 144 202 449 879 04;
  • 55) 0.000 153 731 454 144 202 449 879 04 × 2 = 0 + 0.000 307 462 908 288 404 899 758 08;
  • 56) 0.000 307 462 908 288 404 899 758 08 × 2 = 0 + 0.000 614 925 816 576 809 799 516 16;
  • 57) 0.000 614 925 816 576 809 799 516 16 × 2 = 0 + 0.001 229 851 633 153 619 599 032 32;
  • 58) 0.001 229 851 633 153 619 599 032 32 × 2 = 0 + 0.002 459 703 266 307 239 198 064 64;
  • 59) 0.002 459 703 266 307 239 198 064 64 × 2 = 0 + 0.004 919 406 532 614 478 396 129 28;
  • 60) 0.004 919 406 532 614 478 396 129 28 × 2 = 0 + 0.009 838 813 065 228 956 792 258 56;
  • 61) 0.009 838 813 065 228 956 792 258 56 × 2 = 0 + 0.019 677 626 130 457 913 584 517 12;
  • 62) 0.019 677 626 130 457 913 584 517 12 × 2 = 0 + 0.039 355 252 260 915 827 169 034 24;
  • 63) 0.039 355 252 260 915 827 169 034 24 × 2 = 0 + 0.078 710 504 521 831 654 338 068 48;
  • 64) 0.078 710 504 521 831 654 338 068 48 × 2 = 0 + 0.157 421 009 043 663 308 676 136 96;
  • 65) 0.157 421 009 043 663 308 676 136 96 × 2 = 0 + 0.314 842 018 087 326 617 352 273 92;
  • 66) 0.314 842 018 087 326 617 352 273 92 × 2 = 0 + 0.629 684 036 174 653 234 704 547 84;
  • 67) 0.629 684 036 174 653 234 704 547 84 × 2 = 1 + 0.259 368 072 349 306 469 409 095 68;
  • 68) 0.259 368 072 349 306 469 409 095 68 × 2 = 0 + 0.518 736 144 698 612 938 818 191 36;
  • 69) 0.518 736 144 698 612 938 818 191 36 × 2 = 1 + 0.037 472 289 397 225 877 636 382 72;
  • 70) 0.037 472 289 397 225 877 636 382 72 × 2 = 0 + 0.074 944 578 794 451 755 272 765 44;
  • 71) 0.074 944 578 794 451 755 272 765 44 × 2 = 0 + 0.149 889 157 588 903 510 545 530 88;
  • 72) 0.149 889 157 588 903 510 545 530 88 × 2 = 0 + 0.299 778 315 177 807 021 091 061 76;
  • 73) 0.299 778 315 177 807 021 091 061 76 × 2 = 0 + 0.599 556 630 355 614 042 182 123 52;
  • 74) 0.599 556 630 355 614 042 182 123 52 × 2 = 1 + 0.199 113 260 711 228 084 364 247 04;
  • 75) 0.199 113 260 711 228 084 364 247 04 × 2 = 0 + 0.398 226 521 422 456 168 728 494 08;
  • 76) 0.398 226 521 422 456 168 728 494 08 × 2 = 0 + 0.796 453 042 844 912 337 456 988 16;
  • 77) 0.796 453 042 844 912 337 456 988 16 × 2 = 1 + 0.592 906 085 689 824 674 913 976 32;
  • 78) 0.592 906 085 689 824 674 913 976 32 × 2 = 1 + 0.185 812 171 379 649 349 827 952 64;
  • 79) 0.185 812 171 379 649 349 827 952 64 × 2 = 0 + 0.371 624 342 759 298 699 655 905 28;
  • 80) 0.371 624 342 759 298 699 655 905 28 × 2 = 0 + 0.743 248 685 518 597 399 311 810 56;
  • 81) 0.743 248 685 518 597 399 311 810 56 × 2 = 1 + 0.486 497 371 037 194 798 623 621 12;
  • 82) 0.486 497 371 037 194 798 623 621 12 × 2 = 0 + 0.972 994 742 074 389 597 247 242 24;
  • 83) 0.972 994 742 074 389 597 247 242 24 × 2 = 1 + 0.945 989 484 148 779 194 494 484 48;
  • 84) 0.945 989 484 148 779 194 494 484 48 × 2 = 1 + 0.891 978 968 297 558 388 988 968 96;
  • 85) 0.891 978 968 297 558 388 988 968 96 × 2 = 1 + 0.783 957 936 595 116 777 977 937 92;
  • 86) 0.783 957 936 595 116 777 977 937 92 × 2 = 1 + 0.567 915 873 190 233 555 955 875 84;
  • 87) 0.567 915 873 190 233 555 955 875 84 × 2 = 1 + 0.135 831 746 380 467 111 911 751 68;
  • 88) 0.135 831 746 380 467 111 911 751 68 × 2 = 0 + 0.271 663 492 760 934 223 823 503 36;
  • 89) 0.271 663 492 760 934 223 823 503 36 × 2 = 0 + 0.543 326 985 521 868 447 647 006 72;
  • 90) 0.543 326 985 521 868 447 647 006 72 × 2 = 1 + 0.086 653 971 043 736 895 294 013 44;
  • 91) 0.086 653 971 043 736 895 294 013 44 × 2 = 0 + 0.173 307 942 087 473 790 588 026 88;
  • 92) 0.173 307 942 087 473 790 588 026 88 × 2 = 0 + 0.346 615 884 174 947 581 176 053 76;
  • 93) 0.346 615 884 174 947 581 176 053 76 × 2 = 0 + 0.693 231 768 349 895 162 352 107 52;
  • 94) 0.693 231 768 349 895 162 352 107 52 × 2 = 1 + 0.386 463 536 699 790 324 704 215 04;
  • 95) 0.386 463 536 699 790 324 704 215 04 × 2 = 0 + 0.772 927 073 399 580 649 408 430 08;
  • 96) 0.772 927 073 399 580 649 408 430 08 × 2 = 1 + 0.545 854 146 799 161 298 816 860 16;
  • 97) 0.545 854 146 799 161 298 816 860 16 × 2 = 1 + 0.091 708 293 598 322 597 633 720 32;
  • 98) 0.091 708 293 598 322 597 633 720 32 × 2 = 0 + 0.183 416 587 196 645 195 267 440 64;
  • 99) 0.183 416 587 196 645 195 267 440 64 × 2 = 0 + 0.366 833 174 393 290 390 534 881 28;
  • 100) 0.366 833 174 393 290 390 534 881 28 × 2 = 0 + 0.733 666 348 786 580 781 069 762 56;
  • 101) 0.733 666 348 786 580 781 069 762 56 × 2 = 1 + 0.467 332 697 573 161 562 139 525 12;
  • 102) 0.467 332 697 573 161 562 139 525 12 × 2 = 0 + 0.934 665 395 146 323 124 279 050 24;
  • 103) 0.934 665 395 146 323 124 279 050 24 × 2 = 1 + 0.869 330 790 292 646 248 558 100 48;
  • 104) 0.869 330 790 292 646 248 558 100 48 × 2 = 1 + 0.738 661 580 585 292 497 116 200 96;
  • 105) 0.738 661 580 585 292 497 116 200 96 × 2 = 1 + 0.477 323 161 170 584 994 232 401 92;
  • 106) 0.477 323 161 170 584 994 232 401 92 × 2 = 0 + 0.954 646 322 341 169 988 464 803 84;
  • 107) 0.954 646 322 341 169 988 464 803 84 × 2 = 1 + 0.909 292 644 682 339 976 929 607 68;
  • 108) 0.909 292 644 682 339 976 929 607 68 × 2 = 1 + 0.818 585 289 364 679 953 859 215 36;
  • 109) 0.818 585 289 364 679 953 859 215 36 × 2 = 1 + 0.637 170 578 729 359 907 718 430 72;
  • 110) 0.637 170 578 729 359 907 718 430 72 × 2 = 1 + 0.274 341 157 458 719 815 436 861 44;
  • 111) 0.274 341 157 458 719 815 436 861 44 × 2 = 0 + 0.548 682 314 917 439 630 873 722 88;
  • 112) 0.548 682 314 917 439 630 873 722 88 × 2 = 1 + 0.097 364 629 834 879 261 747 445 76;
  • 113) 0.097 364 629 834 879 261 747 445 76 × 2 = 0 + 0.194 729 259 669 758 523 494 891 52;
  • 114) 0.194 729 259 669 758 523 494 891 52 × 2 = 0 + 0.389 458 519 339 517 046 989 783 04;
  • 115) 0.389 458 519 339 517 046 989 783 04 × 2 = 0 + 0.778 917 038 679 034 093 979 566 08;
  • 116) 0.778 917 038 679 034 093 979 566 08 × 2 = 1 + 0.557 834 077 358 068 187 959 132 16;
  • 117) 0.557 834 077 358 068 187 959 132 16 × 2 = 1 + 0.115 668 154 716 136 375 918 264 32;
  • 118) 0.115 668 154 716 136 375 918 264 32 × 2 = 0 + 0.231 336 309 432 272 751 836 528 64;
  • 119) 0.231 336 309 432 272 751 836 528 64 × 2 = 0 + 0.462 672 618 864 545 503 673 057 28;

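We didn't get any fractional part that was equal to zero. But the first integer part different from zero appeared at step 67, and the 52 iterations after it (steps 68 to 119) supply all 52 bits of the mantissa => FULL STOP (losing precision: the converted number we get in the end will be a very good approximation of the initial one, not an exact copy). The discarded tail starts with a 0 bit (0.462 672 618 864 545 503 673 057 28 × 2 < 1), so it is worth less than half a unit in the last place, and rounding to nearest (the IEEE 754 default) gives the same mantissa as simply dropping it.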
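The multiply-by-2 loop of step 3 can be sketched in Python. This is a minimal sketch, not the converter used to produce this page: it assumes exact rational arithmetic with the standard `fractions.Fraction` type (so every intermediate product stays exact, like the long decimals in the list above). The helper name `fraction_bits` and its stop rule (a zero fractional part, or 53 significant bits counted from the first 1) are illustrative choices.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, significant: int = 53) -> tuple[str, Fraction]:
    """Binary digits of a fraction in [0, 1), found by repeated multiplication by 2.

    Stops when the fractional part becomes zero, or once `significant` bits have
    been produced counting from the first 1 (leading 1 + 52 mantissa bits for a
    double). Returns the digits and the fractional part left over.
    """
    if not 0 <= frac < 1:
        raise ValueError("expected a value in [0, 1)")
    digits = []
    counted = 0
    while frac != 0 and counted < significant:
        frac *= 2
        bit = 1 if frac >= 1 else 0   # integer part of the product
        frac -= bit                   # keep only the fractional part
        digits.append(str(bit))
        if bit or counted:            # start counting at the first 1
            counted += 1
    return "".join(digits), frac


digits, rest = fraction_bits(Fraction("8.53381e-21"))
print(len(digits), digits.index("1") + 1)  # 119 67
print(rest * 2 < 1)                        # True: the next bit would be 0
```

The first number printed is the count of iterations and the second is the step where the first 1 appears, matching the list above.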


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 533 81(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1100 1011 1110 0100 0101 1000 1011 1011 1101 0001 100(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 533 81(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1100 1011 1110 0100 0101 1000 1011 1011 1101 0001 100(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 533 81(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1100 1011 1110 0100 0101 1000 1011 1011 1101 0001 100(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1100 1011 1110 0100 0101 1000 1011 1011 1101 0001 100(2) × 2^0 =


1.0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100(2) × 2^(-67)
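Finding the shift of step 6 amounts to finding the exponent e with 1 ≤ x / 2^e < 2. A minimal Python sketch, assuming exact `fractions.Fraction` arithmetic; the name `normalize` is hypothetical. It starts from a guess based on the bit lengths of the numerator and denominator, which can only be one too high, and corrects it.

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) such that x == m * 2**e and 1 <= m < 2, for x > 0."""
    if x <= 0:
        raise ValueError("expected a positive value")
    # Guess from the bit lengths; the true exponent is this guess or one less
    e = x.numerator.bit_length() - x.denominator.bit_length()
    m = x / Fraction(2) ** e
    if m < 1:                 # the guess was one too high
        m, e = m * 2, e - 1
    return m, e


m, e = normalize(Fraction("8.53381e-21"))
print(e)  # -67
```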


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
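Steps 8 to 10 can be sketched in a few lines of Python. The helper `to_binary` is a hypothetical name; it mirrors the repeated division by 2 above and pads the result with zeros on the left to 11 bits, which is where the leading 0 of 011 1011 1100 comes from.

```python
def to_binary(n: int, width: int) -> str:
    """Binary digits of a non-negative integer, found by repeated division by 2,
    left-padded with zeros to `width` bits."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)               # quotient and remainder
        remainders.append(str(r))
        if n == 0:                        # stop at a zero quotient
            break
    bits = "".join(reversed(remainders))  # read the remainders from the bottom up
    if len(bits) > width:
        raise ValueError("value does not fit in the requested width")
    return bits.zfill(width)


BIAS = 2 ** (11 - 1) - 1                  # 1023 for the 11-bit exponent of a double
print(to_binary(-67 + BIAS, 11))          # 01110111100
```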


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, together with the binary point.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100 =


0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100
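Here step 11 only has to drop the leading 1, because exactly 52 bits follow it. When more bits are available they have to be cut back to 52. A minimal Python sketch of that step, assuming IEEE 754's default rounding (round to nearest, ties to even) rather than plain truncation; the function name `round_mantissa` and its `sticky` flag (set when nonzero bits were left over beyond the ones passed in) are hypothetical.

```python
def round_mantissa(significand: str, sticky: bool = False, width: int = 52) -> tuple[str, bool]:
    """Turn a normalized significand "1xxxx..." into a `width`-bit mantissa.

    `significand` holds the leading 1 followed by the bits after the binary point.
    `sticky` tells whether nonzero bits were left over beyond those given.
    Rounds to nearest, ties to even. The second result is True when rounding
    carried out of the mantissa (which becomes all zeros); the exponent must
    then be increased by 1.
    """
    if not significand.startswith("1"):
        raise ValueError("a normalized significand starts with 1")
    bits = significand[1:]                     # a) drop the implicit leading 1
    if len(bits) <= width:
        if sticky:
            raise ValueError("need at least one bit past the mantissa to round")
        return bits.ljust(width, "0"), False   # b) pad with zeros on the right
    kept, dropped = bits[:width], bits[width:]
    value = int(kept, 2)
    half = dropped[0] == "1"                   # first dropped bit is worth half a unit
    below = "1" in dropped[1:] or sticky       # anything nonzero after it?
    if half and (below or value & 1):          # above half, or a tie with an odd last bit
        value += 1
    carry = value == 1 << width
    return format(value % (1 << width), f"0{width}b"), carry


m1 = "0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100".replace(" ", "")
# Pass bit 120 (a 0) too; the loop left a nonzero fraction, so sticky=True
print(round_mantissa("1" + m1 + "0", sticky=True) == (m1, False))  # True
```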


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100


Decimal number 0.000 000 000 000 000 000 008 533 81 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100
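The three fields can be checked against the bits Python itself stores for the same literal. A minimal sketch, assuming Python's float is an IEEE 754 binary64 value (true on mainstream platforms) and that the literal is parsed with correct rounding, as CPython does; `fields_to_hex` and `float_to_hex` are hypothetical helper names. Written in hexadecimal, the 64 bits above read 3BC4 265F 22C5 DE8C.

```python
import struct


def fields_to_hex(sign: str, exponent: str, mantissa: str) -> str:
    """Join the sign, exponent and mantissa bit strings (grouping spaces allowed) as 16 hex digits."""
    bits = "".join((sign + exponent + mantissa).split())
    if len(bits) != 64:
        raise ValueError(f"expected 64 bits, got {len(bits)}")
    return format(int(bits, 2), "016x")


def float_to_hex(x: float) -> str:
    """The 8 bytes Python stores for x, big-endian, as hex."""
    return struct.pack(">d", x).hex()


derived = fields_to_hex(
    "0",
    "011 1011 1100",
    "0100 0010 0110 0101 1111 0010 0010 1100 0101 1101 1110 1000 1100",
)
print(derived)                               # 3bc4265f22c5de8c
print(float_to_hex(8.53381e-21) == derived)  # True
```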


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
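  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero, or until there are enough bits for the mantissa (the leading 1 plus 52 more), preferably with at least one extra bit to decide the rounding.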
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
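  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the binary point, if present), and adjust its length to 52 bits, either by adding extra bits set to '0' on the right or by removing the excess bits from the right (losing precision...). Simply removing them truncates the value; the IEEE 754 default rounding mode (round to nearest, ties to even) also adds 1 to the last kept bit when the removed bits are worth more than half a unit in that position, or exactly half and the last kept bit is 1.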
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
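The nine steps can be combined into one function. The Python below is a minimal sketch, not a reference implementation: it assumes exact rational arithmetic with `fractions.Fraction` in place of the long hand-written decimals, handles normal numbers only (no zero, subnormals, infinities or NaN), and rounds to nearest, ties to even, when bits are removed in step 8. The names `decimal_to_double_bits` and `double_bits_of` are hypothetical; the second reads the bits Python actually stores, so the two can be compared.

```python
import struct
from fractions import Fraction


def decimal_to_double_bits(text: str) -> str:
    """Convert a decimal string to "S - EEEEEEEEEEE - MMMM..." (IEEE 754 binary64).

    Sketch only: normal numbers; zero, subnormals and overflow are rejected.
    Exact rational arithmetic; round to nearest, ties to even.
    """
    x = Fraction(text)
    if x == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = "1" if x < 0 else "0"          # step 9: 1 for negative, 0 for positive
    x = abs(x)                            # step 1: continue with the positive value
    # Steps 2-6: find e with 1 <= x / 2**e < 2 (the normalization shift)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    m = x / Fraction(2) ** e
    if m < 1:
        m, e = m * 2, e - 1
    # Step 8: the 52 bits after the leading 1, plus what is left over
    scaled = (m - 1) * 2**52
    mantissa = scaled.numerator // scaled.denominator
    rest = scaled - mantissa              # dropped part, in units of the last kept bit
    if rest > Fraction(1, 2) or (rest == Fraction(1, 2) and mantissa % 2 == 1):
        mantissa += 1                     # round to nearest, ties to even
        if mantissa == 2**52:             # carried into the leading bit: 1.11...1 -> 10.00...0
            mantissa, e = 0, e + 1
    biased = e + 2 ** (11 - 1) - 1        # step 7: excess-1023 exponent
    if not 1 <= biased <= 2046:
        raise ValueError("subnormal or too large for this sketch")
    return f"{sign} - {biased:011b} - {mantissa:052b}"


def double_bits_of(x: float) -> str:
    """The same three fields, read from the bits Python actually stores for x."""
    n = int.from_bytes(struct.pack(">d", x), "big")
    return f"{n >> 63} - {(n >> 52) & 0x7FF:011b} - {n & (2**52 - 1):052b}"


for s in ["8.53381e-21", "-31.640215", "0.1"]:
    print(s, decimal_to_double_bits(s) == double_bits_of(float(s)))
```

Each line should print True.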

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we already have more bits than the mantissa can hold: the integer part supplies 5 significant bits, so only the first 48 fractional bits fit into the 52-bit mantissa => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the binary point), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). The removed bits, 1010 0..., are worth 0.631 04 of a unit in the last kept position (the fractional part left after step 48 above), which is more than half, so rounding to nearest (the IEEE 754 default) adds 1 to the last kept bit: ...1001 1100 becomes ...1001 1101.

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1101
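Python's `struct` module can confirm this example against the bits a real double holds, assuming (as on mainstream platforms) that Python's float is an IEEE 754 binary64 value. The stored mantissa ends in 1101: the first 52 bits after the leading 1, rounded up by one, because the dropped bits 1010 0... are worth more than half a unit in the last place.

```python
import struct

x = -31.640215
n = int.from_bytes(struct.pack(">d", x), "big")  # the 64 stored bits as an integer
sign = n >> 63
exponent = (n >> 52) & 0x7FF                     # 11 bits, biased by 1023
mantissa = n & ((1 << 52) - 1)                   # 52 bits

print(sign, format(exponent, "011b"), exponent - 1023)  # 1 10000000011 4
print(format(mantissa, "052b")[-4:])                    # 1101
print(struct.pack(">d", x).hex())                       # c03fa3e52157689d
```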