Base ten decimal number 0.000 000 000 000 000 222 044 6 converted to 64 bit double precision IEEE 754 binary floating point standard

How to convert the decimal number 0.000 000 000 000 000 222 044 6(10) to 64 bit double precision IEEE 754 binary floating point (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert the integer part, 0, to binary (base 2). Divide the number repeatedly by 2, keeping track of each remainder, until the quotient is equal to zero:

  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number, by taking all the remainders starting from the bottom of the list constructed above:

0(10) = 0(2)
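
The repeated division in steps 1 and 2 can be sketched in Python. This is only an illustration of the procedure, not the converter's own code, and the function name `int_to_binary` is a hypothetical one chosen for this example.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # zero needs no divisions: its only digit is 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of n ÷ 2
        remainders.append(str(r))
    # The last remainder is the most significant bit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # the integer part converted in this example
print(int_to_binary(970))  # the adjusted exponent from step 6, for comparison
```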

3. Convert the fractional part, 0.000 000 000 000 000 222 044 6, to binary (base 2). Multiply it repeatedly by 2, keeping track of the integer part of each result, until the fractional part is equal to zero:

  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 222 044 6 × 2 = 0 + 0.000 000 000 000 000 444 089 2;
  • 2) 0.000 000 000 000 000 444 089 2 × 2 = 0 + 0.000 000 000 000 000 888 178 4;
  • 3) 0.000 000 000 000 000 888 178 4 × 2 = 0 + 0.000 000 000 000 001 776 356 8;
  • 4) 0.000 000 000 000 001 776 356 8 × 2 = 0 + 0.000 000 000 000 003 552 713 6;
  • 5) 0.000 000 000 000 003 552 713 6 × 2 = 0 + 0.000 000 000 000 007 105 427 2;
  • 6) 0.000 000 000 000 007 105 427 2 × 2 = 0 + 0.000 000 000 000 014 210 854 4;
  • 7) 0.000 000 000 000 014 210 854 4 × 2 = 0 + 0.000 000 000 000 028 421 708 8;
  • 8) 0.000 000 000 000 028 421 708 8 × 2 = 0 + 0.000 000 000 000 056 843 417 6;
  • 9) 0.000 000 000 000 056 843 417 6 × 2 = 0 + 0.000 000 000 000 113 686 835 2;
  • 10) 0.000 000 000 000 113 686 835 2 × 2 = 0 + 0.000 000 000 000 227 373 670 4;
  • 11) 0.000 000 000 000 227 373 670 4 × 2 = 0 + 0.000 000 000 000 454 747 340 8;
  • 12) 0.000 000 000 000 454 747 340 8 × 2 = 0 + 0.000 000 000 000 909 494 681 6;
  • 13) 0.000 000 000 000 909 494 681 6 × 2 = 0 + 0.000 000 000 001 818 989 363 2;
  • 14) 0.000 000 000 001 818 989 363 2 × 2 = 0 + 0.000 000 000 003 637 978 726 4;
  • 15) 0.000 000 000 003 637 978 726 4 × 2 = 0 + 0.000 000 000 007 275 957 452 8;
  • 16) 0.000 000 000 007 275 957 452 8 × 2 = 0 + 0.000 000 000 014 551 914 905 6;
  • 17) 0.000 000 000 014 551 914 905 6 × 2 = 0 + 0.000 000 000 029 103 829 811 2;
  • 18) 0.000 000 000 029 103 829 811 2 × 2 = 0 + 0.000 000 000 058 207 659 622 4;
  • 19) 0.000 000 000 058 207 659 622 4 × 2 = 0 + 0.000 000 000 116 415 319 244 8;
  • 20) 0.000 000 000 116 415 319 244 8 × 2 = 0 + 0.000 000 000 232 830 638 489 6;
  • 21) 0.000 000 000 232 830 638 489 6 × 2 = 0 + 0.000 000 000 465 661 276 979 2;
  • 22) 0.000 000 000 465 661 276 979 2 × 2 = 0 + 0.000 000 000 931 322 553 958 4;
  • 23) 0.000 000 000 931 322 553 958 4 × 2 = 0 + 0.000 000 001 862 645 107 916 8;
  • 24) 0.000 000 001 862 645 107 916 8 × 2 = 0 + 0.000 000 003 725 290 215 833 6;
  • 25) 0.000 000 003 725 290 215 833 6 × 2 = 0 + 0.000 000 007 450 580 431 667 2;
  • 26) 0.000 000 007 450 580 431 667 2 × 2 = 0 + 0.000 000 014 901 160 863 334 4;
  • 27) 0.000 000 014 901 160 863 334 4 × 2 = 0 + 0.000 000 029 802 321 726 668 8;
  • 28) 0.000 000 029 802 321 726 668 8 × 2 = 0 + 0.000 000 059 604 643 453 337 6;
  • 29) 0.000 000 059 604 643 453 337 6 × 2 = 0 + 0.000 000 119 209 286 906 675 2;
  • 30) 0.000 000 119 209 286 906 675 2 × 2 = 0 + 0.000 000 238 418 573 813 350 4;
  • 31) 0.000 000 238 418 573 813 350 4 × 2 = 0 + 0.000 000 476 837 147 626 700 8;
  • 32) 0.000 000 476 837 147 626 700 8 × 2 = 0 + 0.000 000 953 674 295 253 401 6;
  • 33) 0.000 000 953 674 295 253 401 6 × 2 = 0 + 0.000 001 907 348 590 506 803 2;
  • 34) 0.000 001 907 348 590 506 803 2 × 2 = 0 + 0.000 003 814 697 181 013 606 4;
  • 35) 0.000 003 814 697 181 013 606 4 × 2 = 0 + 0.000 007 629 394 362 027 212 8;
  • 36) 0.000 007 629 394 362 027 212 8 × 2 = 0 + 0.000 015 258 788 724 054 425 6;
  • 37) 0.000 015 258 788 724 054 425 6 × 2 = 0 + 0.000 030 517 577 448 108 851 2;
  • 38) 0.000 030 517 577 448 108 851 2 × 2 = 0 + 0.000 061 035 154 896 217 702 4;
  • 39) 0.000 061 035 154 896 217 702 4 × 2 = 0 + 0.000 122 070 309 792 435 404 8;
  • 40) 0.000 122 070 309 792 435 404 8 × 2 = 0 + 0.000 244 140 619 584 870 809 6;
  • 41) 0.000 244 140 619 584 870 809 6 × 2 = 0 + 0.000 488 281 239 169 741 619 2;
  • 42) 0.000 488 281 239 169 741 619 2 × 2 = 0 + 0.000 976 562 478 339 483 238 4;
  • 43) 0.000 976 562 478 339 483 238 4 × 2 = 0 + 0.001 953 124 956 678 966 476 8;
  • 44) 0.001 953 124 956 678 966 476 8 × 2 = 0 + 0.003 906 249 913 357 932 953 6;
  • 45) 0.003 906 249 913 357 932 953 6 × 2 = 0 + 0.007 812 499 826 715 865 907 2;
  • 46) 0.007 812 499 826 715 865 907 2 × 2 = 0 + 0.015 624 999 653 431 731 814 4;
  • 47) 0.015 624 999 653 431 731 814 4 × 2 = 0 + 0.031 249 999 306 863 463 628 8;
  • 48) 0.031 249 999 306 863 463 628 8 × 2 = 0 + 0.062 499 998 613 726 927 257 6;
  • 49) 0.062 499 998 613 726 927 257 6 × 2 = 0 + 0.124 999 997 227 453 854 515 2;
  • 50) 0.124 999 997 227 453 854 515 2 × 2 = 0 + 0.249 999 994 454 907 709 030 4;
  • 51) 0.249 999 994 454 907 709 030 4 × 2 = 0 + 0.499 999 988 909 815 418 060 8;
  • 52) 0.499 999 988 909 815 418 060 8 × 2 = 0 + 0.999 999 977 819 630 836 121 6;
  • 53) 0.999 999 977 819 630 836 121 6 × 2 = 1 + 0.999 999 955 639 261 672 243 2;
  • 54) 0.999 999 955 639 261 672 243 2 × 2 = 1 + 0.999 999 911 278 523 344 486 4;
  • 55) 0.999 999 911 278 523 344 486 4 × 2 = 1 + 0.999 999 822 557 046 688 972 8;
  • 56) 0.999 999 822 557 046 688 972 8 × 2 = 1 + 0.999 999 645 114 093 377 945 6;
  • 57) 0.999 999 645 114 093 377 945 6 × 2 = 1 + 0.999 999 290 228 186 755 891 2;
  • 58) 0.999 999 290 228 186 755 891 2 × 2 = 1 + 0.999 998 580 456 373 511 782 4;
  • 59) 0.999 998 580 456 373 511 782 4 × 2 = 1 + 0.999 997 160 912 747 023 564 8;
  • 60) 0.999 997 160 912 747 023 564 8 × 2 = 1 + 0.999 994 321 825 494 047 129 6;
  • 61) 0.999 994 321 825 494 047 129 6 × 2 = 1 + 0.999 988 643 650 988 094 259 2;
  • 62) 0.999 988 643 650 988 094 259 2 × 2 = 1 + 0.999 977 287 301 976 188 518 4;
  • 63) 0.999 977 287 301 976 188 518 4 × 2 = 1 + 0.999 954 574 603 952 377 036 8;
  • 64) 0.999 954 574 603 952 377 036 8 × 2 = 1 + 0.999 909 149 207 904 754 073 6;
  • 65) 0.999 909 149 207 904 754 073 6 × 2 = 1 + 0.999 818 298 415 809 508 147 2;
  • 66) 0.999 818 298 415 809 508 147 2 × 2 = 1 + 0.999 636 596 831 619 016 294 4;
  • 67) 0.999 636 596 831 619 016 294 4 × 2 = 1 + 0.999 273 193 663 238 032 588 8;
  • 68) 0.999 273 193 663 238 032 588 8 × 2 = 1 + 0.998 546 387 326 476 065 177 6;
  • 69) 0.998 546 387 326 476 065 177 6 × 2 = 1 + 0.997 092 774 652 952 130 355 2;
  • 70) 0.997 092 774 652 952 130 355 2 × 2 = 1 + 0.994 185 549 305 904 260 710 4;
  • 71) 0.994 185 549 305 904 260 710 4 × 2 = 1 + 0.988 371 098 611 808 521 420 8;
  • 72) 0.988 371 098 611 808 521 420 8 × 2 = 1 + 0.976 742 197 223 617 042 841 6;
  • 73) 0.976 742 197 223 617 042 841 6 × 2 = 1 + 0.953 484 394 447 234 085 683 2;
  • 74) 0.953 484 394 447 234 085 683 2 × 2 = 1 + 0.906 968 788 894 468 171 366 4;
  • 75) 0.906 968 788 894 468 171 366 4 × 2 = 1 + 0.813 937 577 788 936 342 732 8;
  • 76) 0.813 937 577 788 936 342 732 8 × 2 = 1 + 0.627 875 155 577 872 685 465 6;
  • 77) 0.627 875 155 577 872 685 465 6 × 2 = 1 + 0.255 750 311 155 745 370 931 2;
  • 78) 0.255 750 311 155 745 370 931 2 × 2 = 0 + 0.511 500 622 311 490 741 862 4;
  • 79) 0.511 500 622 311 490 741 862 4 × 2 = 1 + 0.023 001 244 622 981 483 724 8;
  • 80) 0.023 001 244 622 981 483 724 8 × 2 = 0 + 0.046 002 489 245 962 967 449 6;
  • 81) 0.046 002 489 245 962 967 449 6 × 2 = 0 + 0.092 004 978 491 925 934 899 2;
  • 82) 0.092 004 978 491 925 934 899 2 × 2 = 0 + 0.184 009 956 983 851 869 798 4;
  • 83) 0.184 009 956 983 851 869 798 4 × 2 = 0 + 0.368 019 913 967 703 739 596 8;
  • 84) 0.368 019 913 967 703 739 596 8 × 2 = 0 + 0.736 039 827 935 407 479 193 6;
  • 85) 0.736 039 827 935 407 479 193 6 × 2 = 1 + 0.472 079 655 870 814 958 387 2;
  • 86) 0.472 079 655 870 814 958 387 2 × 2 = 0 + 0.944 159 311 741 629 916 774 4;
  • 87) 0.944 159 311 741 629 916 774 4 × 2 = 1 + 0.888 318 623 483 259 833 548 8;
  • 88) 0.888 318 623 483 259 833 548 8 × 2 = 1 + 0.776 637 246 966 519 667 097 6;
  • 89) 0.776 637 246 966 519 667 097 6 × 2 = 1 + 0.553 274 493 933 039 334 195 2;
  • 90) 0.553 274 493 933 039 334 195 2 × 2 = 1 + 0.106 548 987 866 078 668 390 4;
  • 91) 0.106 548 987 866 078 668 390 4 × 2 = 0 + 0.213 097 975 732 157 336 780 8;
  • 92) 0.213 097 975 732 157 336 780 8 × 2 = 0 + 0.426 195 951 464 314 673 561 6;
  • 93) 0.426 195 951 464 314 673 561 6 × 2 = 0 + 0.852 391 902 928 629 347 123 2;
  • 94) 0.852 391 902 928 629 347 123 2 × 2 = 1 + 0.704 783 805 857 258 694 246 4;
  • 95) 0.704 783 805 857 258 694 246 4 × 2 = 1 + 0.409 567 611 714 517 388 492 8;
  • 96) 0.409 567 611 714 517 388 492 8 × 2 = 0 + 0.819 135 223 429 034 776 985 6;
  • 97) 0.819 135 223 429 034 776 985 6 × 2 = 1 + 0.638 270 446 858 069 553 971 2;
  • 98) 0.638 270 446 858 069 553 971 2 × 2 = 1 + 0.276 540 893 716 139 107 942 4;
  • 99) 0.276 540 893 716 139 107 942 4 × 2 = 0 + 0.553 081 787 432 278 215 884 8;
  • 100) 0.553 081 787 432 278 215 884 8 × 2 = 1 + 0.106 163 574 864 556 431 769 6;
  • 101) 0.106 163 574 864 556 431 769 6 × 2 = 0 + 0.212 327 149 729 112 863 539 2;
  • 102) 0.212 327 149 729 112 863 539 2 × 2 = 0 + 0.424 654 299 458 225 727 078 4;
  • 103) 0.424 654 299 458 225 727 078 4 × 2 = 0 + 0.849 308 598 916 451 454 156 8;
  • 104) 0.849 308 598 916 451 454 156 8 × 2 = 1 + 0.698 617 197 832 902 908 313 6;
  • 105) 0.698 617 197 832 902 908 313 6 × 2 = 1 + 0.397 234 395 665 805 816 627 2;

We didn't get any fractional part that was equal to zero. But the list already contains the first non-zero integer part (step 53) plus the 52 bits that follow it, which is all the mantissa can hold => FULL STOP (the remaining bits are dropped, losing precision...)

4. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the constructed list above:

0.000 000 000 000 000 222 044 6(10) =
0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1010 0000 1011 1100 0110 1101 0001 1(2)
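
Steps 3 and 4 can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration that assumes Python's standard `fractions` module; the helper name `fraction_bits` is hypothetical. It uses `Fraction` rather than `float` so that each doubling is exact, as in the hand calculation above.

```python
from fractions import Fraction


def fraction_bits(frac: Fraction, count: int) -> str:
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)       # integer part of the doubled value: 0 or 1
        bits.append(str(bit))
        frac -= bit           # keep only the fractional part
    return "".join(bits)


# 0.000 000 000 000 000 222 044 6 written as an exact ratio: 2220446 / 10^22
x = Fraction(2220446, 10**22)
bits = fraction_bits(x, 105)
print(bits.index("1") + 1)    # 53: the step at which the first 1 appears
print(bits[52:])              # the 53 digits from that step onward
```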

Positive number before normalization:

0.000 000 000 000 000 222 044 6(10) =
0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1010 0000 1011 1100 0110 1101 0001 1(2)

5. Normalize the binary representation of the number, shifting the decimal mark 53 positions to the right so that only one non zero digit remains to the left of it:

0.000 000 000 000 000 222 044 6(10) =
0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1010 0000 1011 1100 0110 1101 0001 1(2) =
0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1010 0000 1011 1100 0110 1101 0001 1(2) × 2^0 =
1.1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011(2) × 2^-53

Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -53


Mantissa (not normalized): 1.1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011
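
The normalization in step 5 amounts to finding the power of two that brings the value into the range from 1 (inclusive) to 2 (exclusive). A minimal sketch, assuming a positive input and using a hypothetical helper named `normalize`:

```python
from fractions import Fraction


def normalize(x: Fraction):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent. Assumes x > 0."""
    exponent = 0
    while x < 1:       # shift the binary point to the right
        x *= 2
        exponent -= 1
    while x >= 2:      # shift the binary point to the left
        x /= 2
        exponent += 1
    return x, exponent


significand, exponent = normalize(Fraction(2220446, 10**22))
print(exponent)             # -53
print(float(significand))   # a value just below 2
```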

6. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2:

Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = -53 + 2^(11-1) - 1 = (-53 + 1 023)(10) = 970(10)


  • division = quotient + remainder;
  • 970 ÷ 2 = 485 + 0;
  • 485 ÷ 2 = 242 + 1;
  • 242 ÷ 2 = 121 + 0;
  • 121 ÷ 2 = 60 + 1;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

Exponent (adjusted) = 970(10) = 011 1100 1010(2)
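
The exponent adjustment in step 6 is a single addition followed by zero-padded binary formatting. The sketch below is an illustration only; the names `EXPONENT_BITS`, `BIAS` and `biased_exponent_bits` are hypothetical, and it covers only exponents of normal numbers (adjusted values from 1 to 2046).

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 2^(11-1) - 1 = 1023


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and format the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(BIAS)                        # 1023
print(biased_exponent_bits(-53))   # 01111001010
```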

7. Normalize the mantissa: remove the leading (leftmost) bit, since it is always 1, together with the decimal mark, then adjust its length to 52 bits, only if necessary (not the case here):

Mantissa (normalized) =
1.1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011 =
1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011

Conclusion:

The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1100 1010


Mantissa (52 bits) =
1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011

Number 0.000 000 000 000 000 222 044 6, a decimal, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point:

0 - 011 1100 1010 - 1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011
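
The result can be cross-checked against the bit pattern a machine actually stores. The sketch below uses Python's standard `struct` module and assumes the platform's `float` is an IEEE 754 64 bit double, which holds for CPython on common platforms; the helper name `double_fields` is hypothetical.

```python
import struct


def double_fields(value: float):
    """Split a double into its sign, exponent and mantissa bit strings."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent, mantissa = double_fields(2.220446e-16)
print(sign)       # 1 bit
print(exponent)   # 11 bits
print(mantissa)   # 52 bits
```

For this particular number the stored fields agree with the three elements listed above, because the first dropped bit is 0 and so rounding does not change the 52 kept bits.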

(64 bits IEEE 754)
  • Sign (1 bit), bit 63: 0
  • Exponent (11 bits), bits 62 down to 52: 011 1100 1010
  • Mantissa (52 bits), bits 51 down to 0: 1111 1111 1111 1111 1111 1111 0100 0001 0111 1000 1101 1010 0011

Convert decimal numbers from base ten to 64 bit double precision IEEE 754 binary floating point standard

A number in 64 bit double precision IEEE 754 binary floating point standard representation requires three building elements: sign (1 bit, either 0 for positive or 1 for negative numbers), exponent (11 bits) and mantissa (52 bits).


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, until we get a fractional part that is equal to zero, keeping track of each integer part of the results.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
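
The nine steps above can be condensed into one short function. The sketch below is an illustration under stated assumptions, not the converter's implementation: the name `to_double_bits` is hypothetical, it handles only non-zero values in the normal range (no zero, subnormals, infinities or NaN), and it normalizes first and then extracts the 52 mantissa bits, which yields the same bits as the step order above. Like step 8, it truncates the excess bits rather than rounding them.

```python
from fractions import Fraction


def to_double_bits(value: Fraction) -> str:
    """Return 'sign exponent mantissa' for a non-zero value, truncating excess bits."""
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"       # step 9
    value = abs(value)                     # step 1

    exponent = 0                           # step 6: normalize to 1 <= value < 2
    while value < 1:
        value *= 2
        exponent -= 1
    while value >= 2:
        value /= 2
        exponent += 1

    value -= 1                             # step 8: drop the leading 1 bit
    mantissa = []
    for _ in range(52):                    # steps 4 and 5 on what remains
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit

    biased = exponent + 2 ** (11 - 1) - 1  # step 7
    return f"{sign} {biased:011b} {''.join(mantissa)}"


print(to_double_bits(Fraction("-31.640215")))
print(to_double_bits(Fraction(2220446, 10**22)))
```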

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =


    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
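
The procedure above truncates the excess mantissa bits. IEEE 754 hardware rounds to nearest by default, so the stored pattern can differ from a truncated one in the last bit when the dropped bits amount to more than half a unit in the last place. The sketch below compares the two for -31.640 215, where the dropped bits begin with 1010 0. It assumes Python's standard `struct` module and a platform whose `float` is an IEEE 754 64 bit double.

```python
import struct

# Result of the truncating procedure above: sign, exponent, mantissa.
truncated = (
    "1"
    + "10000000011"
    + "1111101000111110010100100001010101110110100010011100"
)

# Bit pattern stored by the machine for the same decimal literal.
(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = format(raw, "064b")

print(truncated)
print(stored)
print(raw - int(truncated, 2))   # difference in units of the last mantissa bit
```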