Convert 0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4 to 64 Bit Double Precision IEEE 754 Binary Floating Point Standard, From a Number in Base 10 Decimal System

How to convert the decimal number 0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4(10)
to
64 bit double precision IEEE 754 binary floating point
(1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (base 2) the integer part: 0. Divide the number repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:

  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number, by taking all the remainders starting from the bottom of the list constructed above:

0(10) =


0(2)

3. Convert to binary (base 2) the fractional part: 0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:

  • #) multiplying = integer + fractional part;
  • 1) 0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4 × 2 = 0 + 0.059 999 999 999 999 997 779 553 950 749 686 919 152 736 663 818 359 374 8;
  • 2) 0.059 999 999 999 999 997 779 553 950 749 686 919 152 736 663 818 359 374 8 × 2 = 0 + 0.119 999 999 999 999 995 559 107 901 499 373 838 305 473 327 636 718 749 6;
  • 3) 0.119 999 999 999 999 995 559 107 901 499 373 838 305 473 327 636 718 749 6 × 2 = 0 + 0.239 999 999 999 999 991 118 215 802 998 747 676 610 946 655 273 437 499 2;
  • 4) 0.239 999 999 999 999 991 118 215 802 998 747 676 610 946 655 273 437 499 2 × 2 = 0 + 0.479 999 999 999 999 982 236 431 605 997 495 353 221 893 310 546 874 998 4;
  • 5) 0.479 999 999 999 999 982 236 431 605 997 495 353 221 893 310 546 874 998 4 × 2 = 0 + 0.959 999 999 999 999 964 472 863 211 994 990 706 443 786 621 093 749 996 8;
  • 6) 0.959 999 999 999 999 964 472 863 211 994 990 706 443 786 621 093 749 996 8 × 2 = 1 + 0.919 999 999 999 999 928 945 726 423 989 981 412 887 573 242 187 499 993 6;
  • 7) 0.919 999 999 999 999 928 945 726 423 989 981 412 887 573 242 187 499 993 6 × 2 = 1 + 0.839 999 999 999 999 857 891 452 847 979 962 825 775 146 484 374 999 987 2;
  • 8) 0.839 999 999 999 999 857 891 452 847 979 962 825 775 146 484 374 999 987 2 × 2 = 1 + 0.679 999 999 999 999 715 782 905 695 959 925 651 550 292 968 749 999 974 4;
  • 9) 0.679 999 999 999 999 715 782 905 695 959 925 651 550 292 968 749 999 974 4 × 2 = 1 + 0.359 999 999 999 999 431 565 811 391 919 851 303 100 585 937 499 999 948 8;
  • 10) 0.359 999 999 999 999 431 565 811 391 919 851 303 100 585 937 499 999 948 8 × 2 = 0 + 0.719 999 999 999 998 863 131 622 783 839 702 606 201 171 874 999 999 897 6;
  • 11) 0.719 999 999 999 998 863 131 622 783 839 702 606 201 171 874 999 999 897 6 × 2 = 1 + 0.439 999 999 999 997 726 263 245 567 679 405 212 402 343 749 999 999 795 2;
  • 12) 0.439 999 999 999 997 726 263 245 567 679 405 212 402 343 749 999 999 795 2 × 2 = 0 + 0.879 999 999 999 995 452 526 491 135 358 810 424 804 687 499 999 999 590 4;
  • 13) 0.879 999 999 999 995 452 526 491 135 358 810 424 804 687 499 999 999 590 4 × 2 = 1 + 0.759 999 999 999 990 905 052 982 270 717 620 849 609 374 999 999 999 180 8;
  • 14) 0.759 999 999 999 990 905 052 982 270 717 620 849 609 374 999 999 999 180 8 × 2 = 1 + 0.519 999 999 999 981 810 105 964 541 435 241 699 218 749 999 999 998 361 6;
  • 15) 0.519 999 999 999 981 810 105 964 541 435 241 699 218 749 999 999 998 361 6 × 2 = 1 + 0.039 999 999 999 963 620 211 929 082 870 483 398 437 499 999 999 996 723 2;
  • 16) 0.039 999 999 999 963 620 211 929 082 870 483 398 437 499 999 999 996 723 2 × 2 = 0 + 0.079 999 999 999 927 240 423 858 165 740 966 796 874 999 999 999 993 446 4;
  • 17) 0.079 999 999 999 927 240 423 858 165 740 966 796 874 999 999 999 993 446 4 × 2 = 0 + 0.159 999 999 999 854 480 847 716 331 481 933 593 749 999 999 999 986 892 8;
  • 18) 0.159 999 999 999 854 480 847 716 331 481 933 593 749 999 999 999 986 892 8 × 2 = 0 + 0.319 999 999 999 708 961 695 432 662 963 867 187 499 999 999 999 973 785 6;
  • 19) 0.319 999 999 999 708 961 695 432 662 963 867 187 499 999 999 999 973 785 6 × 2 = 0 + 0.639 999 999 999 417 923 390 865 325 927 734 374 999 999 999 999 947 571 2;
  • 20) 0.639 999 999 999 417 923 390 865 325 927 734 374 999 999 999 999 947 571 2 × 2 = 1 + 0.279 999 999 998 835 846 781 730 651 855 468 749 999 999 999 999 895 142 4;
  • 21) 0.279 999 999 998 835 846 781 730 651 855 468 749 999 999 999 999 895 142 4 × 2 = 0 + 0.559 999 999 997 671 693 563 461 303 710 937 499 999 999 999 999 790 284 8;
  • 22) 0.559 999 999 997 671 693 563 461 303 710 937 499 999 999 999 999 790 284 8 × 2 = 1 + 0.119 999 999 995 343 387 126 922 607 421 874 999 999 999 999 999 580 569 6;
  • 23) 0.119 999 999 995 343 387 126 922 607 421 874 999 999 999 999 999 580 569 6 × 2 = 0 + 0.239 999 999 990 686 774 253 845 214 843 749 999 999 999 999 999 161 139 2;
  • 24) 0.239 999 999 990 686 774 253 845 214 843 749 999 999 999 999 999 161 139 2 × 2 = 0 + 0.479 999 999 981 373 548 507 690 429 687 499 999 999 999 999 998 322 278 4;
  • 25) 0.479 999 999 981 373 548 507 690 429 687 499 999 999 999 999 998 322 278 4 × 2 = 0 + 0.959 999 999 962 747 097 015 380 859 374 999 999 999 999 999 996 644 556 8;
  • 26) 0.959 999 999 962 747 097 015 380 859 374 999 999 999 999 999 996 644 556 8 × 2 = 1 + 0.919 999 999 925 494 194 030 761 718 749 999 999 999 999 999 993 289 113 6;
  • 27) 0.919 999 999 925 494 194 030 761 718 749 999 999 999 999 999 993 289 113 6 × 2 = 1 + 0.839 999 999 850 988 388 061 523 437 499 999 999 999 999 999 986 578 227 2;
  • 28) 0.839 999 999 850 988 388 061 523 437 499 999 999 999 999 999 986 578 227 2 × 2 = 1 + 0.679 999 999 701 976 776 123 046 874 999 999 999 999 999 999 973 156 454 4;
  • 29) 0.679 999 999 701 976 776 123 046 874 999 999 999 999 999 999 973 156 454 4 × 2 = 1 + 0.359 999 999 403 953 552 246 093 749 999 999 999 999 999 999 946 312 908 8;
  • 30) 0.359 999 999 403 953 552 246 093 749 999 999 999 999 999 999 946 312 908 8 × 2 = 0 + 0.719 999 998 807 907 104 492 187 499 999 999 999 999 999 999 892 625 817 6;
  • 31) 0.719 999 998 807 907 104 492 187 499 999 999 999 999 999 999 892 625 817 6 × 2 = 1 + 0.439 999 997 615 814 208 984 374 999 999 999 999 999 999 999 785 251 635 2;
  • 32) 0.439 999 997 615 814 208 984 374 999 999 999 999 999 999 999 785 251 635 2 × 2 = 0 + 0.879 999 995 231 628 417 968 749 999 999 999 999 999 999 999 570 503 270 4;
  • 33) 0.879 999 995 231 628 417 968 749 999 999 999 999 999 999 999 570 503 270 4 × 2 = 1 + 0.759 999 990 463 256 835 937 499 999 999 999 999 999 999 999 141 006 540 8;
  • 34) 0.759 999 990 463 256 835 937 499 999 999 999 999 999 999 999 141 006 540 8 × 2 = 1 + 0.519 999 980 926 513 671 874 999 999 999 999 999 999 999 998 282 013 081 6;
  • 35) 0.519 999 980 926 513 671 874 999 999 999 999 999 999 999 998 282 013 081 6 × 2 = 1 + 0.039 999 961 853 027 343 749 999 999 999 999 999 999 999 996 564 026 163 2;
  • 36) 0.039 999 961 853 027 343 749 999 999 999 999 999 999 999 996 564 026 163 2 × 2 = 0 + 0.079 999 923 706 054 687 499 999 999 999 999 999 999 999 993 128 052 326 4;
  • 37) 0.079 999 923 706 054 687 499 999 999 999 999 999 999 999 993 128 052 326 4 × 2 = 0 + 0.159 999 847 412 109 374 999 999 999 999 999 999 999 999 986 256 104 652 8;
  • 38) 0.159 999 847 412 109 374 999 999 999 999 999 999 999 999 986 256 104 652 8 × 2 = 0 + 0.319 999 694 824 218 749 999 999 999 999 999 999 999 999 972 512 209 305 6;
  • 39) 0.319 999 694 824 218 749 999 999 999 999 999 999 999 999 972 512 209 305 6 × 2 = 0 + 0.639 999 389 648 437 499 999 999 999 999 999 999 999 999 945 024 418 611 2;
  • 40) 0.639 999 389 648 437 499 999 999 999 999 999 999 999 999 945 024 418 611 2 × 2 = 1 + 0.279 998 779 296 874 999 999 999 999 999 999 999 999 999 890 048 837 222 4;
  • 41) 0.279 998 779 296 874 999 999 999 999 999 999 999 999 999 890 048 837 222 4 × 2 = 0 + 0.559 997 558 593 749 999 999 999 999 999 999 999 999 999 780 097 674 444 8;
  • 42) 0.559 997 558 593 749 999 999 999 999 999 999 999 999 999 780 097 674 444 8 × 2 = 1 + 0.119 995 117 187 499 999 999 999 999 999 999 999 999 999 560 195 348 889 6;
  • 43) 0.119 995 117 187 499 999 999 999 999 999 999 999 999 999 560 195 348 889 6 × 2 = 0 + 0.239 990 234 374 999 999 999 999 999 999 999 999 999 999 120 390 697 779 2;
  • 44) 0.239 990 234 374 999 999 999 999 999 999 999 999 999 999 120 390 697 779 2 × 2 = 0 + 0.479 980 468 749 999 999 999 999 999 999 999 999 999 998 240 781 395 558 4;
  • 45) 0.479 980 468 749 999 999 999 999 999 999 999 999 999 998 240 781 395 558 4 × 2 = 0 + 0.959 960 937 499 999 999 999 999 999 999 999 999 999 996 481 562 791 116 8;
  • 46) 0.959 960 937 499 999 999 999 999 999 999 999 999 999 996 481 562 791 116 8 × 2 = 1 + 0.919 921 874 999 999 999 999 999 999 999 999 999 999 992 963 125 582 233 6;
  • 47) 0.919 921 874 999 999 999 999 999 999 999 999 999 999 992 963 125 582 233 6 × 2 = 1 + 0.839 843 749 999 999 999 999 999 999 999 999 999 999 985 926 251 164 467 2;
  • 48) 0.839 843 749 999 999 999 999 999 999 999 999 999 999 985 926 251 164 467 2 × 2 = 1 + 0.679 687 499 999 999 999 999 999 999 999 999 999 999 971 852 502 328 934 4;
  • 49) 0.679 687 499 999 999 999 999 999 999 999 999 999 999 971 852 502 328 934 4 × 2 = 1 + 0.359 374 999 999 999 999 999 999 999 999 999 999 999 943 705 004 657 868 8;
  • 50) 0.359 374 999 999 999 999 999 999 999 999 999 999 999 943 705 004 657 868 8 × 2 = 0 + 0.718 749 999 999 999 999 999 999 999 999 999 999 999 887 410 009 315 737 6;
  • 51) 0.718 749 999 999 999 999 999 999 999 999 999 999 999 887 410 009 315 737 6 × 2 = 1 + 0.437 499 999 999 999 999 999 999 999 999 999 999 999 774 820 018 631 475 2;
  • 52) 0.437 499 999 999 999 999 999 999 999 999 999 999 999 774 820 018 631 475 2 × 2 = 0 + 0.874 999 999 999 999 999 999 999 999 999 999 999 999 549 640 037 262 950 4;
  • 53) 0.874 999 999 999 999 999 999 999 999 999 999 999 999 549 640 037 262 950 4 × 2 = 1 + 0.749 999 999 999 999 999 999 999 999 999 999 999 999 099 280 074 525 900 8;
  • 54) 0.749 999 999 999 999 999 999 999 999 999 999 999 999 099 280 074 525 900 8 × 2 = 1 + 0.499 999 999 999 999 999 999 999 999 999 999 999 998 198 560 149 051 801 6;
  • 55) 0.499 999 999 999 999 999 999 999 999 999 999 999 998 198 560 149 051 801 6 × 2 = 0 + 0.999 999 999 999 999 999 999 999 999 999 999 999 996 397 120 298 103 603 2;
  • 56) 0.999 999 999 999 999 999 999 999 999 999 999 999 996 397 120 298 103 603 2 × 2 = 1 + 0.999 999 999 999 999 999 999 999 999 999 999 999 992 794 240 596 207 206 4;
  • 57) 0.999 999 999 999 999 999 999 999 999 999 999 999 992 794 240 596 207 206 4 × 2 = 1 + 0.999 999 999 999 999 999 999 999 999 999 999 999 985 588 481 192 414 412 8;
  • 58) 0.999 999 999 999 999 999 999 999 999 999 999 999 985 588 481 192 414 412 8 × 2 = 1 + 0.999 999 999 999 999 999 999 999 999 999 999 999 971 176 962 384 828 825 6;

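No fractional part was equal to zero. However, we have performed enough iterations to fill the mantissa (58 in total: 5 leading zeros, the leading 1 and 52 mantissa bits) and obtained at least one integer part different from zero => FULL STOP (losing precision...). Note that stopping here truncates the expansion: the discarded fractional part is almost 1, so a converter that rounds to nearest (the IEEE 754 default) would round the last mantissa bit up instead.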

4. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the constructed list above:

0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4(10) =


0.0000 0111 1010 1110 0001 0100 0111 1010 1110 0001 0100 0111 1010 1101 11(2)
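The repeated multiplication by 2 can be sketched in Python as follows. This is a minimal illustration, not the converter's own code: the helper name `fraction_bits` is hypothetical, and `fractions.Fraction` is used so that the decimal input is handled exactly rather than through a binary float.

```python
from fractions import Fraction

def fraction_bits(value: Fraction, count: int) -> str:
    """Return up to `count` binary digits of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        integer_part = int(value)   # 0 or 1: the next binary digit
        digits.append(str(integer_part))
        value -= integer_part       # keep only the fractional part
        if value == 0:              # exact expansion reached
            break
    return "".join(digits)

DECIMAL = "0.0299999999999999988897769753748434595763683319091796874"
bits = fraction_bits(Fraction(DECIMAL), 58)
print(bits)
```

Running it for 58 iterations should reproduce the digit sequence listed above, read from the top of the list.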

Positive number before normalization:

0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4(10) =


0.0000 0111 1010 1110 0001 0100 0111 1010 1110 0001 0100 0111 1010 1101 11(2)

5. Normalize the binary representation of the number, shifting the decimal mark 6 positions to the right so that only one non zero digit remains to the left of it:

0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4(10) =


0.0000 0111 1010 1110 0001 0100 0111 1010 1110 0001 0100 0111 1010 1101 11(2) =


0.0000 0111 1010 1110 0001 0100 0111 1010 1110 0001 0100 0111 1010 1101 11(2) × 2^0 =


1.1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111(2) × 2^-6

Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -6


Mantissa (not normalized): 1.1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111
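The normalization step amounts to locating the first 1 in the binary fraction. A small sketch, assuming the fractional bits are held in a plain string (the variable names are illustrative only):

```python
# Binary digits after the point, as obtained from the multiplications above.
fractional_bits = (
    "0000 0111 1010 1110 0001 0100 0111 1010 "
    "1110 0001 0100 0111 1010 1101 11"
).replace(" ", "")

first_one = fractional_bits.index("1")        # zero-based position of the leading 1
exponent = -(first_one + 1)                   # the point moves right past that digit
mantissa = fractional_bits[first_one + 1:]    # digits left after the leading 1

print(exponent)        # unadjusted exponent
print(len(mantissa))   # number of mantissa bits available
```

For this number the leading 1 is the sixth digit after the point, which gives the exponent -6 and exactly 52 remaining bits.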

6. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2:

Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-6 + 2^(11-1) - 1 =


(-6 + 1 023)(10) =


1 017(10)


  • division = quotient + remainder;
  • 1 017 ÷ 2 = 508 + 1;
  • 508 ÷ 2 = 254 + 0;
  • 254 ÷ 2 = 127 + 0;
  • 127 ÷ 2 = 63 + 1;
  • 63 ÷ 2 = 31 + 1;
  • 31 ÷ 2 = 15 + 1;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

Exponent (adjusted) =


1017(10) =


011 1111 1001(2)
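The bias adjustment and the repeated division by 2 can be sketched like this. The function name `to_binary` is hypothetical; it collects the remainders and reads them from the bottom up, then pads on the left to the requested width.

```python
def to_binary(number: int, width: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while number > 0:
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The last remainder is the leftmost digit, so reverse the list.
    return "".join(reversed(remainders)).rjust(width, "0")

EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision
adjusted = -6 + bias

print(adjusted)
print(to_binary(adjusted, EXPONENT_BITS))
```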

7. Normalize the mantissa: remove the leading (leftmost) bit, since it's always 1 (and the decimal mark, if present), then adjust its length to 52 bits, only if necessary (not the case here):

Mantissa (normalized) =


1.1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111 =


1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111

Conclusion:

The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1111 1001


Mantissa (52 bits) =
1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111

Number 0.029 999 999 999 999 998 889 776 975 374 843 459 576 368 331 909 179 687 4 converted from decimal system (base 10)
to
64 bit double precision IEEE 754 binary floating point:
0 - 011 1111 1001 - 1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111
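As a cross-check, the sketch below asks Python which bit pattern it stores for the same decimal string, using the standard `struct` module. The helper name `double_bits` is hypothetical. One caveat: Python's `float()` rounds to the nearest representable double, whereas the steps above cut the expansion off after 52 mantissa bits, so the two results are expected to differ in the last mantissa bit.

```python
import struct

def double_bits(x: float) -> str:
    """Return the 64 bits of a double as a string, most significant bit first."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")

DECIMAL = "0.0299999999999999988897769753748434595763683319091796874"
stored = double_bits(float(DECIMAL))
sign, exponent, mantissa = stored[0], stored[1:12], stored[12:]

# Mantissa obtained by the manual steps (truncation).
truncated = (
    "1110 1011 1000 0101 0001 1110 1011 1000 "
    "0101 0001 1110 1011 0111"
).replace(" ", "")

print(sign, exponent, mantissa)
print(int(mantissa, 2) - int(truncated, 2))   # difference in units of the last place
```

The sign and exponent fields agree with the manual result. The mantissa stored by Python is one unit in the last place larger, because the input lies just below the double nearest to 0.03 and rounding to nearest moves it up to that value.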

(64 bits IEEE 754)
  • Sign (1 bit), bit 63: 0
  • Exponent (11 bits), bits 62 to 52: 011 1111 1001
  • Mantissa (52 bits), bits 51 to 0: 1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1110 1011 0111



Convert to 64 bit double precision IEEE 754 binary floating point standard

A number in 64 bit double precision IEEE 754 binary floating point representation has three components: sign (1 bit: 0 for positive, 1 for negative numbers), exponent (11 bits) and mantissa (52 bits).


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have enough digits to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
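The nine steps can be condensed into a short Python sketch. This is an illustration under stated assumptions rather than a full IEEE 754 encoder: the function name `to_double_fields` is hypothetical, only non-zero numbers in the normal range are handled (no subnormals, infinities or NaN), and excess bits are truncated exactly as in step 8 rather than rounded to nearest. For brevity it normalizes first and then extracts the 52 bits after the leading 1, which yields the same digits as converting the integer and fractional parts separately.

```python
from fractions import Fraction

def to_double_fields(decimal_text: str) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings, truncating excess bits."""
    value = Fraction(decimal_text)          # exact decimal input
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Drop the leading 1 and take the next 52 binary digits (truncation).
    fraction = value - 1
    mantissa = []
    for _ in range(52):
        fraction *= 2
        bit = int(fraction)
        mantissa.append(str(bit))
        fraction -= bit

    adjusted = exponent + 2 ** (11 - 1) - 1  # add the bias 1023
    return sign, format(adjusted, "011b"), "".join(mantissa)

print(to_double_fields("-31.640215"))
```

Using exact fractions avoids feeding an already rounded binary float into the procedure, which would hide the precision loss that the manual steps make visible.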

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =


    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
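To see how much precision the truncation costs, the three fields can be decoded back into an exact value. The sketch below is illustrative (the name `decode_double` is hypothetical) and assumes a normal number, so the hidden leading 1 is added back to the mantissa.

```python
from fractions import Fraction

def decode_double(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Rebuild the exact value of a normal double from its three bit fields."""
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)   # restore the hidden 1
    scale = Fraction(2) ** (int(exponent, 2) - 1023)         # remove the bias
    value = significand * scale
    return -value if sign == "1" else value

decoded = decode_double(
    "1",
    "10000000011",
    "1111101000111110010100100001010101110110100010011100",
)
error = decoded - Fraction("-31.640215")

print(float(decoded))
print(float(error))   # positive and below 2**-48, one unit in the last place
```

Because the excess bits were cut off rather than rounded, the decoded magnitude is slightly smaller than 31.640 215, and the error stays below one unit in the last place of the mantissa.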