11 000 010 001 099 999 999 999 999 999 973 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

What are the steps to convert the decimal number 11 000 010 001 099 999 999 999 999 999 973(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 11 000 010 001 099 999 999 999 999 999 973 ÷ 2 = 5 500 005 000 549 999 999 999 999 999 986 + 1;
  • 5 500 005 000 549 999 999 999 999 999 986 ÷ 2 = 2 750 002 500 274 999 999 999 999 999 993 + 0;
  • 2 750 002 500 274 999 999 999 999 999 993 ÷ 2 = 1 375 001 250 137 499 999 999 999 999 996 + 1;
  • 1 375 001 250 137 499 999 999 999 999 996 ÷ 2 = 687 500 625 068 749 999 999 999 999 998 + 0;
  • 687 500 625 068 749 999 999 999 999 998 ÷ 2 = 343 750 312 534 374 999 999 999 999 999 + 0;
  • 343 750 312 534 374 999 999 999 999 999 ÷ 2 = 171 875 156 267 187 499 999 999 999 999 + 1;
  • 171 875 156 267 187 499 999 999 999 999 ÷ 2 = 85 937 578 133 593 749 999 999 999 999 + 1;
  • 85 937 578 133 593 749 999 999 999 999 ÷ 2 = 42 968 789 066 796 874 999 999 999 999 + 1;
  • 42 968 789 066 796 874 999 999 999 999 ÷ 2 = 21 484 394 533 398 437 499 999 999 999 + 1;
  • 21 484 394 533 398 437 499 999 999 999 ÷ 2 = 10 742 197 266 699 218 749 999 999 999 + 1;
  • 10 742 197 266 699 218 749 999 999 999 ÷ 2 = 5 371 098 633 349 609 374 999 999 999 + 1;
  • 5 371 098 633 349 609 374 999 999 999 ÷ 2 = 2 685 549 316 674 804 687 499 999 999 + 1;
  • 2 685 549 316 674 804 687 499 999 999 ÷ 2 = 1 342 774 658 337 402 343 749 999 999 + 1;
  • 1 342 774 658 337 402 343 749 999 999 ÷ 2 = 671 387 329 168 701 171 874 999 999 + 1;
  • 671 387 329 168 701 171 874 999 999 ÷ 2 = 335 693 664 584 350 585 937 499 999 + 1;
  • 335 693 664 584 350 585 937 499 999 ÷ 2 = 167 846 832 292 175 292 968 749 999 + 1;
  • 167 846 832 292 175 292 968 749 999 ÷ 2 = 83 923 416 146 087 646 484 374 999 + 1;
  • 83 923 416 146 087 646 484 374 999 ÷ 2 = 41 961 708 073 043 823 242 187 499 + 1;
  • 41 961 708 073 043 823 242 187 499 ÷ 2 = 20 980 854 036 521 911 621 093 749 + 1;
  • 20 980 854 036 521 911 621 093 749 ÷ 2 = 10 490 427 018 260 955 810 546 874 + 1;
  • 10 490 427 018 260 955 810 546 874 ÷ 2 = 5 245 213 509 130 477 905 273 437 + 0;
  • 5 245 213 509 130 477 905 273 437 ÷ 2 = 2 622 606 754 565 238 952 636 718 + 1;
  • 2 622 606 754 565 238 952 636 718 ÷ 2 = 1 311 303 377 282 619 476 318 359 + 0;
  • 1 311 303 377 282 619 476 318 359 ÷ 2 = 655 651 688 641 309 738 159 179 + 1;
  • 655 651 688 641 309 738 159 179 ÷ 2 = 327 825 844 320 654 869 079 589 + 1;
  • 327 825 844 320 654 869 079 589 ÷ 2 = 163 912 922 160 327 434 539 794 + 1;
  • 163 912 922 160 327 434 539 794 ÷ 2 = 81 956 461 080 163 717 269 897 + 0;
  • 81 956 461 080 163 717 269 897 ÷ 2 = 40 978 230 540 081 858 634 948 + 1;
  • 40 978 230 540 081 858 634 948 ÷ 2 = 20 489 115 270 040 929 317 474 + 0;
  • 20 489 115 270 040 929 317 474 ÷ 2 = 10 244 557 635 020 464 658 737 + 0;
  • 10 244 557 635 020 464 658 737 ÷ 2 = 5 122 278 817 510 232 329 368 + 1;
  • 5 122 278 817 510 232 329 368 ÷ 2 = 2 561 139 408 755 116 164 684 + 0;
  • 2 561 139 408 755 116 164 684 ÷ 2 = 1 280 569 704 377 558 082 342 + 0;
  • 1 280 569 704 377 558 082 342 ÷ 2 = 640 284 852 188 779 041 171 + 0;
  • 640 284 852 188 779 041 171 ÷ 2 = 320 142 426 094 389 520 585 + 1;
  • 320 142 426 094 389 520 585 ÷ 2 = 160 071 213 047 194 760 292 + 1;
  • 160 071 213 047 194 760 292 ÷ 2 = 80 035 606 523 597 380 146 + 0;
  • 80 035 606 523 597 380 146 ÷ 2 = 40 017 803 261 798 690 073 + 0;
  • 40 017 803 261 798 690 073 ÷ 2 = 20 008 901 630 899 345 036 + 1;
  • 20 008 901 630 899 345 036 ÷ 2 = 10 004 450 815 449 672 518 + 0;
  • 10 004 450 815 449 672 518 ÷ 2 = 5 002 225 407 724 836 259 + 0;
  • 5 002 225 407 724 836 259 ÷ 2 = 2 501 112 703 862 418 129 + 1;
  • 2 501 112 703 862 418 129 ÷ 2 = 1 250 556 351 931 209 064 + 1;
  • 1 250 556 351 931 209 064 ÷ 2 = 625 278 175 965 604 532 + 0;
  • 625 278 175 965 604 532 ÷ 2 = 312 639 087 982 802 266 + 0;
  • 312 639 087 982 802 266 ÷ 2 = 156 319 543 991 401 133 + 0;
  • 156 319 543 991 401 133 ÷ 2 = 78 159 771 995 700 566 + 1;
  • 78 159 771 995 700 566 ÷ 2 = 39 079 885 997 850 283 + 0;
  • 39 079 885 997 850 283 ÷ 2 = 19 539 942 998 925 141 + 1;
  • 19 539 942 998 925 141 ÷ 2 = 9 769 971 499 462 570 + 1;
  • 9 769 971 499 462 570 ÷ 2 = 4 884 985 749 731 285 + 0;
  • 4 884 985 749 731 285 ÷ 2 = 2 442 492 874 865 642 + 1;
  • 2 442 492 874 865 642 ÷ 2 = 1 221 246 437 432 821 + 0;
  • 1 221 246 437 432 821 ÷ 2 = 610 623 218 716 410 + 1;
  • 610 623 218 716 410 ÷ 2 = 305 311 609 358 205 + 0;
  • 305 311 609 358 205 ÷ 2 = 152 655 804 679 102 + 1;
  • 152 655 804 679 102 ÷ 2 = 76 327 902 339 551 + 0;
  • 76 327 902 339 551 ÷ 2 = 38 163 951 169 775 + 1;
  • 38 163 951 169 775 ÷ 2 = 19 081 975 584 887 + 1;
  • 19 081 975 584 887 ÷ 2 = 9 540 987 792 443 + 1;
  • 9 540 987 792 443 ÷ 2 = 4 770 493 896 221 + 1;
  • 4 770 493 896 221 ÷ 2 = 2 385 246 948 110 + 1;
  • 2 385 246 948 110 ÷ 2 = 1 192 623 474 055 + 0;
  • 1 192 623 474 055 ÷ 2 = 596 311 737 027 + 1;
  • 596 311 737 027 ÷ 2 = 298 155 868 513 + 1;
  • 298 155 868 513 ÷ 2 = 149 077 934 256 + 1;
  • 149 077 934 256 ÷ 2 = 74 538 967 128 + 0;
  • 74 538 967 128 ÷ 2 = 37 269 483 564 + 0;
  • 37 269 483 564 ÷ 2 = 18 634 741 782 + 0;
  • 18 634 741 782 ÷ 2 = 9 317 370 891 + 0;
  • 9 317 370 891 ÷ 2 = 4 658 685 445 + 1;
  • 4 658 685 445 ÷ 2 = 2 329 342 722 + 1;
  • 2 329 342 722 ÷ 2 = 1 164 671 361 + 0;
  • 1 164 671 361 ÷ 2 = 582 335 680 + 1;
  • 582 335 680 ÷ 2 = 291 167 840 + 0;
  • 291 167 840 ÷ 2 = 145 583 920 + 0;
  • 145 583 920 ÷ 2 = 72 791 960 + 0;
  • 72 791 960 ÷ 2 = 36 395 980 + 0;
  • 36 395 980 ÷ 2 = 18 197 990 + 0;
  • 18 197 990 ÷ 2 = 9 098 995 + 0;
  • 9 098 995 ÷ 2 = 4 549 497 + 1;
  • 4 549 497 ÷ 2 = 2 274 748 + 1;
  • 2 274 748 ÷ 2 = 1 137 374 + 0;
  • 1 137 374 ÷ 2 = 568 687 + 0;
  • 568 687 ÷ 2 = 284 343 + 1;
  • 284 343 ÷ 2 = 142 171 + 1;
  • 142 171 ÷ 2 = 71 085 + 1;
  • 71 085 ÷ 2 = 35 542 + 1;
  • 35 542 ÷ 2 = 17 771 + 0;
  • 17 771 ÷ 2 = 8 885 + 1;
  • 8 885 ÷ 2 = 4 442 + 1;
  • 4 442 ÷ 2 = 2 221 + 0;
  • 2 221 ÷ 2 = 1 110 + 1;
  • 1 110 ÷ 2 = 555 + 0;
  • 555 ÷ 2 = 277 + 1;
  • 277 ÷ 2 = 138 + 1;
  • 138 ÷ 2 = 69 + 0;
  • 69 ÷ 2 = 34 + 1;
  • 34 ÷ 2 = 17 + 0;
  • 17 ÷ 2 = 8 + 1;
  • 8 ÷ 2 = 4 + 0;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the positive number.

Take all the remainders starting from the bottom of the list constructed above.

11 000 010 001 099 999 999 999 999 999 973(10) =


1000 1010 1101 0110 1111 0011 0000 0010 1100 0011 1011 1110 1010 1011 0100 0110 0100 1100 0100 1011 1010 1111 1111 1111 1110 0101(2)
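Steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the repeated-division technique, not part of the original walkthrough; the function name `int_to_binary` is an assumption made for this example.

```python
def int_to_binary(n: int) -> str:
    """Return the base-2 digits of a non-negative integer by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of one division step
        remainders.append(str(r))
    # The last remainder is the most significant bit, so read the list backwards.
    return "".join(reversed(remainders))


bits = int_to_binary(11000010001099999999999999999973)
print(len(bits))    # 104
print(bits[:16])    # 1000101011010110
```

Python integers have arbitrary precision, so the 32-digit input is divided exactly, just as in the hand calculation.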


3. Normalize the binary representation of the number.

Shift the decimal mark 103 positions to the left, so that only one non zero digit remains to the left of it:


11 000 010 001 099 999 999 999 999 999 973(10) =


1000 1010 1101 0110 1111 0011 0000 0010 1100 0011 1011 1110 1010 1011 0100 0110 0100 1100 0100 1011 1010 1111 1111 1111 1110 0101(2) =


1000 1010 1101 0110 1111 0011 0000 0010 1100 0011 1011 1110 1010 1011 0100 0110 0100 1100 0100 1011 1010 1111 1111 1111 1110 0101(2) × 2^0 =


1.0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101 0110 1000 1100 1001 1000 1001 0111 0101 1111 1111 1111 1100 101(2) × 2^103
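The normalization step amounts to counting how many places the point moves. A minimal sketch, assuming the number is a positive integer (it uses Python's built-in `bin` as a shortcut for the division list above; the variable names are illustrative):

```python
n = 11000010001099999999999999999973

bits = bin(n)[2:]            # base-2 digits of n, without the "0b" prefix
exponent = len(bits) - 1     # places the point moves left to leave a single leading 1
fraction_bits = bits[1:]     # the digits to the right of the point after the shift

print(exponent)              # 103
print(len(fraction_bits))    # 103
print(fraction_bits[:12])    # 000101011010
```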


4. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): 103


Mantissa (not normalized):
1.0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101 0110 1000 1100 1001 1000 1001 0111 0101 1111 1111 1111 1100 101


5. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


103 + 2^(11-1) - 1 =


(103 + 1 023)(10) =


1 126(10)


6. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 126 ÷ 2 = 563 + 0;
  • 563 ÷ 2 = 281 + 1;
  • 281 ÷ 2 = 140 + 1;
  • 140 ÷ 2 = 70 + 0;
  • 70 ÷ 2 = 35 + 0;
  • 35 ÷ 2 = 17 + 1;
  • 17 ÷ 2 = 8 + 1;
  • 8 ÷ 2 = 4 + 0;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

7. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1126(10) =


100 0110 0110(2)
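Steps 5 to 7 reduce to one addition and one base conversion. The sketch below is an illustration only; the constant names `EXPONENT_BITS` and `BIAS` are assumptions chosen for this example.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1          # 1023 for double precision

unadjusted = 103
adjusted = unadjusted + BIAS                 # excess/bias notation
exponent_field = format(adjusted, f"0{EXPONENT_BITS}b")   # zero-padded to 11 bits

print(adjusted)          # 1126
print(exponent_field)    # 10001100110
```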


8. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point, if there is one.


b) Adjust its length to 52 bits by removing the excess bits from the right. If any of the removed bits is 1, precision is lost. This is truncation; here the first removed bit is 0, so rounding to nearest, the IEEE 754 default, gives the same 52 bits.


Mantissa (normalized) =


1. 0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101 0110 1000 1100 1001 1000 1001 0111 0101 1111 1111 1111 1100 101 =


0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101
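The truncation in step 8 can be sketched as a string slice. This is an illustration under the same truncating assumption as the text, with hypothetical variable names; it also inspects the first removed bit, which decides whether rounding to nearest would have given a different result.

```python
MANTISSA_BITS = 52

n = 11000010001099999999999999999973
fraction_bits = bin(n)[3:]     # drop the "0b" prefix and the implicit leading 1

# Keep the first 52 bits; pad with zeros if there are fewer than 52.
kept = fraction_bits[:MANTISSA_BITS].ljust(MANTISSA_BITS, "0")
discarded = fraction_bits[MANTISSA_BITS:]

print(kept)             # 0001010110101101111001100000010110000111011111010101
print(len(discarded))   # 51
print(discarded[0])     # 0, so rounding to nearest would also round down here
```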


9. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
100 0110 0110


Mantissa (52 bits) =
0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101


Decimal number 11 000 010 001 099 999 999 999 999 999 973 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 100 0110 0110 - 0001 0101 1010 1101 1110 0110 0000 0101 1000 0111 0111 1101 0101
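The result can be cross-checked against the bit pattern a real implementation produces. The sketch below uses Python's standard `struct` module; note that Python's `float` rounds to nearest rather than truncating, which agrees with the steps above only because the first removed bit is 0 for this number.

```python
import struct

value = float(11000010001099999999999999999973)    # nearest double to the integer
raw = struct.pack(">d", value)                      # 8 bytes, big-endian
pattern = format(int.from_bytes(raw, "big"), "064b")

sign, exponent, mantissa = pattern[0], pattern[1:12], pattern[12:]
print(sign)        # 0
print(exponent)    # 10001100110
print(raw.hex())   # 46615ade605877d5
```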


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Repeatedly divide the positive integer by 2, keeping track of each remainder, until the quotient is zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of the integer part of each result, until the fractional part is zero or until enough bits have been produced to fill the 52 bit mantissa (most decimal fractions never reach zero).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if there is one), then adjust its length to 52 bits, either by removing the excess bits from the right (truncation, which loses precision) or by adding extra '0' bits to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
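The nine steps above can be put together in one short Python sketch. It is a hedged illustration, not a reference implementation: the name `to_double_fields` and the constants are assumptions, extra mantissa bits are truncated exactly as in step 8, and zero, subnormal numbers, infinities and NaN are deliberately left out. `fractions.Fraction` is used so the decimal input is handled exactly.

```python
from fractions import Fraction

EXPONENT_BITS, MANTISSA_BITS = 11, 52
BIAS = 2 ** (EXPONENT_BITS - 1) - 1


def to_double_fields(text: str) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"           # step 9
    value = abs(value)                         # step 1
    if value == 0:
        raise ValueError("zero is not handled by this sketch")

    # Steps 2-3: integer part by repeated division by 2.
    integer, int_bits = int(value), ""
    while integer > 0:
        integer, r = divmod(integer, 2)
        int_bits = str(r) + int_bits

    # Steps 4-5: fractional part by repeated multiplication by 2, stopping at
    # zero or once 53 significant bits (implicit 1 + 52) are available.
    frac, frac_bits = value - int(value), ""
    while frac != 0 and len((int_bits + frac_bits).lstrip("0")) <= MANTISSA_BITS:
        frac *= 2
        bit = int(frac)
        frac -= bit
        frac_bits += str(bit)

    # Step 6: normalize so that a single 1 stays left of the point.
    if int_bits:
        exponent = len(int_bits) - 1
        digits = int_bits + frac_bits
    else:
        exponent = -(frac_bits.index("1") + 1)
        digits = frac_bits.lstrip("0")

    # Step 7: biased exponent as an 11 bit field.
    biased = exponent + BIAS
    if not 0 < biased < 2 ** EXPONENT_BITS - 1:
        raise ValueError("outside the normal double range")
    exponent_field = format(biased, f"0{EXPONENT_BITS}b")

    # Step 8: drop the leading 1, then truncate or zero-pad to 52 bits.
    mantissa = digits[1:MANTISSA_BITS + 1].ljust(MANTISSA_BITS, "0")
    return sign, exponent_field, mantissa


sign, exponent_field, mantissa = to_double_fields("-31.640215")
print(sign, exponent_field)      # 1 10000000011
print(hex(int(mantissa, 2)))     # 0xfa3e52157689c
```

The input is passed as a string without digit-group spaces, since `Fraction` does not accept them.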

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • No fractional part equal to zero was reached, but there are already enough bits to fill the 52 bit mantissa and at least one integer part was non-zero => FULL STOP (precision is lost).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

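  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision). Note that this is truncation: the removed bits here start with 1, so rounding to nearest, the IEEE 754 default, would end the mantissa in 1101 instead of 1100: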

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
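Because the worked example truncates the mantissa, its result can differ in the last bit from what a typical implementation stores. The sketch below compares the pattern derived above with the one produced by Python's `float`, which rounds to nearest (ties to even); the variable names are illustrative.

```python
import struct

# Bit pattern from the worked example: sign, exponent, truncated mantissa.
truncated = int(
    "1" "10000000011" "1111101000111110010100100001010101110110100010011100", 2
)
# Pattern stored by Python's float for the same decimal literal.
rounded = int.from_bytes(struct.pack(">d", -31.640215), "big")

print(f"{truncated:016x}")    # c03fa3e52157689c
print(f"{rounded:016x}")      # c03fa3e52157689d
print(rounded - truncated)    # 1, so only the last mantissa bit differs
```

The removed bits of the mantissa begin with 101, which is more than half a unit in the last place, so rounding to nearest raises the final bit from 0 to 1.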