Base ten decimal number 1 000 001 001 000 000 000 000 000 000 000 converted to 32 bit single precision IEEE 754 binary floating point standard

How to convert the decimal number 1 000 001 001 000 000 000 000 000 000 000(10)
to
32 bit single precision IEEE 754 binary floating point
(1 bit for sign, 8 bits for exponent, 23 bits for mantissa)

1. Divide the number repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:

  • division = quotient + remainder;
  • 1 000 001 001 000 000 000 000 000 000 000 ÷ 2 = 500 000 500 500 000 000 000 000 000 000 + 0;
  • 500 000 500 500 000 000 000 000 000 000 ÷ 2 = 250 000 250 250 000 000 000 000 000 000 + 0;
  • 250 000 250 250 000 000 000 000 000 000 ÷ 2 = 125 000 125 125 000 000 000 000 000 000 + 0;
  • 125 000 125 125 000 000 000 000 000 000 ÷ 2 = 62 500 062 562 500 000 000 000 000 000 + 0;
  • 62 500 062 562 500 000 000 000 000 000 ÷ 2 = 31 250 031 281 250 000 000 000 000 000 + 0;
  • 31 250 031 281 250 000 000 000 000 000 ÷ 2 = 15 625 015 640 625 000 000 000 000 000 + 0;
  • 15 625 015 640 625 000 000 000 000 000 ÷ 2 = 7 812 507 820 312 500 000 000 000 000 + 0;
  • 7 812 507 820 312 500 000 000 000 000 ÷ 2 = 3 906 253 910 156 250 000 000 000 000 + 0;
  • 3 906 253 910 156 250 000 000 000 000 ÷ 2 = 1 953 126 955 078 125 000 000 000 000 + 0;
  • 1 953 126 955 078 125 000 000 000 000 ÷ 2 = 976 563 477 539 062 500 000 000 000 + 0;
  • 976 563 477 539 062 500 000 000 000 ÷ 2 = 488 281 738 769 531 250 000 000 000 + 0;
  • 488 281 738 769 531 250 000 000 000 ÷ 2 = 244 140 869 384 765 625 000 000 000 + 0;
  • 244 140 869 384 765 625 000 000 000 ÷ 2 = 122 070 434 692 382 812 500 000 000 + 0;
  • 122 070 434 692 382 812 500 000 000 ÷ 2 = 61 035 217 346 191 406 250 000 000 + 0;
  • 61 035 217 346 191 406 250 000 000 ÷ 2 = 30 517 608 673 095 703 125 000 000 + 0;
  • 30 517 608 673 095 703 125 000 000 ÷ 2 = 15 258 804 336 547 851 562 500 000 + 0;
  • 15 258 804 336 547 851 562 500 000 ÷ 2 = 7 629 402 168 273 925 781 250 000 + 0;
  • 7 629 402 168 273 925 781 250 000 ÷ 2 = 3 814 701 084 136 962 890 625 000 + 0;
  • 3 814 701 084 136 962 890 625 000 ÷ 2 = 1 907 350 542 068 481 445 312 500 + 0;
  • 1 907 350 542 068 481 445 312 500 ÷ 2 = 953 675 271 034 240 722 656 250 + 0;
  • 953 675 271 034 240 722 656 250 ÷ 2 = 476 837 635 517 120 361 328 125 + 0;
  • 476 837 635 517 120 361 328 125 ÷ 2 = 238 418 817 758 560 180 664 062 + 1;
  • 238 418 817 758 560 180 664 062 ÷ 2 = 119 209 408 879 280 090 332 031 + 0;
  • 119 209 408 879 280 090 332 031 ÷ 2 = 59 604 704 439 640 045 166 015 + 1;
  • 59 604 704 439 640 045 166 015 ÷ 2 = 29 802 352 219 820 022 583 007 + 1;
  • 29 802 352 219 820 022 583 007 ÷ 2 = 14 901 176 109 910 011 291 503 + 1;
  • 14 901 176 109 910 011 291 503 ÷ 2 = 7 450 588 054 955 005 645 751 + 1;
  • 7 450 588 054 955 005 645 751 ÷ 2 = 3 725 294 027 477 502 822 875 + 1;
  • 3 725 294 027 477 502 822 875 ÷ 2 = 1 862 647 013 738 751 411 437 + 1;
  • 1 862 647 013 738 751 411 437 ÷ 2 = 931 323 506 869 375 705 718 + 1;
  • 931 323 506 869 375 705 718 ÷ 2 = 465 661 753 434 687 852 859 + 0;
  • 465 661 753 434 687 852 859 ÷ 2 = 232 830 876 717 343 926 429 + 1;
  • 232 830 876 717 343 926 429 ÷ 2 = 116 415 438 358 671 963 214 + 1;
  • 116 415 438 358 671 963 214 ÷ 2 = 58 207 719 179 335 981 607 + 0;
  • 58 207 719 179 335 981 607 ÷ 2 = 29 103 859 589 667 990 803 + 1;
  • 29 103 859 589 667 990 803 ÷ 2 = 14 551 929 794 833 995 401 + 1;
  • 14 551 929 794 833 995 401 ÷ 2 = 7 275 964 897 416 997 700 + 1;
  • 7 275 964 897 416 997 700 ÷ 2 = 3 637 982 448 708 498 850 + 0;
  • 3 637 982 448 708 498 850 ÷ 2 = 1 818 991 224 354 249 425 + 0;
  • 1 818 991 224 354 249 425 ÷ 2 = 909 495 612 177 124 712 + 1;
  • 909 495 612 177 124 712 ÷ 2 = 454 747 806 088 562 356 + 0;
  • 454 747 806 088 562 356 ÷ 2 = 227 373 903 044 281 178 + 0;
  • 227 373 903 044 281 178 ÷ 2 = 113 686 951 522 140 589 + 0;
  • 113 686 951 522 140 589 ÷ 2 = 56 843 475 761 070 294 + 1;
  • 56 843 475 761 070 294 ÷ 2 = 28 421 737 880 535 147 + 0;
  • 28 421 737 880 535 147 ÷ 2 = 14 210 868 940 267 573 + 1;
  • 14 210 868 940 267 573 ÷ 2 = 7 105 434 470 133 786 + 1;
  • 7 105 434 470 133 786 ÷ 2 = 3 552 717 235 066 893 + 0;
  • 3 552 717 235 066 893 ÷ 2 = 1 776 358 617 533 446 + 1;
  • 1 776 358 617 533 446 ÷ 2 = 888 179 308 766 723 + 0;
  • 888 179 308 766 723 ÷ 2 = 444 089 654 383 361 + 1;
  • 444 089 654 383 361 ÷ 2 = 222 044 827 191 680 + 1;
  • 222 044 827 191 680 ÷ 2 = 111 022 413 595 840 + 0;
  • 111 022 413 595 840 ÷ 2 = 55 511 206 797 920 + 0;
  • 55 511 206 797 920 ÷ 2 = 27 755 603 398 960 + 0;
  • 27 755 603 398 960 ÷ 2 = 13 877 801 699 480 + 0;
  • 13 877 801 699 480 ÷ 2 = 6 938 900 849 740 + 0;
  • 6 938 900 849 740 ÷ 2 = 3 469 450 424 870 + 0;
  • 3 469 450 424 870 ÷ 2 = 1 734 725 212 435 + 0;
  • 1 734 725 212 435 ÷ 2 = 867 362 606 217 + 1;
  • 867 362 606 217 ÷ 2 = 433 681 303 108 + 1;
  • 433 681 303 108 ÷ 2 = 216 840 651 554 + 0;
  • 216 840 651 554 ÷ 2 = 108 420 325 777 + 0;
  • 108 420 325 777 ÷ 2 = 54 210 162 888 + 1;
  • 54 210 162 888 ÷ 2 = 27 105 081 444 + 0;
  • 27 105 081 444 ÷ 2 = 13 552 540 722 + 0;
  • 13 552 540 722 ÷ 2 = 6 776 270 361 + 0;
  • 6 776 270 361 ÷ 2 = 3 388 135 180 + 1;
  • 3 388 135 180 ÷ 2 = 1 694 067 590 + 0;
  • 1 694 067 590 ÷ 2 = 847 033 795 + 0;
  • 847 033 795 ÷ 2 = 423 516 897 + 1;
  • 423 516 897 ÷ 2 = 211 758 448 + 1;
  • 211 758 448 ÷ 2 = 105 879 224 + 0;
  • 105 879 224 ÷ 2 = 52 939 612 + 0;
  • 52 939 612 ÷ 2 = 26 469 806 + 0;
  • 26 469 806 ÷ 2 = 13 234 903 + 0;
  • 13 234 903 ÷ 2 = 6 617 451 + 1;
  • 6 617 451 ÷ 2 = 3 308 725 + 1;
  • 3 308 725 ÷ 2 = 1 654 362 + 1;
  • 1 654 362 ÷ 2 = 827 181 + 0;
  • 827 181 ÷ 2 = 413 590 + 1;
  • 413 590 ÷ 2 = 206 795 + 0;
  • 206 795 ÷ 2 = 103 397 + 1;
  • 103 397 ÷ 2 = 51 698 + 1;
  • 51 698 ÷ 2 = 25 849 + 0;
  • 25 849 ÷ 2 = 12 924 + 1;
  • 12 924 ÷ 2 = 6 462 + 0;
  • 6 462 ÷ 2 = 3 231 + 0;
  • 3 231 ÷ 2 = 1 615 + 1;
  • 1 615 ÷ 2 = 807 + 1;
  • 807 ÷ 2 = 403 + 1;
  • 403 ÷ 2 = 201 + 1;
  • 201 ÷ 2 = 100 + 1;
  • 100 ÷ 2 = 50 + 0;
  • 50 ÷ 2 = 25 + 0;
  • 25 ÷ 2 = 12 + 1;
  • 12 ÷ 2 = 6 + 0;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the integer part of the number, by taking all the remainders starting from the bottom of the list constructed above:

1 000 001 001 000 000 000 000 000 000 000(10) =


1100 1001 1111 0010 1101 0111 0000 1100 1000 1001 1000 0000 1101 0110 1000 1001 1101 1011 1111 1010 0000 0000 0000 0000 0000(2)
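
The repeated division in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_binary_by_division` is our own and is not part of any library.

```python
def to_binary_by_division(n: int) -> str:
    """Return the base 2 digits of a non-negative integer by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # one step: quotient and remainder
        remainders.append(r)
    # The last remainder is the leftmost bit, so read the list from the bottom up.
    return "".join(str(r) for r in reversed(remainders))


bits = to_binary_by_division(1_000_001_001_000_000_000_000_000_000_000)
print(len(bits))   # 100
print(bits)
```

Python integers have arbitrary size, so the 31-digit number is divided exactly, just as in the list above.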

3. Normalize the binary representation of the number, shifting the decimal mark 99 positions to the left so that only one non zero digit remains to the left of it:

1 000 001 001 000 000 000 000 000 000 000(10) =


1100 1001 1111 0010 1101 0111 0000 1100 1000 1001 1000 0000 1101 0110 1000 1001 1101 1011 1111 1010 0000 0000 0000 0000 0000(2) =


1100 1001 1111 0010 1101 0111 0000 1100 1000 1001 1000 0000 1101 0110 1000 1001 1101 1011 1111 1010 0000 0000 0000 0000 0000(2) × 2^0 =


1.1001 0011 1110 0101 1010 1110 0001 1001 0001 0011 0000 0001 1010 1101 0001 0011 1011 0111 1111 0100 0000 0000 0000 0000 000(2) × 2^99

Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 99


Mantissa (not normalized): 1.1001 0011 1110 0101 1010 1110 0001 1001 0001 0011 0000 0001 1010 1101 0001 0011 1011 0111 1111 0100 0000 0000 0000 0000 000
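
For an integer, the number of positions the point moves in step 3 is simply the number of binary digits minus one. A minimal sketch, using Python's built-in `bin` and our own variable names:

```python
n = 1_000_001_001_000_000_000_000_000_000_000

bits = bin(n)[2:]                        # binary digits, leftmost first
exponent = len(bits) - 1                 # positions the point moves to the left
significand = bits[0] + "." + bits[1:]   # one non-zero digit before the point

print(exponent)          # 99
print(significand[:21])  # 1.1001001111100101101
```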

4. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary, by using the same technique of repeatedly dividing by 2:

Exponent (adjusted) =


Exponent (unadjusted) + 2^(8-1) - 1 =


99 + 2^(8-1) - 1 =


(99 + 127)(10) =


226(10)


  • division = quotient + remainder;
  • 226 ÷ 2 = 113 + 0;
  • 113 ÷ 2 = 56 + 1;
  • 56 ÷ 2 = 28 + 0;
  • 28 ÷ 2 = 14 + 0;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

Exponent (adjusted) =


226(10) =


1110 0010(2)
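
The exponent adjustment of step 4 is a single addition followed by a base 2 conversion. The sketch below uses Python's `format` to produce the 8 binary digits; the constant name `EXPONENT_BITS` is our own.

```python
EXPONENT_BITS = 8

bias = 2 ** (EXPONENT_BITS - 1) - 1        # 127 for single precision
unadjusted = 99
adjusted = unadjusted + bias
exponent_field = format(adjusted, "08b")   # zero-padded to 8 binary digits

print(bias, adjusted, exponent_field)      # 127 226 11100010
```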

5. Normalize the mantissa: remove the leading (leftmost) bit, since it's always 1 (and the decimal point, if there is one), then adjust its length to 23 bits by removing the excess bits from the right (if any of the excess bits is set to 1, we are losing precision...):

Mantissa (normalized) =


1. 100 1001 1111 0010 1101 0111 0000 1100 1000 1001 1000 0000 1101 0110 1000 1001 1101 1011 1111 1010 0000 0000 0000 0000 0000 =


100 1001 1111 0010 1101 0111
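
Step 5 can be sketched as plain string slicing. The code below also reports whether any of the removed bits is 1, which is the case where precision is lost. Variable names are our own.

```python
n = 1_000_001_001_000_000_000_000_000_000_000

bits = bin(n)[2:]
fraction_bits = bits[1:]                       # drop the leading bit, always 1
mantissa = fraction_bits[:23].ljust(23, "0")   # keep 23 bits, pad with 0 if short
dropped = fraction_bits[23:]                   # excess bits removed from the right

print(mantissa)         # 10010011111001011010111
print(len(dropped))     # 76
print("1" in dropped)   # True, so some precision is lost
```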

Conclusion:

The three elements that make up the number's 32 bit single precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (8 bits) =
1110 0010


Mantissa (23 bits) =
100 1001 1111 0010 1101 0111

Number 1 000 001 001 000 000 000 000 000 000 000, converted from the decimal system (base 10)
to
32 bit single precision IEEE 754 binary floating point:


0 - 1110 0010 - 100 1001 1111 0010 1101 0111

(32 bits IEEE 754)
  • Sign (1 bit), bit 31:
    0
  • Exponent (8 bits), bits 30 down to 23:
    1110 0010
  • Mantissa (23 bits), bits 22 down to 0:
    100 1001 1111 0010 1101 0111
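
As a cross-check, the three fields can be joined into one 32 bit word and compared with what Python's standard `struct` module produces for the same number. This is a sketch under the assumption of a typical platform where `struct` packs `f` as IEEE 754 single precision with round-to-nearest. For this particular number the first removed bit is 0, so removing the excess bits and rounding give the same word.

```python
import struct

sign, exponent_field, mantissa = "0", "11100010", "10010011111001011010111"

# Join sign, exponent and mantissa into one 32 bit unsigned integer.
word = int(sign + exponent_field + mantissa, 2)
print(f"0x{word:08X}")   # 0x7149F2D7

# Let the platform do the conversion and read back the raw 32 bits.
packed = struct.pack(">f", float(1_000_001_001_000_000_000_000_000_000_000))
hardware_word = struct.unpack(">I", packed)[0]
print(hardware_word == word)   # True
```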

Convert decimal numbers from base ten to 32 bit single precision IEEE 754 binary floating point standard

A number in 32 bit single precision IEEE 754 binary floating point standard representation requires three building elements: sign (1 bit, either 0 for positive or 1 for negative numbers), exponent (8 bits) and mantissa (23 bits).

How to convert decimal numbers from base ten to 32 bit single precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 32 bit single precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
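  • 4. Then convert the fractional part. Multiply the fractional part repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the mantissa (many decimal fractions never reach zero).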
  • 5. Construct the base 2 representation of the fractional part of the number by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, by shifting the decimal point (or if you prefer, the decimal mark) "n" positions either to the left or to the right, so that only one non zero digit remains to the left of the decimal point.
  • 7. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal point, if there is one), and adjust its length to 23 bits, either by removing the excess bits from the right (losing precision...) or by adding extra '0' bits to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
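
The nine steps above can be put together in one short Python function. This is a minimal sketch, not a full IEEE 754 implementation: the name `to_ieee754_single_truncated` is our own, it assumes a non-zero number inside the normal single precision range, and it removes excess mantissa bits without rounding, exactly as step 8 describes. It uses `fractions.Fraction` so that decimal inputs such as 25.347 are handled exactly.

```python
from fractions import Fraction


def to_ieee754_single_truncated(text: str) -> str:
    """Return 'sign exponent mantissa' bit strings for a decimal string."""
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"     # step 9
    value = abs(value)                   # step 1
    integer = int(value)                 # integer part
    fraction = value - integer           # fractional part, kept exact

    # Steps 2 and 3: repeated division by 2, remainders read from the bottom up.
    int_bits = ""
    while integer > 0:
        integer, r = divmod(integer, 2)
        int_bits = str(r) + int_bits

    # Steps 4 and 5: repeated multiplication by 2, integer parts read top down.
    # Stop at zero or once 24 significant bits (1 leading + 23) are available.
    frac_bits = ""
    while fraction != 0 and len((int_bits + frac_bits).lstrip("0")) < 24:
        fraction *= 2
        bit = int(fraction)
        frac_bits += str(bit)
        fraction -= bit

    # Step 6: normalize; the exponent is the position of the leading 1.
    if int_bits:
        exponent = len(int_bits) - 1
    else:
        exponent = -(frac_bits.index("1") + 1)
    digits = (int_bits + frac_bits).lstrip("0")

    # Step 7: add the bias and write the exponent on 8 bits.
    exponent_field = format(exponent + 2 ** (8 - 1) - 1, "08b")

    # Step 8: drop the leading 1, then cut or zero-pad to 23 bits.
    mantissa = digits[1:24].ljust(23, "0")
    return f"{sign} {exponent_field} {mantissa}"


print(to_ieee754_single_truncated("-25.347"))
# 1 10000011 10010101100011010100111
print(to_ieee754_single_truncated("1000001001000000000000000000000"))
# 0 11100010 10010011111001011010111
```

The input is passed as a string on purpose: a Python `float` literal such as `-25.347` would already have been rounded to binary before the steps begin.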

Example: convert the negative number -25.347 from decimal system (base ten) to 32 bit single precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-25.347| = 25.347

  • 2. First convert the integer part, 25. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 25 ÷ 2 = 12 + 1;
    • 12 ÷ 2 = 6 + 0;
    • 6 ÷ 2 = 3 + 0;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    25(10) = 1 1001(2)

  • 4. Then convert the fractional part, 0.347. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.347 × 2 = 0 + 0.694;
    • 2) 0.694 × 2 = 1 + 0.388;
    • 3) 0.388 × 2 = 0 + 0.776;
    • 4) 0.776 × 2 = 1 + 0.552;
    • 5) 0.552 × 2 = 1 + 0.104;
    • 6) 0.104 × 2 = 0 + 0.208;
    • 7) 0.208 × 2 = 0 + 0.416;
    • 8) 0.416 × 2 = 0 + 0.832;
    • 9) 0.832 × 2 = 1 + 0.664;
    • 10) 0.664 × 2 = 1 + 0.328;
    • 11) 0.328 × 2 = 0 + 0.656;
    • 12) 0.656 × 2 = 1 + 0.312;
    • 13) 0.312 × 2 = 0 + 0.624;
    • 14) 0.624 × 2 = 1 + 0.248;
    • 15) 0.248 × 2 = 0 + 0.496;
    • 16) 0.496 × 2 = 0 + 0.992;
    • 17) 0.992 × 2 = 1 + 0.984;
    • 18) 0.984 × 2 = 1 + 0.968;
    • 19) 0.968 × 2 = 1 + 0.936;
    • 20) 0.936 × 2 = 1 + 0.872;
    • 21) 0.872 × 2 = 1 + 0.744;
    • 22) 0.744 × 2 = 1 + 0.488;
    • 23) 0.488 × 2 = 0 + 0.976;
    • 24) 0.976 × 2 = 1 + 0.952;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 23) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.347(10) = 0.0101 1000 1101 0100 1111 1101(2)

  • 6. Summarizing - the positive number before normalization:

    25.347(10) = 1 1001.0101 1000 1101 0100 1111 1101(2)

  • 7. Normalize the binary representation of the number, shifting the decimal point 4 positions to the left so that only one non-zero digit stays to the left of the decimal point:

    25.347(10) =
    1 1001.0101 1000 1101 0100 1111 1101(2) =
    1 1001.0101 1000 1101 0100 1111 1101(2) × 2^0 =
    1.1001 0101 1000 1101 0100 1111 1101(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

  • 9. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as already demonstrated above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1 = (4 + 127)(10) = 131(10) =
    1000 0011(2)

  • 10. Normalize the mantissa, remove the leading (leftmost) bit, since it's always '1' (and the decimal point) and adjust its length to 23 bits, by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

    Mantissa (normalized): 100 1010 1100 0110 1010 0111

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (8 bits) = 1000 0011

    Mantissa (23 bits) = 100 1010 1100 0110 1010 0111

  • Number -25.347, converted from the decimal system (base 10) to 32 bit single precision IEEE 754 binary floating point =


    1 - 1000 0011 - 100 1010 1100 0110 1010 0111
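
The worked example can be checked with a short Python sketch. The helper `fraction_to_bits` is our own name; it repeats the multiplication by 2 of step 4 with exact fractions. The second part joins the three fields into a 32 bit word and compares it with Python's standard `struct` module, assuming a typical platform where `struct` packs `f` as IEEE 754 single precision with round-to-nearest. Because the first removed mantissa bit is 1 here, the rounded word is one unit larger in the last place than the word obtained by removing the excess bits.

```python
import struct
from fractions import Fraction


def fraction_to_bits(text: str, count: int) -> str:
    """Binary digits of a decimal fraction by repeated multiplication by 2."""
    frac = Fraction(text)        # exact, so 0.347 is not disturbed by float rounding
    bits = ""
    for _ in range(count):
        frac *= 2
        bit = int(frac)          # integer part of the product
        bits += str(bit)
        frac -= bit              # keep only the fractional part
        if frac == 0:
            break
    return bits


print(fraction_to_bits("0.347", 24))   # 010110001101010011111101

# The result of the steps above: excess mantissa bits removed.
truncated_word = int("1" + "10000011" + "10010101100011010100111", 2)
print(f"0x{truncated_word:08X}")       # 0xC1CAC6A7

# What the platform stores, using round-to-nearest.
rounded_word = struct.unpack(">I", struct.pack(">f", -25.347))[0]
print(f"0x{rounded_word:08X}")         # 0xC1CAC6A8
```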