Base ten decimal number 1 000 010 001 010 000 000 000 000 000 000 converted to 32 bit single precision IEEE 754 binary floating point standard

How to convert the decimal number 1 000 010 001 010 000 000 000 000 000 000(10)
to
32 bit single precision IEEE 754 binary floating point
(1 bit for sign, 8 bits for exponent, 23 bits for mantissa)

1. Divide the number repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:

  • division = quotient + remainder;
  • 1 000 010 001 010 000 000 000 000 000 000 ÷ 2 = 500 005 000 505 000 000 000 000 000 000 + 0;
  • 500 005 000 505 000 000 000 000 000 000 ÷ 2 = 250 002 500 252 500 000 000 000 000 000 + 0;
  • 250 002 500 252 500 000 000 000 000 000 ÷ 2 = 125 001 250 126 250 000 000 000 000 000 + 0;
  • 125 001 250 126 250 000 000 000 000 000 ÷ 2 = 62 500 625 063 125 000 000 000 000 000 + 0;
  • 62 500 625 063 125 000 000 000 000 000 ÷ 2 = 31 250 312 531 562 500 000 000 000 000 + 0;
  • 31 250 312 531 562 500 000 000 000 000 ÷ 2 = 15 625 156 265 781 250 000 000 000 000 + 0;
  • 15 625 156 265 781 250 000 000 000 000 ÷ 2 = 7 812 578 132 890 625 000 000 000 000 + 0;
  • 7 812 578 132 890 625 000 000 000 000 ÷ 2 = 3 906 289 066 445 312 500 000 000 000 + 0;
  • 3 906 289 066 445 312 500 000 000 000 ÷ 2 = 1 953 144 533 222 656 250 000 000 000 + 0;
  • 1 953 144 533 222 656 250 000 000 000 ÷ 2 = 976 572 266 611 328 125 000 000 000 + 0;
  • 976 572 266 611 328 125 000 000 000 ÷ 2 = 488 286 133 305 664 062 500 000 000 + 0;
  • 488 286 133 305 664 062 500 000 000 ÷ 2 = 244 143 066 652 832 031 250 000 000 + 0;
  • 244 143 066 652 832 031 250 000 000 ÷ 2 = 122 071 533 326 416 015 625 000 000 + 0;
  • 122 071 533 326 416 015 625 000 000 ÷ 2 = 61 035 766 663 208 007 812 500 000 + 0;
  • 61 035 766 663 208 007 812 500 000 ÷ 2 = 30 517 883 331 604 003 906 250 000 + 0;
  • 30 517 883 331 604 003 906 250 000 ÷ 2 = 15 258 941 665 802 001 953 125 000 + 0;
  • 15 258 941 665 802 001 953 125 000 ÷ 2 = 7 629 470 832 901 000 976 562 500 + 0;
  • 7 629 470 832 901 000 976 562 500 ÷ 2 = 3 814 735 416 450 500 488 281 250 + 0;
  • 3 814 735 416 450 500 488 281 250 ÷ 2 = 1 907 367 708 225 250 244 140 625 + 0;
  • 1 907 367 708 225 250 244 140 625 ÷ 2 = 953 683 854 112 625 122 070 312 + 1;
  • 953 683 854 112 625 122 070 312 ÷ 2 = 476 841 927 056 312 561 035 156 + 0;
  • 476 841 927 056 312 561 035 156 ÷ 2 = 238 420 963 528 156 280 517 578 + 0;
  • 238 420 963 528 156 280 517 578 ÷ 2 = 119 210 481 764 078 140 258 789 + 0;
  • 119 210 481 764 078 140 258 789 ÷ 2 = 59 605 240 882 039 070 129 394 + 1;
  • 59 605 240 882 039 070 129 394 ÷ 2 = 29 802 620 441 019 535 064 697 + 0;
  • 29 802 620 441 019 535 064 697 ÷ 2 = 14 901 310 220 509 767 532 348 + 1;
  • 14 901 310 220 509 767 532 348 ÷ 2 = 7 450 655 110 254 883 766 174 + 0;
  • 7 450 655 110 254 883 766 174 ÷ 2 = 3 725 327 555 127 441 883 087 + 0;
  • 3 725 327 555 127 441 883 087 ÷ 2 = 1 862 663 777 563 720 941 543 + 1;
  • 1 862 663 777 563 720 941 543 ÷ 2 = 931 331 888 781 860 470 771 + 1;
  • 931 331 888 781 860 470 771 ÷ 2 = 465 665 944 390 930 235 385 + 1;
  • 465 665 944 390 930 235 385 ÷ 2 = 232 832 972 195 465 117 692 + 1;
  • 232 832 972 195 465 117 692 ÷ 2 = 116 416 486 097 732 558 846 + 0;
  • 116 416 486 097 732 558 846 ÷ 2 = 58 208 243 048 866 279 423 + 0;
  • 58 208 243 048 866 279 423 ÷ 2 = 29 104 121 524 433 139 711 + 1;
  • 29 104 121 524 433 139 711 ÷ 2 = 14 552 060 762 216 569 855 + 1;
  • 14 552 060 762 216 569 855 ÷ 2 = 7 276 030 381 108 284 927 + 1;
  • 7 276 030 381 108 284 927 ÷ 2 = 3 638 015 190 554 142 463 + 1;
  • 3 638 015 190 554 142 463 ÷ 2 = 1 819 007 595 277 071 231 + 1;
  • 1 819 007 595 277 071 231 ÷ 2 = 909 503 797 638 535 615 + 1;
  • 909 503 797 638 535 615 ÷ 2 = 454 751 898 819 267 807 + 1;
  • 454 751 898 819 267 807 ÷ 2 = 227 375 949 409 633 903 + 1;
  • 227 375 949 409 633 903 ÷ 2 = 113 687 974 704 816 951 + 1;
  • 113 687 974 704 816 951 ÷ 2 = 56 843 987 352 408 475 + 1;
  • 56 843 987 352 408 475 ÷ 2 = 28 421 993 676 204 237 + 1;
  • 28 421 993 676 204 237 ÷ 2 = 14 210 996 838 102 118 + 1;
  • 14 210 996 838 102 118 ÷ 2 = 7 105 498 419 051 059 + 0;
  • 7 105 498 419 051 059 ÷ 2 = 3 552 749 209 525 529 + 1;
  • 3 552 749 209 525 529 ÷ 2 = 1 776 374 604 762 764 + 1;
  • 1 776 374 604 762 764 ÷ 2 = 888 187 302 381 382 + 0;
  • 888 187 302 381 382 ÷ 2 = 444 093 651 190 691 + 0;
  • 444 093 651 190 691 ÷ 2 = 222 046 825 595 345 + 1;
  • 222 046 825 595 345 ÷ 2 = 111 023 412 797 672 + 1;
  • 111 023 412 797 672 ÷ 2 = 55 511 706 398 836 + 0;
  • 55 511 706 398 836 ÷ 2 = 27 755 853 199 418 + 0;
  • 27 755 853 199 418 ÷ 2 = 13 877 926 599 709 + 0;
  • 13 877 926 599 709 ÷ 2 = 6 938 963 299 854 + 1;
  • 6 938 963 299 854 ÷ 2 = 3 469 481 649 927 + 0;
  • 3 469 481 649 927 ÷ 2 = 1 734 740 824 963 + 1;
  • 1 734 740 824 963 ÷ 2 = 867 370 412 481 + 1;
  • 867 370 412 481 ÷ 2 = 433 685 206 240 + 1;
  • 433 685 206 240 ÷ 2 = 216 842 603 120 + 0;
  • 216 842 603 120 ÷ 2 = 108 421 301 560 + 0;
  • 108 421 301 560 ÷ 2 = 54 210 650 780 + 0;
  • 54 210 650 780 ÷ 2 = 27 105 325 390 + 0;
  • 27 105 325 390 ÷ 2 = 13 552 662 695 + 0;
  • 13 552 662 695 ÷ 2 = 6 776 331 347 + 1;
  • 6 776 331 347 ÷ 2 = 3 388 165 673 + 1;
  • 3 388 165 673 ÷ 2 = 1 694 082 836 + 1;
  • 1 694 082 836 ÷ 2 = 847 041 418 + 0;
  • 847 041 418 ÷ 2 = 423 520 709 + 0;
  • 423 520 709 ÷ 2 = 211 760 354 + 1;
  • 211 760 354 ÷ 2 = 105 880 177 + 0;
  • 105 880 177 ÷ 2 = 52 940 088 + 1;
  • 52 940 088 ÷ 2 = 26 470 044 + 0;
  • 26 470 044 ÷ 2 = 13 235 022 + 0;
  • 13 235 022 ÷ 2 = 6 617 511 + 0;
  • 6 617 511 ÷ 2 = 3 308 755 + 1;
  • 3 308 755 ÷ 2 = 1 654 377 + 1;
  • 1 654 377 ÷ 2 = 827 188 + 1;
  • 827 188 ÷ 2 = 413 594 + 0;
  • 413 594 ÷ 2 = 206 797 + 0;
  • 206 797 ÷ 2 = 103 398 + 1;
  • 103 398 ÷ 2 = 51 699 + 0;
  • 51 699 ÷ 2 = 25 849 + 1;
  • 25 849 ÷ 2 = 12 924 + 1;
  • 12 924 ÷ 2 = 6 462 + 0;
  • 6 462 ÷ 2 = 3 231 + 0;
  • 3 231 ÷ 2 = 1 615 + 1;
  • 1 615 ÷ 2 = 807 + 1;
  • 807 ÷ 2 = 403 + 1;
  • 403 ÷ 2 = 201 + 1;
  • 201 ÷ 2 = 100 + 1;
  • 100 ÷ 2 = 50 + 0;
  • 50 ÷ 2 = 25 + 0;
  • 25 ÷ 2 = 12 + 1;
  • 12 ÷ 2 = 6 + 0;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the integer part of the number, by taking all the remainders starting from the bottom of the list constructed above:

1 000 010 001 010 000 000 000 000 000 000(10) =


1100 1001 1111 0011 0100 1110 0010 1001 1100 0001 1101 0001 1001 1011 1111 1111 1100 1111 0010 1000 1000 0000 0000 0000 0000(2)
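
The repeated division in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `to_binary_by_division` is an assumption made for this sketch, and the result is cross-checked against Python's built-in `bin`.

```python
def to_binary_by_division(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient and remainder of one division step
        remainders.append(str(r))
    # The last remainder is the most significant bit, so read the list bottom-up.
    return "".join(reversed(remainders))


number = 1_000_010_001_010_000_000_000_000_000_000
bits = to_binary_by_division(number)
print(len(bits))                  # 100 binary digits
print(bits == bin(number)[2:])    # True: agrees with the built-in conversion
```

Python integers have arbitrary precision, so the 31-digit number needs no special handling.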

3. Normalize the binary representation of the number, shifting the decimal mark 99 positions to the left so that only one non zero digit remains to the left of it:

1 000 010 001 010 000 000 000 000 000 000(10) =


1100 1001 1111 0011 0100 1110 0010 1001 1100 0001 1101 0001 1001 1011 1111 1111 1100 1111 0010 1000 1000 0000 0000 0000 0000(2) =


1100 1001 1111 0011 0100 1110 0010 1001 1100 0001 1101 0001 1001 1011 1111 1111 1100 1111 0010 1000 1000 0000 0000 0000 0000(2) × 2^0 =


1.1001 0011 1110 0110 1001 1100 0101 0011 1000 0011 1010 0011 0011 0111 1111 1111 1001 1110 0101 0001 0000 0000 0000 0000 000(2) × 2^99

Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 99


Mantissa (not normalized): 1.1001 0011 1110 0110 1001 1100 0101 0011 1000 0011 1010 0011 0011 0111 1111 1111 1001 1110 0101 0001 0000 0000 0000 0000 000
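
For a positive integer, the normalization in step 3 amounts to locating the leading 1 bit. A minimal sketch, assuming the variable names `exponent` and `significand` (they are not part of the original procedure):

```python
n = 1_000_010_001_010_000_000_000_000_000_000

# The unadjusted exponent is the position of the leading 1 bit, i.e. the
# number of places the point moves to the left during normalization.
exponent = n.bit_length() - 1

# The normalized digits are the binary digits with a point after the first one.
digits = bin(n)[2:]
significand = digits[0] + "." + digits[1:]

print(exponent)            # 99
print(significand[:25])    # 1.10010011111001101001110
```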

4. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary, by using the same technique of repeatedly dividing by 2:

Exponent (adjusted) =


Exponent (unadjusted) + 2^(8-1) - 1 =


99 + 2^(8-1) - 1 =


(99 + 127)(10) =


226(10)


  • division = quotient + remainder;
  • 226 ÷ 2 = 113 + 0;
  • 113 ÷ 2 = 56 + 1;
  • 56 ÷ 2 = 28 + 0;
  • 28 ÷ 2 = 14 + 0;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

Exponent (adjusted) =


226(10) =


1110 0010(2)
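
The exponent adjustment in step 4 is a single addition followed by an 8 bit binary conversion. The sketch below assumes a helper named `biased_exponent_bits` and rejects values outside the normal range, which this page does not cover.

```python
EXPONENT_BITS = 8
BIAS = 2 ** (EXPONENT_BITS - 1) - 1    # 127 for single precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Return the 8 bit excess-127 encoding of an unadjusted exponent."""
    adjusted = unadjusted + BIAS
    # 0 and 255 are reserved (zero/subnormals and infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("outside the normal range of single precision")
    return format(adjusted, "08b")


print(biased_exponent_bits(99))    # 11100010
```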

5. Normalize the mantissa: remove the leading (leftmost) bit, since it is always 1 (and the decimal point, if present), then adjust its length to 23 bits by removing the excess bits from the right (if any of the removed bits is set to 1, precision is lost):

Mantissa (normalized) =


1. 100 1001 1111 0011 0100 1110 0010 1001 1100 0001 1101 0001 1001 1011 1111 1111 1100 1111 0010 1000 1000 0000 0000 0000 0000 =


100 1001 1111 0011 0100 1110
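
The truncation in step 5 can be checked with string slicing. This sketch (variable names are assumptions) also looks at the bits that are thrown away. The procedure on this page truncates, while IEEE 754 hardware normally rounds to nearest. Here the first discarded bit is 0, so both approaches give the same mantissa.

```python
n = 1_000_010_001_010_000_000_000_000_000_000
digits = bin(n)[2:]

fraction_bits = digits[1:]                     # drop the implicit leading 1
mantissa = fraction_bits[:23].ljust(23, "0")   # keep 23 bits, pad if shorter
discarded = fraction_bits[23:]

print(mantissa)            # 10010011111001101001110
print("1" in discarded)    # True: some precision is lost
print(discarded[0])        # 0: rounding to nearest would not change the mantissa
```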

Conclusion:

The three elements that make up the number's 32 bit single precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (8 bits) =
1110 0010


Mantissa (23 bits) =
100 1001 1111 0011 0100 1110

Number 1 000 010 001 010 000 000 000 000 000 000, a decimal, converted from decimal system (base 10)
to
32 bit single precision IEEE 754 binary floating point:


0 - 1110 0010 - 100 1001 1111 0011 0100 1110

(32 bits IEEE 754)
  • Sign (1 bit), bit 31: 0
  • Exponent (8 bits), bits 30 to 23: 1 1 1 0 0 0 1 0
  • Mantissa (23 bits), bits 22 to 0: 1 0 0 1 0 0 1 1 1 1 1 0 0 1 1 0 1 0 0 1 1 1 0
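
As a cross-check, the three fields can be packed into one 32 bit word and reinterpreted as a float with Python's standard `struct` module. This is a verification sketch rather than part of the conversion procedure itself.

```python
import struct

sign, exponent, mantissa = "0", "11100010", "10010011111001101001110"
word = int(sign + exponent + mantissa, 2)
print(f"0x{word:08X}")    # 0x7149F34E

# Reinterpret the 32 bit pattern as a single precision float.
value = struct.unpack(">f", word.to_bytes(4, "big"))[0]

original = 1_000_010_001_010_000_000_000_000_000_000
relative_error = abs(int(value) - original) / original
print(value)
print(relative_error < 2 ** -24)    # True: within half a unit in the last place
```

The stored value is not exactly the original number, because the 76 lowest bits were discarded, but the relative error stays below 2^-24.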

Convert decimal numbers from base ten to 32 bit single precision IEEE 754 binary floating point standard

A number in 32 bit single precision IEEE 754 binary floating point standard representation requires three building elements: sign (it takes 1 bit and it's either 0 for positive or 1 for negative numbers), exponent (8 bits), mantissa (23 bits)


How to convert decimal numbers from base ten to 32 bit single precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 32 bit single precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the fractional part repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the mantissa (many decimal fractions never terminate in base 2).
  • 5. Construct the base 2 representation of the fractional part of the number by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, by shifting the decimal point (or if you prefer, the decimal mark) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal point.
  • 7. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal point, if present), and adjust its length to 23 bits, either by removing the excess bits from the right (losing precision) or by adding extra '0' bits to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
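
The steps above can be sketched as one small function. This is a minimal illustration under stated assumptions, not a complete IEEE 754 encoder: the name `to_float32_bits_truncating` is hypothetical, only nonzero values in the normal range are handled, and excess bits are truncated as in the procedure on this page, not rounded. It normalizes first and then generates the 23 mantissa bits by repeated doubling, which yields the same bits as converting the integer and fractional parts separately.

```python
from fractions import Fraction


def to_float32_bits_truncating(text: str) -> str:
    """Return 'sign exponent mantissa' for a decimal string (truncating)."""
    value = Fraction(text)             # exact rational, no float rounding
    sign = "1" if value < 0 else "0"
    value = abs(value)
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    # Normalize: find e such that 1 <= value / 2**e < 2.
    e = 0
    while value >= 2:
        value /= 2
        e += 1
    while value < 1:
        value *= 2
        e -= 1

    biased = e + 127
    if not 0 < biased < 255:
        raise ValueError("outside the normal range of single precision")

    # Drop the implicit leading 1, then emit 23 bits by repeated doubling.
    fraction = value - 1
    mantissa = []
    for _ in range(23):
        fraction *= 2
        bit = int(fraction)            # the integer part is the next bit
        mantissa.append(str(bit))
        fraction -= bit
    return f"{sign} {biased:08b} {''.join(mantissa)}"


print(to_float32_bits_truncating("-25.347"))
# 1 10000011 10010101100011010100111
print(to_float32_bits_truncating("1000010001010000000000000000000"))
# 0 11100010 10010011111001101001110
```

Using `Fraction` keeps every intermediate value exact, so the only loss of precision is the deliberate cut to 23 bits.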

Example: convert the negative number -25.347 from decimal system (base ten) to 32 bit single precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-25.347| = 25.347

  • 2. First convert the integer part, 25. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 25 ÷ 2 = 12 + 1;
    • 12 ÷ 2 = 6 + 0;
    • 6 ÷ 2 = 3 + 0;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    25(10) = 1 1001(2)

  • 4. Then convert the fractional part, 0.347. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.347 × 2 = 0 + 0.694;
    • 2) 0.694 × 2 = 1 + 0.388;
    • 3) 0.388 × 2 = 0 + 0.776;
    • 4) 0.776 × 2 = 1 + 0.552;
    • 5) 0.552 × 2 = 1 + 0.104;
    • 6) 0.104 × 2 = 0 + 0.208;
    • 7) 0.208 × 2 = 0 + 0.416;
    • 8) 0.416 × 2 = 0 + 0.832;
    • 9) 0.832 × 2 = 1 + 0.664;
    • 10) 0.664 × 2 = 1 + 0.328;
    • 11) 0.328 × 2 = 0 + 0.656;
    • 12) 0.656 × 2 = 1 + 0.312;
    • 13) 0.312 × 2 = 0 + 0.624;
    • 14) 0.624 × 2 = 1 + 0.248;
    • 15) 0.248 × 2 = 0 + 0.496;
    • 16) 0.496 × 2 = 0 + 0.992;
    • 17) 0.992 × 2 = 1 + 0.984;
    • 18) 0.984 × 2 = 1 + 0.968;
    • 19) 0.968 × 2 = 1 + 0.936;
    • 20) 0.936 × 2 = 1 + 0.872;
    • 21) 0.872 × 2 = 1 + 0.744;
    • 22) 0.744 × 2 = 1 + 0.488;
    • 23) 0.488 × 2 = 0 + 0.976;
    • 24) 0.976 × 2 = 1 + 0.952;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 23) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.347(10) = 0.0101 1000 1101 0100 1111 1101(2)

  • 6. Summarizing - the positive number before normalization:

    25.347(10) = 1 1001.0101 1000 1101 0100 1111 1101(2)

  • 7. Normalize the binary representation of the number, shifting the decimal point 4 positions to the left so that only one non-zero digit stays to the left of the decimal point:

    25.347(10) =
    1 1001.0101 1000 1101 0100 1111 1101(2) =
    1 1001.0101 1000 1101 0100 1111 1101(2) × 2^0 =
    1.1001 0101 1000 1101 0100 1111 1101(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

  • 9. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as already demonstrated above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1 = (4 + 127)(10) = 131(10) =
    1000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal point), and adjust its length to 23 bits by removing the excess bits from the right (losing precision):

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

    Mantissa (normalized): 100 1010 1100 0110 1010 0111

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (8 bits) = 1000 0011

    Mantissa (23 bits) = 100 1010 1100 0110 1010 0111

  • Number -25.347, converted from the decimal system (base 10) to 32 bit single precision IEEE 754 binary floating point =


    1 - 1000 0011 - 100 1010 1100 0110 1010 0111
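
The worked example can be re-checked with a short sketch. It repeats the multiplication steps for 0.347 using exact fractions, rebuilds the truncated bit pattern, and compares it with the pattern produced by Python's standard `struct` module. The helper name `fraction_to_bits` is an assumption made for this sketch. Note that `struct` relies on the platform's float conversion, which normally rounds to nearest and does not truncate. In this example the first discarded bit is 1, so the two results differ by one unit in the last mantissa bit.

```python
import struct
from fractions import Fraction


def fraction_to_bits(text: str, max_bits: int = 24) -> str:
    """Binary digits of a decimal fraction in [0, 1), by repeated doubling."""
    frac = Fraction(text)          # exact, so 0.347 is not distorted
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)            # the integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)


frac_bits = fraction_to_bits("0.347")
print(frac_bits)                   # 010110001101010011111101

# 25 = 11001 in binary; drop the leading 1 and keep the next 23 bits.
digits = "11001" + frac_bits
mantissa = digits[1:24]
truncated = int("1" + "10000011" + mantissa, 2)

# Pattern obtained from the platform's own double-to-float conversion.
rounded = struct.unpack(">I", struct.pack(">f", -25.347))[0]

print(f"0x{truncated:08X}")        # 0xC1CAC6A7 (truncated, as derived above)
print(f"0x{rounded:08X}")          # 0xC1CAC6A8 (rounded to nearest)
```

Both patterns represent -25.347 to within single precision. The rounded one is slightly closer to the true value.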