1 000 001 100 101 100 000 000 000 000 399 Converted to 32 Bit Single Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 1 000 001 100 101 100 000 000 000 000 399(10) to 32 bit single precision IEEE 754 binary floating point representation standard (1 bit for sign, 8 bits for exponent, 23 bits for mantissa)

What are the steps to convert the decimal number 1 000 001 100 101 100 000 000 000 000 399(10) to 32 bit single precision IEEE 754 binary floating point representation (1 bit for sign, 8 bits for exponent, 23 bits for mantissa)?

1. Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 1 000 001 100 101 100 000 000 000 000 399 ÷ 2 = 500 000 550 050 550 000 000 000 000 199 + 1;
  • 500 000 550 050 550 000 000 000 000 199 ÷ 2 = 250 000 275 025 275 000 000 000 000 099 + 1;
  • 250 000 275 025 275 000 000 000 000 099 ÷ 2 = 125 000 137 512 637 500 000 000 000 049 + 1;
  • 125 000 137 512 637 500 000 000 000 049 ÷ 2 = 62 500 068 756 318 750 000 000 000 024 + 1;
  • 62 500 068 756 318 750 000 000 000 024 ÷ 2 = 31 250 034 378 159 375 000 000 000 012 + 0;
  • 31 250 034 378 159 375 000 000 000 012 ÷ 2 = 15 625 017 189 079 687 500 000 000 006 + 0;
  • 15 625 017 189 079 687 500 000 000 006 ÷ 2 = 7 812 508 594 539 843 750 000 000 003 + 0;
  • 7 812 508 594 539 843 750 000 000 003 ÷ 2 = 3 906 254 297 269 921 875 000 000 001 + 1;
  • 3 906 254 297 269 921 875 000 000 001 ÷ 2 = 1 953 127 148 634 960 937 500 000 000 + 1;
  • 1 953 127 148 634 960 937 500 000 000 ÷ 2 = 976 563 574 317 480 468 750 000 000 + 0;
  • 976 563 574 317 480 468 750 000 000 ÷ 2 = 488 281 787 158 740 234 375 000 000 + 0;
  • 488 281 787 158 740 234 375 000 000 ÷ 2 = 244 140 893 579 370 117 187 500 000 + 0;
  • 244 140 893 579 370 117 187 500 000 ÷ 2 = 122 070 446 789 685 058 593 750 000 + 0;
  • 122 070 446 789 685 058 593 750 000 ÷ 2 = 61 035 223 394 842 529 296 875 000 + 0;
  • 61 035 223 394 842 529 296 875 000 ÷ 2 = 30 517 611 697 421 264 648 437 500 + 0;
  • 30 517 611 697 421 264 648 437 500 ÷ 2 = 15 258 805 848 710 632 324 218 750 + 0;
  • 15 258 805 848 710 632 324 218 750 ÷ 2 = 7 629 402 924 355 316 162 109 375 + 0;
  • 7 629 402 924 355 316 162 109 375 ÷ 2 = 3 814 701 462 177 658 081 054 687 + 1;
  • 3 814 701 462 177 658 081 054 687 ÷ 2 = 1 907 350 731 088 829 040 527 343 + 1;
  • 1 907 350 731 088 829 040 527 343 ÷ 2 = 953 675 365 544 414 520 263 671 + 1;
  • 953 675 365 544 414 520 263 671 ÷ 2 = 476 837 682 772 207 260 131 835 + 1;
  • 476 837 682 772 207 260 131 835 ÷ 2 = 238 418 841 386 103 630 065 917 + 1;
  • 238 418 841 386 103 630 065 917 ÷ 2 = 119 209 420 693 051 815 032 958 + 1;
  • 119 209 420 693 051 815 032 958 ÷ 2 = 59 604 710 346 525 907 516 479 + 0;
  • 59 604 710 346 525 907 516 479 ÷ 2 = 29 802 355 173 262 953 758 239 + 1;
  • 29 802 355 173 262 953 758 239 ÷ 2 = 14 901 177 586 631 476 879 119 + 1;
  • 14 901 177 586 631 476 879 119 ÷ 2 = 7 450 588 793 315 738 439 559 + 1;
  • 7 450 588 793 315 738 439 559 ÷ 2 = 3 725 294 396 657 869 219 779 + 1;
  • 3 725 294 396 657 869 219 779 ÷ 2 = 1 862 647 198 328 934 609 889 + 1;
  • 1 862 647 198 328 934 609 889 ÷ 2 = 931 323 599 164 467 304 944 + 1;
  • 931 323 599 164 467 304 944 ÷ 2 = 465 661 799 582 233 652 472 + 0;
  • 465 661 799 582 233 652 472 ÷ 2 = 232 830 899 791 116 826 236 + 0;
  • 232 830 899 791 116 826 236 ÷ 2 = 116 415 449 895 558 413 118 + 0;
  • 116 415 449 895 558 413 118 ÷ 2 = 58 207 724 947 779 206 559 + 0;
  • 58 207 724 947 779 206 559 ÷ 2 = 29 103 862 473 889 603 279 + 1;
  • 29 103 862 473 889 603 279 ÷ 2 = 14 551 931 236 944 801 639 + 1;
  • 14 551 931 236 944 801 639 ÷ 2 = 7 275 965 618 472 400 819 + 1;
  • 7 275 965 618 472 400 819 ÷ 2 = 3 637 982 809 236 200 409 + 1;
  • 3 637 982 809 236 200 409 ÷ 2 = 1 818 991 404 618 100 204 + 1;
  • 1 818 991 404 618 100 204 ÷ 2 = 909 495 702 309 050 102 + 0;
  • 909 495 702 309 050 102 ÷ 2 = 454 747 851 154 525 051 + 0;
  • 454 747 851 154 525 051 ÷ 2 = 227 373 925 577 262 525 + 1;
  • 227 373 925 577 262 525 ÷ 2 = 113 686 962 788 631 262 + 1;
  • 113 686 962 788 631 262 ÷ 2 = 56 843 481 394 315 631 + 0;
  • 56 843 481 394 315 631 ÷ 2 = 28 421 740 697 157 815 + 1;
  • 28 421 740 697 157 815 ÷ 2 = 14 210 870 348 578 907 + 1;
  • 14 210 870 348 578 907 ÷ 2 = 7 105 435 174 289 453 + 1;
  • 7 105 435 174 289 453 ÷ 2 = 3 552 717 587 144 726 + 1;
  • 3 552 717 587 144 726 ÷ 2 = 1 776 358 793 572 363 + 0;
  • 1 776 358 793 572 363 ÷ 2 = 888 179 396 786 181 + 1;
  • 888 179 396 786 181 ÷ 2 = 444 089 698 393 090 + 1;
  • 444 089 698 393 090 ÷ 2 = 222 044 849 196 545 + 0;
  • 222 044 849 196 545 ÷ 2 = 111 022 424 598 272 + 1;
  • 111 022 424 598 272 ÷ 2 = 55 511 212 299 136 + 0;
  • 55 511 212 299 136 ÷ 2 = 27 755 606 149 568 + 0;
  • 27 755 606 149 568 ÷ 2 = 13 877 803 074 784 + 0;
  • 13 877 803 074 784 ÷ 2 = 6 938 901 537 392 + 0;
  • 6 938 901 537 392 ÷ 2 = 3 469 450 768 696 + 0;
  • 3 469 450 768 696 ÷ 2 = 1 734 725 384 348 + 0;
  • 1 734 725 384 348 ÷ 2 = 867 362 692 174 + 0;
  • 867 362 692 174 ÷ 2 = 433 681 346 087 + 0;
  • 433 681 346 087 ÷ 2 = 216 840 673 043 + 1;
  • 216 840 673 043 ÷ 2 = 108 420 336 521 + 1;
  • 108 420 336 521 ÷ 2 = 54 210 168 260 + 1;
  • 54 210 168 260 ÷ 2 = 27 105 084 130 + 0;
  • 27 105 084 130 ÷ 2 = 13 552 542 065 + 0;
  • 13 552 542 065 ÷ 2 = 6 776 271 032 + 1;
  • 6 776 271 032 ÷ 2 = 3 388 135 516 + 0;
  • 3 388 135 516 ÷ 2 = 1 694 067 758 + 0;
  • 1 694 067 758 ÷ 2 = 847 033 879 + 0;
  • 847 033 879 ÷ 2 = 423 516 939 + 1;
  • 423 516 939 ÷ 2 = 211 758 469 + 1;
  • 211 758 469 ÷ 2 = 105 879 234 + 1;
  • 105 879 234 ÷ 2 = 52 939 617 + 0;
  • 52 939 617 ÷ 2 = 26 469 808 + 1;
  • 26 469 808 ÷ 2 = 13 234 904 + 0;
  • 13 234 904 ÷ 2 = 6 617 452 + 0;
  • 6 617 452 ÷ 2 = 3 308 726 + 0;
  • 3 308 726 ÷ 2 = 1 654 363 + 0;
  • 1 654 363 ÷ 2 = 827 181 + 1;
  • 827 181 ÷ 2 = 413 590 + 1;
  • 413 590 ÷ 2 = 206 795 + 0;
  • 206 795 ÷ 2 = 103 397 + 1;
  • 103 397 ÷ 2 = 51 698 + 1;
  • 51 698 ÷ 2 = 25 849 + 0;
  • 25 849 ÷ 2 = 12 924 + 1;
  • 12 924 ÷ 2 = 6 462 + 0;
  • 6 462 ÷ 2 = 3 231 + 0;
  • 3 231 ÷ 2 = 1 615 + 1;
  • 1 615 ÷ 2 = 807 + 1;
  • 807 ÷ 2 = 403 + 1;
  • 403 ÷ 2 = 201 + 1;
  • 201 ÷ 2 = 100 + 1;
  • 100 ÷ 2 = 50 + 0;
  • 50 ÷ 2 = 25 + 0;
  • 25 ÷ 2 = 12 + 1;
  • 12 ÷ 2 = 6 + 0;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the positive number.

Take all the remainders starting from the bottom of the list constructed above.

1 000 001 100 101 100 000 000 000 000 399(10) =


1100 1001 1111 0010 1101 1000 0101 1100 0100 1110 0000 0001 0110 1111 0110 0111 1100 0011 1111 0111 1110 0000 0001 1000 1111(2)
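Steps 1 and 2 can be sketched in Python, whose integers have arbitrary precision. This is a minimal illustration, not the converter's own code; the function name to_binary_by_division is an assumption made for this example.

```python
def to_binary_by_division(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder in one step
        remainders.append(str(remainder))
    # The remainders were collected from least to most significant bit,
    # so they are read from the bottom of the list upwards.
    return "".join(reversed(remainders))


N = 1_000_001_100_101_100_000_000_000_000_399
bits = to_binary_by_division(N)

print(len(bits))             # number of binary digits
print(bits == bin(N)[2:])    # cross-check against Python's built-in conversion
```

The built-in bin() serves only as an independent check on the hand-written loop.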


3. Normalize the binary representation of the number.

Shift the decimal mark 99 positions to the left, so that only one non zero digit remains to the left of it:


1 000 001 100 101 100 000 000 000 000 399(10) =


1100 1001 1111 0010 1101 1000 0101 1100 0100 1110 0000 0001 0110 1111 0110 0111 1100 0011 1111 0111 1110 0000 0001 1000 1111(2) =


1100 1001 1111 0010 1101 1000 0101 1100 0100 1110 0000 0001 0110 1111 0110 0111 1100 0011 1111 0111 1110 0000 0001 1000 1111(2) × 2^0 =


1.1001 0011 1110 0101 1011 0000 1011 1000 1001 1100 0000 0010 1101 1110 1100 1111 1000 0111 1110 1111 1100 0000 0011 0001 111(2) × 2^99
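For an integer, the number of positions the point moves is one less than the number of binary digits. A short sketch under that assumption (the variable names are chosen for illustration only):

```python
N = 1_000_001_100_101_100_000_000_000_000_399
bits = bin(N)[2:]                 # base 2 digits, most significant first

# Keeping one digit to the left of the point takes len(bits) - 1 shifts.
exponent = len(bits) - 1
significand = bits[0] + "." + bits[1:]

print(exponent)
print(significand[:26] + "...")
```

The same exponent is available directly as N.bit_length() - 1.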


4. Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 99


Mantissa (not normalized):
1.1001 0011 1110 0101 1011 0000 1011 1000 1001 1100 0000 0010 1101 1110 1100 1111 1000 0111 1110 1111 1100 0000 0011 0001 111


5. Adjust the exponent.

Use the 8 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(8-1) - 1 =


99 + 2^(8-1) - 1 =


(99 + 127)(10) =


226(10)


6. Convert the adjusted exponent from the decimal (base 10) to 8 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 226 ÷ 2 = 113 + 0;
  • 113 ÷ 2 = 56 + 1;
  • 56 ÷ 2 = 28 + 0;
  • 28 ÷ 2 = 14 + 0;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

7. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


226(10) =


1110 0010(2)
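Steps 5 to 7 amount to adding the bias and writing the sum in 8 bits. A minimal sketch, reusing the repeated-division technique; the constant and variable names are assumptions of this example:

```python
EXPONENT_BITS = 8
bias = 2 ** (EXPONENT_BITS - 1) - 1    # excess/bias for an 8 bit exponent
unadjusted = 99
adjusted = unadjusted + bias

# Repeated division by 2, then read the remainders from last to first.
remainders = []
quotient = adjusted
while quotient > 0:
    quotient, remainder = divmod(quotient, 2)
    remainders.append(str(remainder))
exponent_field = "".join(reversed(remainders)).rjust(EXPONENT_BITS, "0")

print(bias, adjusted, exponent_field)
```

The rjust call pads with leading zeros, which matters for adjusted exponents below 128.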


8. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.


b) Adjust its length to 23 bits by removing the excess bits from the right (if any of the removed bits is set to 1, precision is lost).


Mantissa (normalized) =


1.100 1001 1111 0010 1101 1000 0101 1100 0100 1110 0000 0001 0110 1111 0110 0111 1100 0011 1111 0111 1110 0000 0001 1000 1111 =


100 1001 1111 0010 1101 1000
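The truncation in step 8 can be expressed as string slicing. This sketch assumes the number is an integer with more than 23 bits after the leading one; the names fraction_bits and dropped are illustrative:

```python
N = 1_000_001_100_101_100_000_000_000_000_399
bits = bin(N)[2:]

fraction_bits = bits[1:]                       # a) drop the implicit leading 1
mantissa = fraction_bits[:23].ljust(23, "0")   # b) keep 23 bits, pad if shorter
dropped = fraction_bits[23:]                   # bits that do not fit

precision_lost = "1" in dropped
print(mantissa)
print(len(dropped), precision_lost)
```

Here 76 bits are dropped and some of them are 1, so the stored value is only an approximation of the original number.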


9. The three elements that make up the number's 32 bit single precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (8 bits) =
1110 0010


Mantissa (23 bits) =
100 1001 1111 0010 1101 1000


Decimal number 1 000 001 100 101 100 000 000 000 000 399 converted to 32 bit single precision IEEE 754 binary floating point representation:

0 - 1110 0010 - 100 1001 1111 0010 1101 1000
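The result can be cross-checked against the machine's own conversion. The sketch below packs the number as a big-endian 32 bit float with Python's standard struct module and compares the bit pattern with the three fields derived above; it assumes a platform whose float-to-single conversion rounds to nearest, as IEEE 754 hardware does by default.

```python
import struct

N = 1_000_001_100_101_100_000_000_000_000_399

# The three fields from step 9, concatenated into one 32 bit word.
sign, exponent_field, mantissa = "0", "11100010", "10010011111001011011000"
word = int(sign + exponent_field + mantissa, 2)

# Reference: let the machine convert N to single precision.
reference = struct.unpack(">I", struct.pack(">f", float(N)))[0]

print(f"{word:#010x}", word == reference)   # 0x7149f2d8 True

# Decode the word back to see which value is actually stored.
stored = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(int(stored) - N)                      # error introduced by the dropped bits
```

For this particular number truncation and rounding to nearest agree, because the first dropped bit is 0.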


How to convert decimal numbers from base ten to 32 bit single precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 32 bit single precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the base ten positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the fractional part repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, by shifting the decimal point (or if you prefer, the decimal mark) "n" positions either to the left or to the right, so that only one non zero digit remains to the left of the decimal point.
  • 7. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal point, if present), and adjust its length to 23 bits, either by removing the excess bits from the right (losing precision...) or by adding extra '0' bits to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
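Steps 4 and 5, the conversion of the fractional part, can be sketched as follows. The example uses Python's fractions.Fraction so that the repeated multiplication is exact; the function name fraction_to_bits and the max_bits cut-off are assumptions of this sketch, since many decimal fractions never reach a zero fractional part in base 2.

```python
from fractions import Fraction


def fraction_to_bits(frac: Fraction, max_bits: int) -> str:
    """Binary digits of 0 <= frac < 1, found by repeated multiplication by 2."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        integer_part = int(frac)      # 0 or 1: the next binary digit
        bits.append(str(integer_part))
        frac -= integer_part          # keep only the fractional part
    return "".join(bits)


print(fraction_to_bits(Fraction("0.625"), 24))   # 101 (terminates)
print(fraction_to_bits(Fraction("0.1"), 12))     # 000110011001 (cut off)
```

The digits come out in the order they are computed, left to right, as step 5 describes.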

Example: convert the negative number -25.347 from decimal system (base ten) to 32 bit single precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-25.347| = 25.347

  • 2. First convert the integer part, 25. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 25 ÷ 2 = 12 + 1;
    • 12 ÷ 2 = 6 + 0;
    • 6 ÷ 2 = 3 + 0;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    25(10) = 1 1001(2)

  • 4. Then convert the fractional part, 0.347. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.347 × 2 = 0 + 0.694;
    • 2) 0.694 × 2 = 1 + 0.388;
    • 3) 0.388 × 2 = 0 + 0.776;
    • 4) 0.776 × 2 = 1 + 0.552;
    • 5) 0.552 × 2 = 1 + 0.104;
    • 6) 0.104 × 2 = 0 + 0.208;
    • 7) 0.208 × 2 = 0 + 0.416;
    • 8) 0.416 × 2 = 0 + 0.832;
    • 9) 0.832 × 2 = 1 + 0.664;
    • 10) 0.664 × 2 = 1 + 0.328;
    • 11) 0.328 × 2 = 0 + 0.656;
    • 12) 0.656 × 2 = 1 + 0.312;
    • 13) 0.312 × 2 = 0 + 0.624;
    • 14) 0.624 × 2 = 1 + 0.248;
    • 15) 0.248 × 2 = 0 + 0.496;
    • 16) 0.496 × 2 = 0 + 0.992;
    • 17) 0.992 × 2 = 1 + 0.984;
    • 18) 0.984 × 2 = 1 + 0.968;
    • 19) 0.968 × 2 = 1 + 0.936;
    • 20) 0.936 × 2 = 1 + 0.872;
    • 21) 0.872 × 2 = 1 + 0.744;
    • 22) 0.744 × 2 = 1 + 0.488;
    • 23) 0.488 × 2 = 0 + 0.976;
    • 24) 0.976 × 2 = 1 + 0.952;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 23) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.347(10) = 0.0101 1000 1101 0100 1111 1101(2)

  • 6. Summarizing - the positive number before normalization:

    25.347(10) = 1 1001.0101 1000 1101 0100 1111 1101(2)

  • 7. Normalize the binary representation of the number, shifting the decimal point 4 positions to the left so that only one non-zero digit stays to the left of the decimal point:

    25.347(10) =
    1 1001.0101 1000 1101 0100 1111 1101(2) =
    1 1001.0101 1000 1101 0100 1111 1101(2) × 2^0 =
    1.1001 0101 1000 1101 0100 1111 1101(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 32 bit single precision IEEE 754 binary floating point:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

  • 9. Adjust the exponent in 8 bit excess/bias notation and then convert it from decimal (base 10) to 8 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as already demonstrated above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1 = (4 + 127)(10) = 131(10) =
    1000 0011(2)

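  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal point), and adjust its length to 23 bits by removing the excess bits from the right (losing precision; the removed bits here are 1 1101, so rounding to nearest, the IEEE 754 default, would instead give a mantissa one unit higher in the last place):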

    Mantissa (not-normalized): 1.1001 0101 1000 1101 0100 1111 1101

    Mantissa (normalized): 100 1010 1100 0110 1010 0111

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (8 bits) = 1000 0011

    Mantissa (23 bits) = 100 1010 1100 0110 1010 0111

  • Number -25.347, converted from the decimal system (base 10) to 32 bit single precision IEEE 754 binary floating point =
    1 - 1000 0011 - 100 1010 1100 0110 1010 0111
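The whole procedure, with truncation of the excess bits as in the walkthrough, can be sketched in a few lines. This is a simplified illustration that assumes a non-zero input within the normal range of single precision (no zero, subnormal, infinity or NaN handling); the function name to_binary32_truncated is hypothetical.

```python
import struct
from fractions import Fraction


def to_binary32_truncated(text: str) -> str:
    """Sign, exponent and mantissa fields, dropping excess mantissa bits."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: scale by powers of 2 until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Drop the leading 1, then take 23 bits by repeated multiplication by 2.
    frac = value - 1
    mantissa = []
    for _ in range(23):
        frac *= 2
        bit = int(frac)
        mantissa.append(str(bit))
        frac -= bit

    exponent_field = format(exponent + 127, "08b")   # bias 2^(8-1) - 1 = 127
    return f"{sign} {exponent_field} {''.join(mantissa)}"


fields = to_binary32_truncated("-25.347")
print(fields)   # 1 10000011 10010101100011010100111

truncated = int(fields.replace(" ", ""), 2)
reference = struct.unpack(">I", struct.pack(">f", -25.347))[0]
print(f"{truncated:#010x} {reference:#010x}")   # 0xc1cac6a7 0xc1cac6a8
```

The two words differ by one in the last place: struct rounds to the nearest representable value, while the walkthrough simply discards the excess bits.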