-0.000 000 000 000 000 000 174 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal -0.000 000 000 000 000 000 174(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert decimal number -0.000 000 000 000 000 000 174(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. Start with the positive version of the number:

|-0.000 000 000 000 000 000 174| = 0.000 000 000 000 000 000 174


2. First, convert the integer part, 0, to binary (base 2).
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

3. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


4. Convert the fractional part, 0.000 000 000 000 000 000 174, to binary (base 2).

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 174 × 2 = 0 + 0.000 000 000 000 000 000 348;
  • 2) 0.000 000 000 000 000 000 348 × 2 = 0 + 0.000 000 000 000 000 000 696;
  • 3) 0.000 000 000 000 000 000 696 × 2 = 0 + 0.000 000 000 000 000 001 392;
  • 4) 0.000 000 000 000 000 001 392 × 2 = 0 + 0.000 000 000 000 000 002 784;
  • 5) 0.000 000 000 000 000 002 784 × 2 = 0 + 0.000 000 000 000 000 005 568;
  • 6) 0.000 000 000 000 000 005 568 × 2 = 0 + 0.000 000 000 000 000 011 136;
  • 7) 0.000 000 000 000 000 011 136 × 2 = 0 + 0.000 000 000 000 000 022 272;
  • 8) 0.000 000 000 000 000 022 272 × 2 = 0 + 0.000 000 000 000 000 044 544;
  • 9) 0.000 000 000 000 000 044 544 × 2 = 0 + 0.000 000 000 000 000 089 088;
  • 10) 0.000 000 000 000 000 089 088 × 2 = 0 + 0.000 000 000 000 000 178 176;
  • 11) 0.000 000 000 000 000 178 176 × 2 = 0 + 0.000 000 000 000 000 356 352;
  • 12) 0.000 000 000 000 000 356 352 × 2 = 0 + 0.000 000 000 000 000 712 704;
  • 13) 0.000 000 000 000 000 712 704 × 2 = 0 + 0.000 000 000 000 001 425 408;
  • 14) 0.000 000 000 000 001 425 408 × 2 = 0 + 0.000 000 000 000 002 850 816;
  • 15) 0.000 000 000 000 002 850 816 × 2 = 0 + 0.000 000 000 000 005 701 632;
  • 16) 0.000 000 000 000 005 701 632 × 2 = 0 + 0.000 000 000 000 011 403 264;
  • 17) 0.000 000 000 000 011 403 264 × 2 = 0 + 0.000 000 000 000 022 806 528;
  • 18) 0.000 000 000 000 022 806 528 × 2 = 0 + 0.000 000 000 000 045 613 056;
  • 19) 0.000 000 000 000 045 613 056 × 2 = 0 + 0.000 000 000 000 091 226 112;
  • 20) 0.000 000 000 000 091 226 112 × 2 = 0 + 0.000 000 000 000 182 452 224;
  • 21) 0.000 000 000 000 182 452 224 × 2 = 0 + 0.000 000 000 000 364 904 448;
  • 22) 0.000 000 000 000 364 904 448 × 2 = 0 + 0.000 000 000 000 729 808 896;
  • 23) 0.000 000 000 000 729 808 896 × 2 = 0 + 0.000 000 000 001 459 617 792;
  • 24) 0.000 000 000 001 459 617 792 × 2 = 0 + 0.000 000 000 002 919 235 584;
  • 25) 0.000 000 000 002 919 235 584 × 2 = 0 + 0.000 000 000 005 838 471 168;
  • 26) 0.000 000 000 005 838 471 168 × 2 = 0 + 0.000 000 000 011 676 942 336;
  • 27) 0.000 000 000 011 676 942 336 × 2 = 0 + 0.000 000 000 023 353 884 672;
  • 28) 0.000 000 000 023 353 884 672 × 2 = 0 + 0.000 000 000 046 707 769 344;
  • 29) 0.000 000 000 046 707 769 344 × 2 = 0 + 0.000 000 000 093 415 538 688;
  • 30) 0.000 000 000 093 415 538 688 × 2 = 0 + 0.000 000 000 186 831 077 376;
  • 31) 0.000 000 000 186 831 077 376 × 2 = 0 + 0.000 000 000 373 662 154 752;
  • 32) 0.000 000 000 373 662 154 752 × 2 = 0 + 0.000 000 000 747 324 309 504;
  • 33) 0.000 000 000 747 324 309 504 × 2 = 0 + 0.000 000 001 494 648 619 008;
  • 34) 0.000 000 001 494 648 619 008 × 2 = 0 + 0.000 000 002 989 297 238 016;
  • 35) 0.000 000 002 989 297 238 016 × 2 = 0 + 0.000 000 005 978 594 476 032;
  • 36) 0.000 000 005 978 594 476 032 × 2 = 0 + 0.000 000 011 957 188 952 064;
  • 37) 0.000 000 011 957 188 952 064 × 2 = 0 + 0.000 000 023 914 377 904 128;
  • 38) 0.000 000 023 914 377 904 128 × 2 = 0 + 0.000 000 047 828 755 808 256;
  • 39) 0.000 000 047 828 755 808 256 × 2 = 0 + 0.000 000 095 657 511 616 512;
  • 40) 0.000 000 095 657 511 616 512 × 2 = 0 + 0.000 000 191 315 023 233 024;
  • 41) 0.000 000 191 315 023 233 024 × 2 = 0 + 0.000 000 382 630 046 466 048;
  • 42) 0.000 000 382 630 046 466 048 × 2 = 0 + 0.000 000 765 260 092 932 096;
  • 43) 0.000 000 765 260 092 932 096 × 2 = 0 + 0.000 001 530 520 185 864 192;
  • 44) 0.000 001 530 520 185 864 192 × 2 = 0 + 0.000 003 061 040 371 728 384;
  • 45) 0.000 003 061 040 371 728 384 × 2 = 0 + 0.000 006 122 080 743 456 768;
  • 46) 0.000 006 122 080 743 456 768 × 2 = 0 + 0.000 012 244 161 486 913 536;
  • 47) 0.000 012 244 161 486 913 536 × 2 = 0 + 0.000 024 488 322 973 827 072;
  • 48) 0.000 024 488 322 973 827 072 × 2 = 0 + 0.000 048 976 645 947 654 144;
  • 49) 0.000 048 976 645 947 654 144 × 2 = 0 + 0.000 097 953 291 895 308 288;
  • 50) 0.000 097 953 291 895 308 288 × 2 = 0 + 0.000 195 906 583 790 616 576;
  • 51) 0.000 195 906 583 790 616 576 × 2 = 0 + 0.000 391 813 167 581 233 152;
  • 52) 0.000 391 813 167 581 233 152 × 2 = 0 + 0.000 783 626 335 162 466 304;
  • 53) 0.000 783 626 335 162 466 304 × 2 = 0 + 0.001 567 252 670 324 932 608;
  • 54) 0.001 567 252 670 324 932 608 × 2 = 0 + 0.003 134 505 340 649 865 216;
  • 55) 0.003 134 505 340 649 865 216 × 2 = 0 + 0.006 269 010 681 299 730 432;
  • 56) 0.006 269 010 681 299 730 432 × 2 = 0 + 0.012 538 021 362 599 460 864;
  • 57) 0.012 538 021 362 599 460 864 × 2 = 0 + 0.025 076 042 725 198 921 728;
  • 58) 0.025 076 042 725 198 921 728 × 2 = 0 + 0.050 152 085 450 397 843 456;
  • 59) 0.050 152 085 450 397 843 456 × 2 = 0 + 0.100 304 170 900 795 686 912;
  • 60) 0.100 304 170 900 795 686 912 × 2 = 0 + 0.200 608 341 801 591 373 824;
  • 61) 0.200 608 341 801 591 373 824 × 2 = 0 + 0.401 216 683 603 182 747 648;
  • 62) 0.401 216 683 603 182 747 648 × 2 = 0 + 0.802 433 367 206 365 495 296;
  • 63) 0.802 433 367 206 365 495 296 × 2 = 1 + 0.604 866 734 412 730 990 592;
  • 64) 0.604 866 734 412 730 990 592 × 2 = 1 + 0.209 733 468 825 461 981 184;
  • 65) 0.209 733 468 825 461 981 184 × 2 = 0 + 0.419 466 937 650 923 962 368;
  • 66) 0.419 466 937 650 923 962 368 × 2 = 0 + 0.838 933 875 301 847 924 736;
  • 67) 0.838 933 875 301 847 924 736 × 2 = 1 + 0.677 867 750 603 695 849 472;
  • 68) 0.677 867 750 603 695 849 472 × 2 = 1 + 0.355 735 501 207 391 698 944;
  • 69) 0.355 735 501 207 391 698 944 × 2 = 0 + 0.711 471 002 414 783 397 888;
  • 70) 0.711 471 002 414 783 397 888 × 2 = 1 + 0.422 942 004 829 566 795 776;
  • 71) 0.422 942 004 829 566 795 776 × 2 = 0 + 0.845 884 009 659 133 591 552;
  • 72) 0.845 884 009 659 133 591 552 × 2 = 1 + 0.691 768 019 318 267 183 104;
  • 73) 0.691 768 019 318 267 183 104 × 2 = 1 + 0.383 536 038 636 534 366 208;
  • 74) 0.383 536 038 636 534 366 208 × 2 = 0 + 0.767 072 077 273 068 732 416;
  • 75) 0.767 072 077 273 068 732 416 × 2 = 1 + 0.534 144 154 546 137 464 832;
  • 76) 0.534 144 154 546 137 464 832 × 2 = 1 + 0.068 288 309 092 274 929 664;
  • 77) 0.068 288 309 092 274 929 664 × 2 = 0 + 0.136 576 618 184 549 859 328;
  • 78) 0.136 576 618 184 549 859 328 × 2 = 0 + 0.273 153 236 369 099 718 656;
  • 79) 0.273 153 236 369 099 718 656 × 2 = 0 + 0.546 306 472 738 199 437 312;
  • 80) 0.546 306 472 738 199 437 312 × 2 = 1 + 0.092 612 945 476 398 874 624;
  • 81) 0.092 612 945 476 398 874 624 × 2 = 0 + 0.185 225 890 952 797 749 248;
  • 82) 0.185 225 890 952 797 749 248 × 2 = 0 + 0.370 451 781 905 595 498 496;
  • 83) 0.370 451 781 905 595 498 496 × 2 = 0 + 0.740 903 563 811 190 996 992;
  • 84) 0.740 903 563 811 190 996 992 × 2 = 1 + 0.481 807 127 622 381 993 984;
  • 85) 0.481 807 127 622 381 993 984 × 2 = 0 + 0.963 614 255 244 763 987 968;
  • 86) 0.963 614 255 244 763 987 968 × 2 = 1 + 0.927 228 510 489 527 975 936;
  • 87) 0.927 228 510 489 527 975 936 × 2 = 1 + 0.854 457 020 979 055 951 872;
  • 88) 0.854 457 020 979 055 951 872 × 2 = 1 + 0.708 914 041 958 111 903 744;
  • 89) 0.708 914 041 958 111 903 744 × 2 = 1 + 0.417 828 083 916 223 807 488;
  • 90) 0.417 828 083 916 223 807 488 × 2 = 0 + 0.835 656 167 832 447 614 976;
  • 91) 0.835 656 167 832 447 614 976 × 2 = 1 + 0.671 312 335 664 895 229 952;
  • 92) 0.671 312 335 664 895 229 952 × 2 = 1 + 0.342 624 671 329 790 459 904;
  • 93) 0.342 624 671 329 790 459 904 × 2 = 0 + 0.685 249 342 659 580 919 808;
  • 94) 0.685 249 342 659 580 919 808 × 2 = 1 + 0.370 498 685 319 161 839 616;
  • 95) 0.370 498 685 319 161 839 616 × 2 = 0 + 0.740 997 370 638 323 679 232;
  • 96) 0.740 997 370 638 323 679 232 × 2 = 1 + 0.481 994 741 276 647 358 464;
  • 97) 0.481 994 741 276 647 358 464 × 2 = 0 + 0.963 989 482 553 294 716 928;
  • 98) 0.963 989 482 553 294 716 928 × 2 = 1 + 0.927 978 965 106 589 433 856;
  • 99) 0.927 978 965 106 589 433 856 × 2 = 1 + 0.855 957 930 213 178 867 712;
  • 100) 0.855 957 930 213 178 867 712 × 2 = 1 + 0.711 915 860 426 357 735 424;
  • 101) 0.711 915 860 426 357 735 424 × 2 = 1 + 0.423 831 720 852 715 470 848;
  • 102) 0.423 831 720 852 715 470 848 × 2 = 0 + 0.847 663 441 705 430 941 696;
  • 103) 0.847 663 441 705 430 941 696 × 2 = 1 + 0.695 326 883 410 861 883 392;
  • 104) 0.695 326 883 410 861 883 392 × 2 = 1 + 0.390 653 766 821 723 766 784;
  • 105) 0.390 653 766 821 723 766 784 × 2 = 0 + 0.781 307 533 643 447 533 568;
  • 106) 0.781 307 533 643 447 533 568 × 2 = 1 + 0.562 615 067 286 895 067 136;
  • 107) 0.562 615 067 286 895 067 136 × 2 = 1 + 0.125 230 134 573 790 134 272;
  • 108) 0.125 230 134 573 790 134 272 × 2 = 0 + 0.250 460 269 147 580 268 544;
  • 109) 0.250 460 269 147 580 268 544 × 2 = 0 + 0.500 920 538 295 160 537 088;
  • 110) 0.500 920 538 295 160 537 088 × 2 = 1 + 0.001 841 076 590 321 074 176;
  • 111) 0.001 841 076 590 321 074 176 × 2 = 0 + 0.003 682 153 180 642 148 352;
  • 112) 0.003 682 153 180 642 148 352 × 2 = 0 + 0.007 364 306 361 284 296 704;
  • 113) 0.007 364 306 361 284 296 704 × 2 = 0 + 0.014 728 612 722 568 593 408;
  • 114) 0.014 728 612 722 568 593 408 × 2 = 0 + 0.029 457 225 445 137 186 816;
  • 115) 0.029 457 225 445 137 186 816 × 2 = 0 + 0.058 914 450 890 274 373 632;

We didn't get any fractional part that was equal to zero. But the first non-zero bit appeared at step 63, and the 52 steps after it (64 to 115) supply all the bits the 52 bit mantissa can hold => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).


5. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 174(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 0011 0101 1011 0001 0001 0111 1011 0101 0111 1011 0110 0100 000(2)
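
The repeated doubling in steps 4 and 5 can be sketched in Python. This is a minimal illustration, not the converter's own code: the helper name fraction_bits is hypothetical, and fractions.Fraction is used so that each doubling is exact, as in the hand calculation.

```python
from fractions import Fraction

def fraction_bits(value, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(count):
        value *= 2
        bit = int(value)        # integer part of the product: 0 or 1
        bits.append(str(bit))
        value -= bit            # carry only the fractional part forward
    return "".join(bits)

# 0.000 000 000 000 000 000 174 has 21 decimal places, so it is 174 / 10**21.
bits = fraction_bits(Fraction(174, 10**21), 115)
print(bits.index("1") + 1)      # 63: the step at which the first 1 appears
print(bits[62:])                # the leading 1 followed by 52 more bits
```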

6. Positive number before normalization:

0.000 000 000 000 000 000 174(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 0011 0101 1011 0001 0001 0111 1011 0101 0111 1011 0110 0100 000(2)

7. Normalize the binary representation of the number.

Shift the decimal mark 63 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 174(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 0011 0101 1011 0001 0001 0111 1011 0101 0111 1011 0110 0100 000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 0011 0101 1011 0001 0001 0111 1011 0101 0111 1011 0110 0100 000(2) × 2^0 =


1.1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000(2) × 2^-63
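
The normalization step amounts to scaling the value by powers of 2 until it lies in the range [1, 2), counting the shifts. The sketch below assumes a strictly positive input; the function name normalize is illustrative only.

```python
from fractions import Fraction

def normalize(value):
    """Return (significand, exponent) such that
    value == significand * 2**exponent and 1 <= significand < 2.
    Assumes value > 0."""
    exponent = 0
    while value < 1:            # point moves right: exponent decreases
        value *= 2
        exponent -= 1
    while value >= 2:           # point moves left: exponent increases
        value /= 2
        exponent += 1
    return value, exponent

significand, exponent = normalize(Fraction(174, 10**21))
print(exponent)                 # -63
print(float(significand))       # roughly 1.6049
```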


8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 1 (a negative number)


Exponent (unadjusted): -63


Mantissa (not normalized):
1.1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000


9. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-63 + 2^(11-1) - 1 =


(-63 + 1 023)(10) =


960(10)


10. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 960 ÷ 2 = 480 + 0;
  • 480 ÷ 2 = 240 + 0;
  • 240 ÷ 2 = 120 + 0;
  • 120 ÷ 2 = 60 + 0;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

11. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


960(10) =


011 1100 0000(2)
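
Steps 9 to 11 can be reproduced with a short Python sketch. The helper to_binary is a hypothetical name; it mirrors the repeated division by 2 and then pads the result on the left to the 11 bit field width.

```python
def to_binary(n, width):
    """Convert a non-negative integer to binary by repeated division by 2,
    left-padded with zeros to `width` digits."""
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    # The last remainder is the leftmost digit, so read the list backwards.
    return "".join(reversed(remainders)).rjust(width, "0")

bias = 2 ** (11 - 1) - 1        # 1023 for an 11 bit exponent field
adjusted = -63 + bias           # 960
print(to_binary(adjusted, 11))  # 01111000000
```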


12. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal mark, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000 =


1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000
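
As a rough sketch of step 12, the function below drops the leading "1." and fits what remains to 52 bits. The name stored_mantissa and the string-based input format are assumptions made for illustration; excess bits are truncated, as in the steps above, not rounded.

```python
def stored_mantissa(significand, width=52):
    """Take a normalized significand written as '1.xxxx' (spaces allowed),
    drop the leading '1.' and fit the rest to `width` bits by truncating
    excess bits or padding with zeros on the right."""
    digits = significand.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized significand starting with '1.'")
    fraction = digits[2:]
    return fraction[:width].ljust(width, "0")

example = "1.1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000"
print(len(stored_mantissa(example)))   # 52
```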


13. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
1 (a negative number)


Exponent (11 bits) =
011 1100 0000


Mantissa (52 bits) =
1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000


Decimal number -0.000 000 000 000 000 000 174 converted to 64 bit double precision IEEE 754 binary floating point representation:

1 - 011 1100 0000 - 1001 1010 1101 1000 1000 1011 1101 1010 1011 1101 1011 0010 0000
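
The result can be cross-checked against the machine representation with Python's standard struct module. This sketch assumes a platform where Python floats are IEEE 754 doubles, which is the usual case.

```python
import struct

# Pack the float as a big-endian double, then reread the 8 bytes as an integer.
raw = struct.unpack(">Q", struct.pack(">d", -1.74e-19))[0]
bits = f"{raw:064b}"

sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]
print(sign, exponent, mantissa)
print(hex(raw))
```

The three printed fields should match the sign, exponent and mantissa derived by hand above.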


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or, if that never happens, until we have enough bits to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by padding with '0' bits on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
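
The nine steps above can be condensed into one Python sketch. The function name to_double_fields is hypothetical. To keep it short, it normalizes the exact value first and then reads off the mantissa bits, which yields the same bits as converting the integer and fractional parts separately. It assumes a non-zero value in the normal double range and, like the steps above, truncates the mantissa instead of rounding it.

```python
from fractions import Fraction

def to_double_fields(text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string.
    Assumes a non-zero value in the normal range; truncates, does not round."""
    value = Fraction(text)                    # exact value of the decimal text
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"          # step 9
    value = abs(value)                        # step 1
    exponent = 0
    while value >= 2:                         # step 6: point moves left
        value /= 2
        exponent += 1
    while value < 1:                          # step 6: point moves right
        value *= 2
        exponent -= 1
    adjusted = exponent + 2 ** (11 - 1) - 1   # step 7: add the bias, 1023
    fraction = value - 1                      # step 8: drop the leading 1
    mantissa = ""
    for _ in range(52):                       # steps 4, 5: repeated doubling
        fraction *= 2
        bit = int(fraction)
        mantissa += str(bit)
        fraction -= bit
    return sign, f"{adjusted:011b}", mantissa

print(" - ".join(to_double_fields("-0.000000000000000000174")))
```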

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

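  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision). This walkthrough truncates; IEEE 754's default round-to-nearest mode would instead round the last kept bit up here, because the first removed bit is 1: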

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
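
Because the walkthrough truncates the mantissa, its last bit can differ from what hardware stores under the default round-to-nearest mode. The sketch below compares the two for -31.640 215, assuming a platform where Python floats are IEEE 754 doubles; the variable names are illustrative.

```python
import struct

# Mantissa obtained above by truncating to 52 bits.
truncated = (
    "1111 1010 0011 1110 0101 0010 0001 0101 "
    "0111 0110 1000 1001 1100"
).replace(" ", "")

# Bits actually stored for the float literal (correctly rounded on parsing).
raw = struct.unpack(">Q", struct.pack(">d", -31.640215))[0]
hardware = f"{raw:064b}"

difference = int(hardware[12:], 2) - int(truncated, 2)
print(hardware[0], hardware[1:12])   # the sign and exponent fields agree
print(difference)                    # 1: the stored mantissa is one unit higher
```

The first removed bit in step 10 is 1 and further non-zero bits follow it, so rounding to nearest moves the final mantissa group from 1100 to 1101.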