35 248 198 783 801 730 000 000 000 000 000 000 270 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 35 248 198 783 801 730 000 000 000 000 000 000 270(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
35 248 198 783 801 730 000 000 000 000 000 000 270(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 35 248 198 783 801 730 000 000 000 000 000 000 270 ÷ 2 = 17 624 099 391 900 865 000 000 000 000 000 000 135 + 0;
  • 17 624 099 391 900 865 000 000 000 000 000 000 135 ÷ 2 = 8 812 049 695 950 432 500 000 000 000 000 000 067 + 1;
  • 8 812 049 695 950 432 500 000 000 000 000 000 067 ÷ 2 = 4 406 024 847 975 216 250 000 000 000 000 000 033 + 1;
  • 4 406 024 847 975 216 250 000 000 000 000 000 033 ÷ 2 = 2 203 012 423 987 608 125 000 000 000 000 000 016 + 1;
  • 2 203 012 423 987 608 125 000 000 000 000 000 016 ÷ 2 = 1 101 506 211 993 804 062 500 000 000 000 000 008 + 0;
  • 1 101 506 211 993 804 062 500 000 000 000 000 008 ÷ 2 = 550 753 105 996 902 031 250 000 000 000 000 004 + 0;
  • 550 753 105 996 902 031 250 000 000 000 000 004 ÷ 2 = 275 376 552 998 451 015 625 000 000 000 000 002 + 0;
  • 275 376 552 998 451 015 625 000 000 000 000 002 ÷ 2 = 137 688 276 499 225 507 812 500 000 000 000 001 + 0;
  • 137 688 276 499 225 507 812 500 000 000 000 001 ÷ 2 = 68 844 138 249 612 753 906 250 000 000 000 000 + 1;
  • 68 844 138 249 612 753 906 250 000 000 000 000 ÷ 2 = 34 422 069 124 806 376 953 125 000 000 000 000 + 0;
  • 34 422 069 124 806 376 953 125 000 000 000 000 ÷ 2 = 17 211 034 562 403 188 476 562 500 000 000 000 + 0;
  • 17 211 034 562 403 188 476 562 500 000 000 000 ÷ 2 = 8 605 517 281 201 594 238 281 250 000 000 000 + 0;
  • 8 605 517 281 201 594 238 281 250 000 000 000 ÷ 2 = 4 302 758 640 600 797 119 140 625 000 000 000 + 0;
  • 4 302 758 640 600 797 119 140 625 000 000 000 ÷ 2 = 2 151 379 320 300 398 559 570 312 500 000 000 + 0;
  • 2 151 379 320 300 398 559 570 312 500 000 000 ÷ 2 = 1 075 689 660 150 199 279 785 156 250 000 000 + 0;
  • 1 075 689 660 150 199 279 785 156 250 000 000 ÷ 2 = 537 844 830 075 099 639 892 578 125 000 000 + 0;
  • 537 844 830 075 099 639 892 578 125 000 000 ÷ 2 = 268 922 415 037 549 819 946 289 062 500 000 + 0;
  • 268 922 415 037 549 819 946 289 062 500 000 ÷ 2 = 134 461 207 518 774 909 973 144 531 250 000 + 0;
  • 134 461 207 518 774 909 973 144 531 250 000 ÷ 2 = 67 230 603 759 387 454 986 572 265 625 000 + 0;
  • 67 230 603 759 387 454 986 572 265 625 000 ÷ 2 = 33 615 301 879 693 727 493 286 132 812 500 + 0;
  • 33 615 301 879 693 727 493 286 132 812 500 ÷ 2 = 16 807 650 939 846 863 746 643 066 406 250 + 0;
  • 16 807 650 939 846 863 746 643 066 406 250 ÷ 2 = 8 403 825 469 923 431 873 321 533 203 125 + 0;
  • 8 403 825 469 923 431 873 321 533 203 125 ÷ 2 = 4 201 912 734 961 715 936 660 766 601 562 + 1;
  • 4 201 912 734 961 715 936 660 766 601 562 ÷ 2 = 2 100 956 367 480 857 968 330 383 300 781 + 0;
  • 2 100 956 367 480 857 968 330 383 300 781 ÷ 2 = 1 050 478 183 740 428 984 165 191 650 390 + 1;
  • 1 050 478 183 740 428 984 165 191 650 390 ÷ 2 = 525 239 091 870 214 492 082 595 825 195 + 0;
  • 525 239 091 870 214 492 082 595 825 195 ÷ 2 = 262 619 545 935 107 246 041 297 912 597 + 1;
  • 262 619 545 935 107 246 041 297 912 597 ÷ 2 = 131 309 772 967 553 623 020 648 956 298 + 1;
  • 131 309 772 967 553 623 020 648 956 298 ÷ 2 = 65 654 886 483 776 811 510 324 478 149 + 0;
  • 65 654 886 483 776 811 510 324 478 149 ÷ 2 = 32 827 443 241 888 405 755 162 239 074 + 1;
  • 32 827 443 241 888 405 755 162 239 074 ÷ 2 = 16 413 721 620 944 202 877 581 119 537 + 0;
  • 16 413 721 620 944 202 877 581 119 537 ÷ 2 = 8 206 860 810 472 101 438 790 559 768 + 1;
  • 8 206 860 810 472 101 438 790 559 768 ÷ 2 = 4 103 430 405 236 050 719 395 279 884 + 0;
  • 4 103 430 405 236 050 719 395 279 884 ÷ 2 = 2 051 715 202 618 025 359 697 639 942 + 0;
  • 2 051 715 202 618 025 359 697 639 942 ÷ 2 = 1 025 857 601 309 012 679 848 819 971 + 0;
  • 1 025 857 601 309 012 679 848 819 971 ÷ 2 = 512 928 800 654 506 339 924 409 985 + 1;
  • 512 928 800 654 506 339 924 409 985 ÷ 2 = 256 464 400 327 253 169 962 204 992 + 1;
  • 256 464 400 327 253 169 962 204 992 ÷ 2 = 128 232 200 163 626 584 981 102 496 + 0;
  • 128 232 200 163 626 584 981 102 496 ÷ 2 = 64 116 100 081 813 292 490 551 248 + 0;
  • 64 116 100 081 813 292 490 551 248 ÷ 2 = 32 058 050 040 906 646 245 275 624 + 0;
  • 32 058 050 040 906 646 245 275 624 ÷ 2 = 16 029 025 020 453 323 122 637 812 + 0;
  • 16 029 025 020 453 323 122 637 812 ÷ 2 = 8 014 512 510 226 661 561 318 906 + 0;
  • 8 014 512 510 226 661 561 318 906 ÷ 2 = 4 007 256 255 113 330 780 659 453 + 0;
  • 4 007 256 255 113 330 780 659 453 ÷ 2 = 2 003 628 127 556 665 390 329 726 + 1;
  • 2 003 628 127 556 665 390 329 726 ÷ 2 = 1 001 814 063 778 332 695 164 863 + 0;
  • 1 001 814 063 778 332 695 164 863 ÷ 2 = 500 907 031 889 166 347 582 431 + 1;
  • 500 907 031 889 166 347 582 431 ÷ 2 = 250 453 515 944 583 173 791 215 + 1;
  • 250 453 515 944 583 173 791 215 ÷ 2 = 125 226 757 972 291 586 895 607 + 1;
  • 125 226 757 972 291 586 895 607 ÷ 2 = 62 613 378 986 145 793 447 803 + 1;
  • 62 613 378 986 145 793 447 803 ÷ 2 = 31 306 689 493 072 896 723 901 + 1;
  • 31 306 689 493 072 896 723 901 ÷ 2 = 15 653 344 746 536 448 361 950 + 1;
  • 15 653 344 746 536 448 361 950 ÷ 2 = 7 826 672 373 268 224 180 975 + 0;
  • 7 826 672 373 268 224 180 975 ÷ 2 = 3 913 336 186 634 112 090 487 + 1;
  • 3 913 336 186 634 112 090 487 ÷ 2 = 1 956 668 093 317 056 045 243 + 1;
  • 1 956 668 093 317 056 045 243 ÷ 2 = 978 334 046 658 528 022 621 + 1;
  • 978 334 046 658 528 022 621 ÷ 2 = 489 167 023 329 264 011 310 + 1;
  • 489 167 023 329 264 011 310 ÷ 2 = 244 583 511 664 632 005 655 + 0;
  • 244 583 511 664 632 005 655 ÷ 2 = 122 291 755 832 316 002 827 + 1;
  • 122 291 755 832 316 002 827 ÷ 2 = 61 145 877 916 158 001 413 + 1;
  • 61 145 877 916 158 001 413 ÷ 2 = 30 572 938 958 079 000 706 + 1;
  • 30 572 938 958 079 000 706 ÷ 2 = 15 286 469 479 039 500 353 + 0;
  • 15 286 469 479 039 500 353 ÷ 2 = 7 643 234 739 519 750 176 + 1;
  • 7 643 234 739 519 750 176 ÷ 2 = 3 821 617 369 759 875 088 + 0;
  • 3 821 617 369 759 875 088 ÷ 2 = 1 910 808 684 879 937 544 + 0;
  • 1 910 808 684 879 937 544 ÷ 2 = 955 404 342 439 968 772 + 0;
  • 955 404 342 439 968 772 ÷ 2 = 477 702 171 219 984 386 + 0;
  • 477 702 171 219 984 386 ÷ 2 = 238 851 085 609 992 193 + 0;
  • 238 851 085 609 992 193 ÷ 2 = 119 425 542 804 996 096 + 1;
  • 119 425 542 804 996 096 ÷ 2 = 59 712 771 402 498 048 + 0;
  • 59 712 771 402 498 048 ÷ 2 = 29 856 385 701 249 024 + 0;
  • 29 856 385 701 249 024 ÷ 2 = 14 928 192 850 624 512 + 0;
  • 14 928 192 850 624 512 ÷ 2 = 7 464 096 425 312 256 + 0;
  • 7 464 096 425 312 256 ÷ 2 = 3 732 048 212 656 128 + 0;
  • 3 732 048 212 656 128 ÷ 2 = 1 866 024 106 328 064 + 0;
  • 1 866 024 106 328 064 ÷ 2 = 933 012 053 164 032 + 0;
  • 933 012 053 164 032 ÷ 2 = 466 506 026 582 016 + 0;
  • 466 506 026 582 016 ÷ 2 = 233 253 013 291 008 + 0;
  • 233 253 013 291 008 ÷ 2 = 116 626 506 645 504 + 0;
  • 116 626 506 645 504 ÷ 2 = 58 313 253 322 752 + 0;
  • 58 313 253 322 752 ÷ 2 = 29 156 626 661 376 + 0;
  • 29 156 626 661 376 ÷ 2 = 14 578 313 330 688 + 0;
  • 14 578 313 330 688 ÷ 2 = 7 289 156 665 344 + 0;
  • 7 289 156 665 344 ÷ 2 = 3 644 578 332 672 + 0;
  • 3 644 578 332 672 ÷ 2 = 1 822 289 166 336 + 0;
  • 1 822 289 166 336 ÷ 2 = 911 144 583 168 + 0;
  • 911 144 583 168 ÷ 2 = 455 572 291 584 + 0;
  • 455 572 291 584 ÷ 2 = 227 786 145 792 + 0;
  • 227 786 145 792 ÷ 2 = 113 893 072 896 + 0;
  • 113 893 072 896 ÷ 2 = 56 946 536 448 + 0;
  • 56 946 536 448 ÷ 2 = 28 473 268 224 + 0;
  • 28 473 268 224 ÷ 2 = 14 236 634 112 + 0;
  • 14 236 634 112 ÷ 2 = 7 118 317 056 + 0;
  • 7 118 317 056 ÷ 2 = 3 559 158 528 + 0;
  • 3 559 158 528 ÷ 2 = 1 779 579 264 + 0;
  • 1 779 579 264 ÷ 2 = 889 789 632 + 0;
  • 889 789 632 ÷ 2 = 444 894 816 + 0;
  • 444 894 816 ÷ 2 = 222 447 408 + 0;
  • 222 447 408 ÷ 2 = 111 223 704 + 0;
  • 111 223 704 ÷ 2 = 55 611 852 + 0;
  • 55 611 852 ÷ 2 = 27 805 926 + 0;
  • 27 805 926 ÷ 2 = 13 902 963 + 0;
  • 13 902 963 ÷ 2 = 6 951 481 + 1;
  • 6 951 481 ÷ 2 = 3 475 740 + 1;
  • 3 475 740 ÷ 2 = 1 737 870 + 0;
  • 1 737 870 ÷ 2 = 868 935 + 0;
  • 868 935 ÷ 2 = 434 467 + 1;
  • 434 467 ÷ 2 = 217 233 + 1;
  • 217 233 ÷ 2 = 108 616 + 1;
  • 108 616 ÷ 2 = 54 308 + 0;
  • 54 308 ÷ 2 = 27 154 + 0;
  • 27 154 ÷ 2 = 13 577 + 0;
  • 13 577 ÷ 2 = 6 788 + 1;
  • 6 788 ÷ 2 = 3 394 + 0;
  • 3 394 ÷ 2 = 1 697 + 0;
  • 1 697 ÷ 2 = 848 + 1;
  • 848 ÷ 2 = 424 + 0;
  • 424 ÷ 2 = 212 + 0;
  • 212 ÷ 2 = 106 + 0;
  • 106 ÷ 2 = 53 + 0;
  • 53 ÷ 2 = 26 + 1;
  • 26 ÷ 2 = 13 + 0;
  • 13 ÷ 2 = 6 + 1;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;
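The repeated division above can be checked with a short Python sketch. It is an illustration only, not the page's own tool. `remainders_by_repeated_division` is a hypothetical helper name. The sketch assumes Python's arbitrary-precision `int`, which handles the 38-digit number exactly.

```python
def remainders_by_repeated_division(n: int) -> list:
    """Divide n by 2 until the quotient is 0; return the remainders top to bottom."""
    if n < 0:
        raise ValueError("start from the positive version of the number")
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)   # quotient and remainder (0 or 1)
        remainders.append(r)
    return remainders


N = 35248198783801730000000000000000000270
rems = remainders_by_repeated_division(N)
print(len(rems))    # 125 divisions, as in the list above
print(rems[:9])     # [0, 1, 1, 1, 0, 0, 0, 0, 1]
```

The first nine remainders depend only on N mod 512. Because 10^21 is a multiple of 512, N mod 512 = 270, which is 1 0000 1110 in binary, read from the right.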

2. Construct the base 2 representation of the positive number.

Take all the remainders starting from the bottom of the list constructed above.

35 248 198 783 801 730 000 000 000 000 000 000 270(10) =


1 1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110(2)
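Reading the remainders from the bottom up gives the same digits that Python's built-in `format(n, "b")` produces. The sketch below assumes the remainders were collected top to bottom, as in step 1. `group_nibbles` is a hypothetical helper that reproduces the page's 4-bit grouping, counted from the right.

```python
N = 35248198783801730000000000000000000270

# Step 1, compactly: remainders in the order they were produced (top of the list first).
remainders = []
n = N
while n > 0:
    n, r = divmod(n, 2)
    remainders.append(r)

# Step 2: the last remainder is the leftmost digit, so read the list in reverse.
bits = "".join(str(r) for r in reversed(remainders))


def group_nibbles(bits: str) -> str:
    """Group bits in fours from the right; any leftover bits form the first group."""
    head = len(bits) % 4
    groups = [bits[:head]] if head else []
    groups += [bits[i:i + 4] for i in range(head, len(bits), 4)]
    return " ".join(groups)


print(bits == format(N, "b"))   # True: same digits as the built-in conversion
print(group_nibbles(bits))
```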


3. Normalize the binary representation of the number.

Shift the decimal mark 124 positions to the left, so that only one non-zero digit remains to the left of it:


35 248 198 783 801 730 000 000 000 000 000 000 270(10) =


1 1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110(2) =


1 1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110(2) × 2^0 =


1.1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110(2) × 2^124
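For a whole number, the point moves left by the number of binary digits minus one. Here is a minimal check in Python, assuming the same integer `N`. The standard method `int.bit_length()` gives the digit count without building the string.

```python
N = 35248198783801730000000000000000000270

bits = format(N, "b")              # base 2 digits, most significant first
shift = len(bits) - 1              # leave exactly one non-zero digit left of the point
normalized = bits[0] + "." + bits[1:]

print(shift, N.bit_length() - 1)   # 124 124
print(normalized[:12])             # 1.1010100001
# The normalized form times 2**shift gives back N, so 2**124 <= N < 2**125.
print(2 ** shift <= N < 2 ** (shift + 1))   # True
```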


4. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 124


Mantissa (not normalized):
1.1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110


5. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


124 + 2^(11-1) - 1 =


(124 + 1 023)(10) =


1 147(10)
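The bias is 2^(11-1) - 1 = 1 023. The sketch below uses a hypothetical `biased_exponent` helper. Its range check reflects IEEE 754's reserved stored exponents: 0 is used for zero and subnormal numbers, and 2047 for infinity and NaN. A normal number therefore needs an adjusted exponent from 1 to 2046.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1        # 1023 for double precision


def biased_exponent(unadjusted: int) -> int:
    """Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1, for normal numbers only."""
    adjusted = unadjusted + BIAS
    # Stored values 0 and 2047 are reserved, so a normal number must land in 1..2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError(f"exponent {unadjusted} is outside the normal range")
    return adjusted


print(biased_exponent(124))   # 1147
```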


6. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 1 147 ÷ 2 = 573 + 1;
  • 573 ÷ 2 = 286 + 1;
  • 286 ÷ 2 = 143 + 0;
  • 143 ÷ 2 = 71 + 1;
  • 71 ÷ 2 = 35 + 1;
  • 35 ÷ 2 = 17 + 1;
  • 17 ÷ 2 = 8 + 1;
  • 8 ÷ 2 = 4 + 0;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

7. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1147(10) =


100 0111 1011(2)
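One detail of step 7 does not show up with 1 147. The remainders give only as many digits as the value needs, so a smaller adjusted exponent must be padded on the left with zeros to fill all 11 bits. The sketch below uses a hypothetical `exponent_field` helper. Python's `format(value, "011b")` serves only as a cross-check.

```python
def exponent_field(adjusted: int, width: int = 11) -> str:
    """Repeated division by 2, remainders read bottom-up, left-padded to `width` bits."""
    digits = []
    while adjusted > 0:
        adjusted, r = divmod(adjusted, 2)
        digits.append(str(r))
    bits = "".join(reversed(digits))
    if len(bits) > width:
        raise ValueError("does not fit in the exponent field")
    return bits.rjust(width, "0")   # pad on the left so the field is exactly `width` bits


print(exponent_field(1147))   # 10001111011
print(exponent_field(1000))   # 01111101000 (one leading zero added)
```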


8. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.


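b) Adjust its length to 52 bits by removing the excess bits from the right (if any of the removed bits is 1, precision is lost). Removing bits this way is truncation (rounding toward zero). Here the first removed bit is 0, so the IEEE 754 default, round to nearest, gives the same 52 bits.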


Mantissa (normalized) =


1.1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000 0000 1000 0010 1110 1111 0111 1110 1000 0001 1000 1010 1101 0100 0000 0000 0001 0000 1110 =


1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000
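Steps 8a and 8b amount to dropping the first digit and keeping the next 52. The Python sketch below reuses the same `N`. It also prints the removed bits. Those bits decide whether precision is lost and, under round to nearest, whether the last kept bit would change.

```python
N = 35248198783801730000000000000000000270

bits = format(N, "b")
fraction = bits[1:]             # a) drop the leading 1 (the point is implicit)
mantissa = fraction[:52]        # b) keep 52 bits
removed = fraction[52:]         # excess bits taken off the right

grouped = " ".join(mantissa[i:i + 4] for i in range(0, 52, 4))
print(grouped)
print(len(removed), "1" in removed)   # 72 True: precision is lost
print(removed[0])                     # 0: rounding to nearest would not round up
```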


9. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
100 0111 1011


Mantissa (52 bits) =
1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000


Decimal number 35 248 198 783 801 730 000 000 000 000 000 000 270 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 100 0111 1011 - 1010 1000 0100 1000 1110 0110 0000 0000 0000 0000 0000 0000 0000
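The three fields can be joined into one 64-bit pattern and compared with the double that Python builds from the same integer. Python's `float()` rounds to nearest. The first removed bit here is 0, so the truncated mantissa and the rounded one should agree. The sketch below uses only the standard `struct` module.

```python
import struct

N = 35248198783801730000000000000000000270

sign = "0"
exponent = "10001111011"
mantissa = "101010000100100011100110" + "0" * 28   # the 52 bits from step 8

pattern = sign + exponent + mantissa
assert len(pattern) == 64

as_int = int(pattern, 2)
print(f"{as_int:016x}")   # 47ba848e60000000

# ">d" packs a big-endian IEEE 754 double; ">Q" reads the same 8 bytes as an unsigned integer.
(python_bits,) = struct.unpack(">Q", struct.pack(">d", float(N)))
print(as_int == python_bits)   # True
```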


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
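  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of the integer part of each result and continuing with only the fractional part. Stop when the fractional part equals zero, or when there are enough bits to fill the 52-bit mantissa. Many decimal fractions, such as 0.1, never reach zero in binary.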
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number by shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark. The unadjusted exponent is n for a shift to the left and -n for a shift to the right.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
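  • 8. Normalize the mantissa. Remove the leading (leftmost) bit, since it is always '1', and the decimal mark, if present. Then adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by adding extra bits set to '0' on the right. Removing bits simply truncates the value (rounding toward zero). The IEEE 754 default, round to nearest (ties to even), instead adds 1 to the last kept bit when the removed bits are worth more than half a unit in the last kept position, or exactly half and the last kept bit is 1.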
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
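The steps above can be gathered into one Python sketch that works on the decimal string exactly, using the standard `fractions.Fraction` type. This is a sketch under stated assumptions. It follows this page in truncating the mantissa rather than rounding it. It covers only normal numbers: zero, subnormals, infinity and NaN are rejected. `decimal_to_double_bits` and `group_from_right` are hypothetical names.

```python
from fractions import Fraction

MANTISSA_BITS = 52
BIAS = 2 ** (11 - 1) - 1   # 1023


def group_from_right(bits: str, size: int = 4) -> str:
    """Group bits in `size`s counted from the right, as the page does."""
    head = len(bits) % size
    groups = [bits[:head]] if head else []
    groups += [bits[i:i + size] for i in range(head, len(bits), size)]
    return " ".join(groups)


def decimal_to_double_bits(text: str) -> str:
    """Steps 1-9 for normal numbers; like the page, the mantissa is truncated, not rounded."""
    value = Fraction(text)                   # exact value of the decimal string
    if value == 0:
        raise ValueError("zero has a special encoding not covered by these steps")
    sign = "1" if value < 0 else "0"         # step 9
    value = abs(value)                       # step 1

    # Steps 2-3: integer part by repeated division, remainders read bottom-up.
    n = value.numerator // value.denominator
    fraction = value - n
    int_bits = ""
    while n > 0:
        n, r = divmod(n, 2)
        int_bits = str(r) + int_bits

    def bits_after_leading_one(frac_bits: str) -> int:
        if int_bits:
            return len(int_bits) - 1 + len(frac_bits)
        if "1" in frac_bits:
            return len(frac_bits) - frac_bits.index("1") - 1
        return -1                            # no non-zero digit seen yet

    # Steps 4-5: fractional part by repeated multiplication, integer parts read top-down;
    # stop at a zero fractional part or once the 52 mantissa bits are available.
    frac_bits = ""
    while fraction != 0 and bits_after_leading_one(frac_bits) < MANTISSA_BITS:
        fraction *= 2
        bit = 1 if fraction >= 1 else 0
        fraction -= bit
        frac_bits += str(bit)

    # Step 6: normalize; a shift to the left gives +n, a shift to the right gives -n.
    if int_bits:
        exponent = len(int_bits) - 1
        significand = int_bits[1:] + frac_bits
    else:
        first_one = frac_bits.index("1")
        exponent = -(first_one + 1)
        significand = frac_bits[first_one + 1:]

    # Step 7: biased exponent on 11 bits; format(..., "011b") gives the same digits
    # as the repeated division, padded with leading zeros.
    adjusted = exponent + BIAS
    if not 1 <= adjusted <= 2046:
        raise ValueError("outside the normal double-precision range")
    exponent_field = format(adjusted, "011b")

    # Step 8: the leading 1 is already dropped; truncate or zero-pad to 52 bits.
    mantissa = (significand + "0" * MANTISSA_BITS)[:MANTISSA_BITS]

    return f"{sign} - {group_from_right(exponent_field)} - {group_from_right(mantissa)}"


print(decimal_to_double_bits("-31.640215"))
```

Exact rational arithmetic is used so that the doubling steps match the hand calculation digit for digit, without the rounding a binary float would introduce along the way.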

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • No fractional part equal to zero appeared, but we have done enough iterations (more than the 52-bit mantissa limit) and at least one integer part was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

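  • 10. Normalize the mantissa. Remove the leading (leftmost) bit, since it is always '1', and the decimal mark. Then adjust its length to 52 bits by removing the excess bits from the right (losing precision). This truncation corresponds to rounding toward zero. Under the IEEE 754 default, round to nearest, the last group would be 1101 instead of 1100, because the first removed bit is 1 and the removed bits after it are not all 0: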

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
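Two points of this example can be checked in Python with exact rational arithmetic from the standard `fractions` module. First, the multiplications in step 4 never reach zero. Second, the truncated mantissa can be compared with the one produced by Python's own `float()`, which rounds to nearest. `fractional_bits` and `round_to_nearest_even` are hypothetical helpers written for this sketch. The rounding helper does not handle a carry out of all 52 bits, which does not occur here.

```python
import struct
from fractions import Fraction


def fractional_bits(fraction: Fraction, count: int):
    """Step 4: multiply by 2 `count` times; collect the integer parts, keep the fractional part."""
    bits = ""
    for _ in range(count):
        fraction *= 2
        bit = 1 if fraction >= 1 else 0
        fraction -= bit
        bits += str(bit)
    return bits, fraction


def round_to_nearest_even(bits: str, width: int, more_nonzero: bool) -> int:
    """Keep `width` bits of `bits`, rounding to nearest, ties to even.
    `more_nonzero` says whether non-zero bits exist beyond the end of `bits`.
    A carry out of all `width` bits (which would raise the exponent) is not handled."""
    kept = int(bits[:width], 2)
    rest = bits[width:]
    round_bit = rest[:1] == "1"
    sticky = "1" in rest[1:] or more_nonzero
    if round_bit and (sticky or kept & 1):
        kept += 1
    return kept


frac53, remainder = fractional_bits(Fraction("0.640215"), 53)
print(remainder)                           # 604/3125: still not zero after 53 steps
print(Fraction("0.640215").denominator)    # 200000 = 2^6 * 5^5; the factor 5 means it never ends

# Bits after the leading 1: "1111" from the integer part, then the fractional bits.
after_point = "1111" + frac53
truncated = int(after_point[:52], 2)
rounded = round_to_nearest_even(after_point, 52, more_nonzero=remainder != 0)
print(format(truncated, "052b")[-4:], format(rounded, "052b")[-4:])   # 1100 1101

# Compare with the 52 mantissa bits of the double Python builds from the same text.
(raw,) = struct.unpack(">Q", struct.pack(">d", float("-31.640215")))
print(raw & ((1 << 52) - 1) == rounded)
```

A fraction in lowest terms has a finite binary expansion only when its denominator is a power of 2. That is why the list in step 4 had to be cut off rather than run to zero.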