0.000 000 000 000 000 000 008 530 3 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 530 3(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 530 3(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
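The repeated division in steps 1 and 2 can be sketched in Python as follows. This is only an illustration; `int_to_binary` is a hypothetical helper name, not part of any library.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: a single remainder of 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # quotient and remainder of the division by 2
        remainders.append(str(r))
    # Read the remainders from the bottom of the list (the last one first).
    return "".join(reversed(remainders))

print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```

Python's built-in `format(n, "b")` produces the same digits and is a convenient cross-check.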


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 530 3.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 530 3 × 2 = 0 + 0.000 000 000 000 000 000 017 060 6;
  • 2) 0.000 000 000 000 000 000 017 060 6 × 2 = 0 + 0.000 000 000 000 000 000 034 121 2;
  • 3) 0.000 000 000 000 000 000 034 121 2 × 2 = 0 + 0.000 000 000 000 000 000 068 242 4;
  • 4) 0.000 000 000 000 000 000 068 242 4 × 2 = 0 + 0.000 000 000 000 000 000 136 484 8;
  • 5) 0.000 000 000 000 000 000 136 484 8 × 2 = 0 + 0.000 000 000 000 000 000 272 969 6;
  • 6) 0.000 000 000 000 000 000 272 969 6 × 2 = 0 + 0.000 000 000 000 000 000 545 939 2;
  • 7) 0.000 000 000 000 000 000 545 939 2 × 2 = 0 + 0.000 000 000 000 000 001 091 878 4;
  • 8) 0.000 000 000 000 000 001 091 878 4 × 2 = 0 + 0.000 000 000 000 000 002 183 756 8;
  • 9) 0.000 000 000 000 000 002 183 756 8 × 2 = 0 + 0.000 000 000 000 000 004 367 513 6;
  • 10) 0.000 000 000 000 000 004 367 513 6 × 2 = 0 + 0.000 000 000 000 000 008 735 027 2;
  • 11) 0.000 000 000 000 000 008 735 027 2 × 2 = 0 + 0.000 000 000 000 000 017 470 054 4;
  • 12) 0.000 000 000 000 000 017 470 054 4 × 2 = 0 + 0.000 000 000 000 000 034 940 108 8;
  • 13) 0.000 000 000 000 000 034 940 108 8 × 2 = 0 + 0.000 000 000 000 000 069 880 217 6;
  • 14) 0.000 000 000 000 000 069 880 217 6 × 2 = 0 + 0.000 000 000 000 000 139 760 435 2;
  • 15) 0.000 000 000 000 000 139 760 435 2 × 2 = 0 + 0.000 000 000 000 000 279 520 870 4;
  • 16) 0.000 000 000 000 000 279 520 870 4 × 2 = 0 + 0.000 000 000 000 000 559 041 740 8;
  • 17) 0.000 000 000 000 000 559 041 740 8 × 2 = 0 + 0.000 000 000 000 001 118 083 481 6;
  • 18) 0.000 000 000 000 001 118 083 481 6 × 2 = 0 + 0.000 000 000 000 002 236 166 963 2;
  • 19) 0.000 000 000 000 002 236 166 963 2 × 2 = 0 + 0.000 000 000 000 004 472 333 926 4;
  • 20) 0.000 000 000 000 004 472 333 926 4 × 2 = 0 + 0.000 000 000 000 008 944 667 852 8;
  • 21) 0.000 000 000 000 008 944 667 852 8 × 2 = 0 + 0.000 000 000 000 017 889 335 705 6;
  • 22) 0.000 000 000 000 017 889 335 705 6 × 2 = 0 + 0.000 000 000 000 035 778 671 411 2;
  • 23) 0.000 000 000 000 035 778 671 411 2 × 2 = 0 + 0.000 000 000 000 071 557 342 822 4;
  • 24) 0.000 000 000 000 071 557 342 822 4 × 2 = 0 + 0.000 000 000 000 143 114 685 644 8;
  • 25) 0.000 000 000 000 143 114 685 644 8 × 2 = 0 + 0.000 000 000 000 286 229 371 289 6;
  • 26) 0.000 000 000 000 286 229 371 289 6 × 2 = 0 + 0.000 000 000 000 572 458 742 579 2;
  • 27) 0.000 000 000 000 572 458 742 579 2 × 2 = 0 + 0.000 000 000 001 144 917 485 158 4;
  • 28) 0.000 000 000 001 144 917 485 158 4 × 2 = 0 + 0.000 000 000 002 289 834 970 316 8;
  • 29) 0.000 000 000 002 289 834 970 316 8 × 2 = 0 + 0.000 000 000 004 579 669 940 633 6;
  • 30) 0.000 000 000 004 579 669 940 633 6 × 2 = 0 + 0.000 000 000 009 159 339 881 267 2;
  • 31) 0.000 000 000 009 159 339 881 267 2 × 2 = 0 + 0.000 000 000 018 318 679 762 534 4;
  • 32) 0.000 000 000 018 318 679 762 534 4 × 2 = 0 + 0.000 000 000 036 637 359 525 068 8;
  • 33) 0.000 000 000 036 637 359 525 068 8 × 2 = 0 + 0.000 000 000 073 274 719 050 137 6;
  • 34) 0.000 000 000 073 274 719 050 137 6 × 2 = 0 + 0.000 000 000 146 549 438 100 275 2;
  • 35) 0.000 000 000 146 549 438 100 275 2 × 2 = 0 + 0.000 000 000 293 098 876 200 550 4;
  • 36) 0.000 000 000 293 098 876 200 550 4 × 2 = 0 + 0.000 000 000 586 197 752 401 100 8;
  • 37) 0.000 000 000 586 197 752 401 100 8 × 2 = 0 + 0.000 000 001 172 395 504 802 201 6;
  • 38) 0.000 000 001 172 395 504 802 201 6 × 2 = 0 + 0.000 000 002 344 791 009 604 403 2;
  • 39) 0.000 000 002 344 791 009 604 403 2 × 2 = 0 + 0.000 000 004 689 582 019 208 806 4;
  • 40) 0.000 000 004 689 582 019 208 806 4 × 2 = 0 + 0.000 000 009 379 164 038 417 612 8;
  • 41) 0.000 000 009 379 164 038 417 612 8 × 2 = 0 + 0.000 000 018 758 328 076 835 225 6;
  • 42) 0.000 000 018 758 328 076 835 225 6 × 2 = 0 + 0.000 000 037 516 656 153 670 451 2;
  • 43) 0.000 000 037 516 656 153 670 451 2 × 2 = 0 + 0.000 000 075 033 312 307 340 902 4;
  • 44) 0.000 000 075 033 312 307 340 902 4 × 2 = 0 + 0.000 000 150 066 624 614 681 804 8;
  • 45) 0.000 000 150 066 624 614 681 804 8 × 2 = 0 + 0.000 000 300 133 249 229 363 609 6;
  • 46) 0.000 000 300 133 249 229 363 609 6 × 2 = 0 + 0.000 000 600 266 498 458 727 219 2;
  • 47) 0.000 000 600 266 498 458 727 219 2 × 2 = 0 + 0.000 001 200 532 996 917 454 438 4;
  • 48) 0.000 001 200 532 996 917 454 438 4 × 2 = 0 + 0.000 002 401 065 993 834 908 876 8;
  • 49) 0.000 002 401 065 993 834 908 876 8 × 2 = 0 + 0.000 004 802 131 987 669 817 753 6;
  • 50) 0.000 004 802 131 987 669 817 753 6 × 2 = 0 + 0.000 009 604 263 975 339 635 507 2;
  • 51) 0.000 009 604 263 975 339 635 507 2 × 2 = 0 + 0.000 019 208 527 950 679 271 014 4;
  • 52) 0.000 019 208 527 950 679 271 014 4 × 2 = 0 + 0.000 038 417 055 901 358 542 028 8;
  • 53) 0.000 038 417 055 901 358 542 028 8 × 2 = 0 + 0.000 076 834 111 802 717 084 057 6;
  • 54) 0.000 076 834 111 802 717 084 057 6 × 2 = 0 + 0.000 153 668 223 605 434 168 115 2;
  • 55) 0.000 153 668 223 605 434 168 115 2 × 2 = 0 + 0.000 307 336 447 210 868 336 230 4;
  • 56) 0.000 307 336 447 210 868 336 230 4 × 2 = 0 + 0.000 614 672 894 421 736 672 460 8;
  • 57) 0.000 614 672 894 421 736 672 460 8 × 2 = 0 + 0.001 229 345 788 843 473 344 921 6;
  • 58) 0.001 229 345 788 843 473 344 921 6 × 2 = 0 + 0.002 458 691 577 686 946 689 843 2;
  • 59) 0.002 458 691 577 686 946 689 843 2 × 2 = 0 + 0.004 917 383 155 373 893 379 686 4;
  • 60) 0.004 917 383 155 373 893 379 686 4 × 2 = 0 + 0.009 834 766 310 747 786 759 372 8;
  • 61) 0.009 834 766 310 747 786 759 372 8 × 2 = 0 + 0.019 669 532 621 495 573 518 745 6;
  • 62) 0.019 669 532 621 495 573 518 745 6 × 2 = 0 + 0.039 339 065 242 991 147 037 491 2;
  • 63) 0.039 339 065 242 991 147 037 491 2 × 2 = 0 + 0.078 678 130 485 982 294 074 982 4;
  • 64) 0.078 678 130 485 982 294 074 982 4 × 2 = 0 + 0.157 356 260 971 964 588 149 964 8;
  • 65) 0.157 356 260 971 964 588 149 964 8 × 2 = 0 + 0.314 712 521 943 929 176 299 929 6;
  • 66) 0.314 712 521 943 929 176 299 929 6 × 2 = 0 + 0.629 425 043 887 858 352 599 859 2;
  • 67) 0.629 425 043 887 858 352 599 859 2 × 2 = 1 + 0.258 850 087 775 716 705 199 718 4;
  • 68) 0.258 850 087 775 716 705 199 718 4 × 2 = 0 + 0.517 700 175 551 433 410 399 436 8;
  • 69) 0.517 700 175 551 433 410 399 436 8 × 2 = 1 + 0.035 400 351 102 866 820 798 873 6;
  • 70) 0.035 400 351 102 866 820 798 873 6 × 2 = 0 + 0.070 800 702 205 733 641 597 747 2;
  • 71) 0.070 800 702 205 733 641 597 747 2 × 2 = 0 + 0.141 601 404 411 467 283 195 494 4;
  • 72) 0.141 601 404 411 467 283 195 494 4 × 2 = 0 + 0.283 202 808 822 934 566 390 988 8;
  • 73) 0.283 202 808 822 934 566 390 988 8 × 2 = 0 + 0.566 405 617 645 869 132 781 977 6;
  • 74) 0.566 405 617 645 869 132 781 977 6 × 2 = 1 + 0.132 811 235 291 738 265 563 955 2;
  • 75) 0.132 811 235 291 738 265 563 955 2 × 2 = 0 + 0.265 622 470 583 476 531 127 910 4;
  • 76) 0.265 622 470 583 476 531 127 910 4 × 2 = 0 + 0.531 244 941 166 953 062 255 820 8;
  • 77) 0.531 244 941 166 953 062 255 820 8 × 2 = 1 + 0.062 489 882 333 906 124 511 641 6;
  • 78) 0.062 489 882 333 906 124 511 641 6 × 2 = 0 + 0.124 979 764 667 812 249 023 283 2;
  • 79) 0.124 979 764 667 812 249 023 283 2 × 2 = 0 + 0.249 959 529 335 624 498 046 566 4;
  • 80) 0.249 959 529 335 624 498 046 566 4 × 2 = 0 + 0.499 919 058 671 248 996 093 132 8;
  • 81) 0.499 919 058 671 248 996 093 132 8 × 2 = 0 + 0.999 838 117 342 497 992 186 265 6;
  • 82) 0.999 838 117 342 497 992 186 265 6 × 2 = 1 + 0.999 676 234 684 995 984 372 531 2;
  • 83) 0.999 676 234 684 995 984 372 531 2 × 2 = 1 + 0.999 352 469 369 991 968 745 062 4;
  • 84) 0.999 352 469 369 991 968 745 062 4 × 2 = 1 + 0.998 704 938 739 983 937 490 124 8;
  • 85) 0.998 704 938 739 983 937 490 124 8 × 2 = 1 + 0.997 409 877 479 967 874 980 249 6;
  • 86) 0.997 409 877 479 967 874 980 249 6 × 2 = 1 + 0.994 819 754 959 935 749 960 499 2;
  • 87) 0.994 819 754 959 935 749 960 499 2 × 2 = 1 + 0.989 639 509 919 871 499 920 998 4;
  • 88) 0.989 639 509 919 871 499 920 998 4 × 2 = 1 + 0.979 279 019 839 742 999 841 996 8;
  • 89) 0.979 279 019 839 742 999 841 996 8 × 2 = 1 + 0.958 558 039 679 485 999 683 993 6;
  • 90) 0.958 558 039 679 485 999 683 993 6 × 2 = 1 + 0.917 116 079 358 971 999 367 987 2;
  • 91) 0.917 116 079 358 971 999 367 987 2 × 2 = 1 + 0.834 232 158 717 943 998 735 974 4;
  • 92) 0.834 232 158 717 943 998 735 974 4 × 2 = 1 + 0.668 464 317 435 887 997 471 948 8;
  • 93) 0.668 464 317 435 887 997 471 948 8 × 2 = 1 + 0.336 928 634 871 775 994 943 897 6;
  • 94) 0.336 928 634 871 775 994 943 897 6 × 2 = 0 + 0.673 857 269 743 551 989 887 795 2;
  • 95) 0.673 857 269 743 551 989 887 795 2 × 2 = 1 + 0.347 714 539 487 103 979 775 590 4;
  • 96) 0.347 714 539 487 103 979 775 590 4 × 2 = 0 + 0.695 429 078 974 207 959 551 180 8;
  • 97) 0.695 429 078 974 207 959 551 180 8 × 2 = 1 + 0.390 858 157 948 415 919 102 361 6;
  • 98) 0.390 858 157 948 415 919 102 361 6 × 2 = 0 + 0.781 716 315 896 831 838 204 723 2;
  • 99) 0.781 716 315 896 831 838 204 723 2 × 2 = 1 + 0.563 432 631 793 663 676 409 446 4;
  • 100) 0.563 432 631 793 663 676 409 446 4 × 2 = 1 + 0.126 865 263 587 327 352 818 892 8;
  • 101) 0.126 865 263 587 327 352 818 892 8 × 2 = 0 + 0.253 730 527 174 654 705 637 785 6;
  • 102) 0.253 730 527 174 654 705 637 785 6 × 2 = 0 + 0.507 461 054 349 309 411 275 571 2;
  • 103) 0.507 461 054 349 309 411 275 571 2 × 2 = 1 + 0.014 922 108 698 618 822 551 142 4;
  • 104) 0.014 922 108 698 618 822 551 142 4 × 2 = 0 + 0.029 844 217 397 237 645 102 284 8;
  • 105) 0.029 844 217 397 237 645 102 284 8 × 2 = 0 + 0.059 688 434 794 475 290 204 569 6;
  • 106) 0.059 688 434 794 475 290 204 569 6 × 2 = 0 + 0.119 376 869 588 950 580 409 139 2;
  • 107) 0.119 376 869 588 950 580 409 139 2 × 2 = 0 + 0.238 753 739 177 901 160 818 278 4;
  • 108) 0.238 753 739 177 901 160 818 278 4 × 2 = 0 + 0.477 507 478 355 802 321 636 556 8;
  • 109) 0.477 507 478 355 802 321 636 556 8 × 2 = 0 + 0.955 014 956 711 604 643 273 113 6;
  • 110) 0.955 014 956 711 604 643 273 113 6 × 2 = 1 + 0.910 029 913 423 209 286 546 227 2;
  • 111) 0.910 029 913 423 209 286 546 227 2 × 2 = 1 + 0.820 059 826 846 418 573 092 454 4;
  • 112) 0.820 059 826 846 418 573 092 454 4 × 2 = 1 + 0.640 119 653 692 837 146 184 908 8;
  • 113) 0.640 119 653 692 837 146 184 908 8 × 2 = 1 + 0.280 239 307 385 674 292 369 817 6;
  • 114) 0.280 239 307 385 674 292 369 817 6 × 2 = 0 + 0.560 478 614 771 348 584 739 635 2;
  • 115) 0.560 478 614 771 348 584 739 635 2 × 2 = 1 + 0.120 957 229 542 697 169 479 270 4;
  • 116) 0.120 957 229 542 697 169 479 270 4 × 2 = 0 + 0.241 914 459 085 394 338 958 540 8;
  • 117) 0.241 914 459 085 394 338 958 540 8 × 2 = 0 + 0.483 828 918 170 788 677 917 081 6;
  • 118) 0.483 828 918 170 788 677 917 081 6 × 2 = 0 + 0.967 657 836 341 577 355 834 163 2;
  • 119) 0.967 657 836 341 577 355 834 163 2 × 2 = 1 + 0.935 315 672 683 154 711 668 326 4;

We never got a fractional part equal to zero. However, we ran enough iterations: step 67 produced the first non-zero bit, and steps 67 to 119 provide the 53 significant bits (1 implicit bit + 52 bits stored in the mantissa) that a double can hold => FULL STOP (losing precision: the converted number will only be a very close approximation of the original one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 530 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1000 0111 1111 1111 1010 1011 0010 0000 0111 1010 001(2)
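Steps 3 and 4 can be reproduced exactly with Python's `fractions.Fraction`, which stores the decimal input without any rounding, so every doubling is as exact as the hand computation above. A minimal sketch, assuming we stop after a fixed number of bits (119, where the list above stops); `frac_to_binary` is a hypothetical helper.

```python
from fractions import Fraction

def frac_to_binary(frac: Fraction, max_bits: int) -> str:
    """Multiply the fractional part by 2 repeatedly, keeping each integer part as a bit."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        integer_part = 1 if frac >= 1 else 0  # since 0 <= frac < 2, this is 0 or 1
        bits.append(str(integer_part))
        frac -= integer_part                  # keep only the fractional part
    return "".join(bits)                      # read from the top of the list down

x = Fraction("0.0000000000000000000085303")  # exact decimal value, no float rounding
bits = frac_to_binary(x, 119)
first_one = bits.index("1") + 1
print(first_one)       # 67: the step that produced the first non-zero bit
print(len(bits[66:]))  # 53: significant bits from step 67 to step 119
```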

5. Positive number before normalization:

0.000 000 000 000 000 000 008 530 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1000 0111 1111 1111 1010 1011 0010 0000 0111 1010 001(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 530 3(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1000 0111 1111 1111 1010 1011 0010 0000 0111 1010 001(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1000 0111 1111 1111 1010 1011 0010 0000 0111 1010 001(2) × 2^0 =


1.0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001(2) × 2^(-67)
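Normalization amounts to finding the exponent e for which 1 ≤ x / 2^e < 2. The sketch below (a hypothetical helper `normalize`, again using exact `Fraction` arithmetic) moves the point one position at a time, just as described above.

```python
from fractions import Fraction

def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) with x == m * 2**e and 1 <= m < 2; x must be positive."""
    if x <= 0:
        raise ValueError("x must be positive")
    e = 0
    while x >= 2:  # move the point one position to the left
        x /= 2
        e += 1
    while x < 1:   # move the point one position to the right
        x *= 2
        e -= 1
    return x, e

m, e = normalize(Fraction("0.0000000000000000000085303"))
print(e)           # -67
print(1 <= m < 2)  # True
```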


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
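The bias formula is easy to check directly. A small sketch; `EXPONENT_BITS`, `BIAS` and `biased_exponent` are illustrative names of our own.

```python
EXPONENT_BITS = 11                   # width of the exponent field in a double
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 2^(11-1) - 1 = 1023

def biased_exponent(unadjusted: int) -> int:
    """Apply the excess/bias notation to an unadjusted exponent."""
    return unadjusted + BIAS

print(BIAS)                  # 1023
print(biased_exponent(-67))  # 956
```

The same function gives 4 + 1023 = 1027 for the -31.640 215 example at the end of this page.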


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
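Instead of repeated division, Python's built-in `format` with a zero-padded width yields the same 11-bit field. A sketch with a hypothetical `exponent_field` helper. The range check reflects that IEEE 754 reserves the biased exponents 0 (all zeros, for zero and subnormals) and 2047 (all ones, for infinities and NaN), so normal numbers use 1 to 2046.

```python
def exponent_field(biased: int, width: int = 11) -> str:
    """Return the biased exponent as a zero-padded binary string of `width` bits."""
    # All-zeros and all-ones exponents are reserved, so normal numbers use 1 .. 2**width - 2.
    if not 1 <= biased <= 2 ** width - 2:
        raise ValueError(f"biased exponent {biased} is outside the normal range")
    return format(biased, f"0{width}b")

print(exponent_field(956))  # 01110111100, i.e. 011 1011 1100
```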


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


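b) Adjust its length to 52 bits, only if necessary (not the case here: steps 68 to 119 produced exactly 52 bits after the leading 1). Note that stopping at step 119 truncates the number: step 120 would give 0.935 315 672 683 154 711 668 326 4 × 2 = 1 + 0.870 631 345 366 309 423 336 652 8, so the discarded part is worth more than half of the last kept bit. IEEE 754's default rounding mode (round to nearest, ties to even) would therefore round the last four mantissa bits up from 0001 to 0010; the result below keeps the truncated bits.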


Mantissa (normalized) =


1.0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001 =


0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001
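The mantissa step can be sketched with exact arithmetic, which also makes the difference between truncation and rounding visible. `mantissa_field` is a hypothetical helper; the `"nearest"` option relies on Python's `round()` on a `Fraction`, which rounds half to even, matching IEEE 754's default rounding mode.

```python
import math
from fractions import Fraction

def mantissa_field(m: Fraction, width: int = 52, rounding: str = "truncate") -> str:
    """Drop the implicit leading 1 of m (1 <= m < 2) and keep `width` fraction bits."""
    scaled = (m - 1) * 2 ** width   # fraction bits as a number, possibly non-integer
    if rounding == "truncate":
        value = math.floor(scaled)  # cut off the excess bits, as on this page
    elif rounding == "nearest":
        value = round(scaled)       # Fraction rounds half to even
    else:
        raise ValueError("rounding must be 'truncate' or 'nearest'")
    if value == 2 ** width:
        # Rounding carried into the leading bit; the caller must renormalize (exponent + 1).
        raise OverflowError("mantissa rounding overflowed; renormalize")
    return format(value, f"0{width}b")

x = Fraction("0.0000000000000000000085303")
m = x * 2 ** 67  # the normalized value 1.0100 0010 ...
print(mantissa_field(m)[-4:])                      # 0001 (truncated)
print(mantissa_field(m, rounding="nearest")[-4:])  # 0010 (rounded to nearest)
```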


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001


Decimal number 0.000 000 000 000 000 000 008 530 3 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0100 0011 1111 1111 1101 0101 1001 0000 0011 1101 0001
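As a cross-check, Python's standard `struct` module can expose the bits actually stored for this number. This assumes a platform where Python's `float` is an IEEE 754 double, which is the common case; `double_fields` is a hypothetical helper. Because `float()` converts decimal strings with correct rounding to nearest, the sign and exponent should match the result above, while the last four mantissa bits should read 0010 instead of 0001: the hand conversion truncates after step 119, but the discarded remainder is more than half of the last bit.

```python
import struct

def double_fields(x: float) -> tuple[str, str, str]:
    """Split the 64 stored bits of a double into sign, exponent and mantissa strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret the 8 bytes as an integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_fields(float("0.0000000000000000000085303"))
print(sign, exponent, mantissa)
```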


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have collected enough bits to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
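    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right. Removing the excess bits is a truncation; IEEE 754's default rounding mode (round to nearest, ties to even) instead rounds up when the removed bits are worth more than half of the last kept bit, so a value converted by a compiler or CPU may differ from the truncated result in the last mantissa bit.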
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
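The steps can be condensed into one function. This is a sketch under stated assumptions, not a reference implementation: `decimal_to_double_bits` is a hypothetical name, it handles only normal (non-zero, non-subnormal, finite) numbers, and instead of converting the integer and fractional parts separately it finds the exponent by halving or doubling the exact value, which yields the same normalized bits. The `rounding` parameter selects between the truncation used on this page and IEEE 754's round to nearest, ties to even.

```python
import math
from fractions import Fraction

def decimal_to_double_bits(text: str, rounding: str = "truncate") -> str:
    """Convert a decimal string to 'sign - exponent - mantissa' bits (normal numbers only)."""
    if rounding not in ("truncate", "nearest"):
        raise ValueError("rounding must be 'truncate' or 'nearest'")
    x = Fraction(text)            # exact value of the decimal input
    sign = "1" if x < 0 else "0"  # step 9: 1 for negative, 0 for positive
    x = abs(x)                    # step 1: continue with the positive version
    if x == 0:
        raise ValueError("zero has a special encoding that this sketch does not handle")
    # Steps 2-6, condensed: find e such that 1 <= x / 2**e < 2.
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    # Step 8: drop the implicit leading 1 and keep 52 fraction bits.
    scaled = (x - 1) * 2 ** 52
    frac = math.floor(scaled) if rounding == "truncate" else round(scaled)
    if frac == 2 ** 52:           # rounding carried past the leading 1
        frac = 0
        e += 1
    biased = e + 1023             # step 7: 11-bit excess/bias exponent
    if not 1 <= biased <= 2046:
        raise OverflowError("value is outside the range of normal doubles")
    return f"{sign} - {biased:011b} - {frac:052b}"

print(decimal_to_double_bits("-31.640215"))
print(decimal_to_double_bits("-31.640215", rounding="nearest"))
```

With the default `rounding="truncate"` the output should match the hand-worked results on this page; with `rounding="nearest"` the last mantissa bit can differ, as it does for both numbers converted here.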

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
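To see what the truncation in step 10 costs, the three fields can be decoded back into an exact value. A sketch with a hypothetical `decode` helper (normal numbers only). It compares the truncated mantissa above with the same mantissa rounded to nearest, IEEE 754's default. Because the discarded bits (1010 0...) are worth more than half of the last kept bit, rounding up to ...1101 should land closer to -31.640 215.

```python
from fractions import Fraction

def decode(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Rebuild the exact value stored in the three fields of a normal double."""
    s = -1 if sign == "1" else 1
    e = int(exponent, 2) - 1023                             # remove the bias
    m = 1 + Fraction(int(mantissa, 2), 2 ** len(mantissa))  # restore the implicit leading 1
    return s * m * Fraction(2) ** e

original = Fraction("-31.640215")
truncated = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", "")
rounded = truncated[:-4] + "1101"  # the same field rounded to nearest instead

ulp = Fraction(2) ** (4 - 52)      # value of the last mantissa bit when the exponent is 4
for label, field in (("truncated", truncated), ("rounded", rounded)):
    error = abs(decode("1", "10000000011", field) - original)
    print(label, float(error / ulp))  # error measured in units of the last bit
```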