0.000 000 000 000 000 000 008 567 7 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 567 7(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
0.000 000 000 000 000 000 008 567 7(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
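A minimal Python sketch of the repeated division in steps 1 and 2, assuming Python 3. `int_to_binary` is a hypothetical helper name, not part of any library. The loop body runs at least once, so 0 yields the single remainder 0, just like `0 ÷ 2 = 0 + 0` above.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(remainder))
        if n == 0:  # stop when the quotient is zero
            break
    # Read the remainders from the bottom of the list: the last one is the leftmost digit.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # 0
print(int_to_binary(31))  # 11111
```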


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 567 7.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 567 7 × 2 = 0 + 0.000 000 000 000 000 000 017 135 4;
  • 2) 0.000 000 000 000 000 000 017 135 4 × 2 = 0 + 0.000 000 000 000 000 000 034 270 8;
  • 3) 0.000 000 000 000 000 000 034 270 8 × 2 = 0 + 0.000 000 000 000 000 000 068 541 6;
  • 4) 0.000 000 000 000 000 000 068 541 6 × 2 = 0 + 0.000 000 000 000 000 000 137 083 2;
  • 5) 0.000 000 000 000 000 000 137 083 2 × 2 = 0 + 0.000 000 000 000 000 000 274 166 4;
  • 6) 0.000 000 000 000 000 000 274 166 4 × 2 = 0 + 0.000 000 000 000 000 000 548 332 8;
  • 7) 0.000 000 000 000 000 000 548 332 8 × 2 = 0 + 0.000 000 000 000 000 001 096 665 6;
  • 8) 0.000 000 000 000 000 001 096 665 6 × 2 = 0 + 0.000 000 000 000 000 002 193 331 2;
  • 9) 0.000 000 000 000 000 002 193 331 2 × 2 = 0 + 0.000 000 000 000 000 004 386 662 4;
  • 10) 0.000 000 000 000 000 004 386 662 4 × 2 = 0 + 0.000 000 000 000 000 008 773 324 8;
  • 11) 0.000 000 000 000 000 008 773 324 8 × 2 = 0 + 0.000 000 000 000 000 017 546 649 6;
  • 12) 0.000 000 000 000 000 017 546 649 6 × 2 = 0 + 0.000 000 000 000 000 035 093 299 2;
  • 13) 0.000 000 000 000 000 035 093 299 2 × 2 = 0 + 0.000 000 000 000 000 070 186 598 4;
  • 14) 0.000 000 000 000 000 070 186 598 4 × 2 = 0 + 0.000 000 000 000 000 140 373 196 8;
  • 15) 0.000 000 000 000 000 140 373 196 8 × 2 = 0 + 0.000 000 000 000 000 280 746 393 6;
  • 16) 0.000 000 000 000 000 280 746 393 6 × 2 = 0 + 0.000 000 000 000 000 561 492 787 2;
  • 17) 0.000 000 000 000 000 561 492 787 2 × 2 = 0 + 0.000 000 000 000 001 122 985 574 4;
  • 18) 0.000 000 000 000 001 122 985 574 4 × 2 = 0 + 0.000 000 000 000 002 245 971 148 8;
  • 19) 0.000 000 000 000 002 245 971 148 8 × 2 = 0 + 0.000 000 000 000 004 491 942 297 6;
  • 20) 0.000 000 000 000 004 491 942 297 6 × 2 = 0 + 0.000 000 000 000 008 983 884 595 2;
  • 21) 0.000 000 000 000 008 983 884 595 2 × 2 = 0 + 0.000 000 000 000 017 967 769 190 4;
  • 22) 0.000 000 000 000 017 967 769 190 4 × 2 = 0 + 0.000 000 000 000 035 935 538 380 8;
  • 23) 0.000 000 000 000 035 935 538 380 8 × 2 = 0 + 0.000 000 000 000 071 871 076 761 6;
  • 24) 0.000 000 000 000 071 871 076 761 6 × 2 = 0 + 0.000 000 000 000 143 742 153 523 2;
  • 25) 0.000 000 000 000 143 742 153 523 2 × 2 = 0 + 0.000 000 000 000 287 484 307 046 4;
  • 26) 0.000 000 000 000 287 484 307 046 4 × 2 = 0 + 0.000 000 000 000 574 968 614 092 8;
  • 27) 0.000 000 000 000 574 968 614 092 8 × 2 = 0 + 0.000 000 000 001 149 937 228 185 6;
  • 28) 0.000 000 000 001 149 937 228 185 6 × 2 = 0 + 0.000 000 000 002 299 874 456 371 2;
  • 29) 0.000 000 000 002 299 874 456 371 2 × 2 = 0 + 0.000 000 000 004 599 748 912 742 4;
  • 30) 0.000 000 000 004 599 748 912 742 4 × 2 = 0 + 0.000 000 000 009 199 497 825 484 8;
  • 31) 0.000 000 000 009 199 497 825 484 8 × 2 = 0 + 0.000 000 000 018 398 995 650 969 6;
  • 32) 0.000 000 000 018 398 995 650 969 6 × 2 = 0 + 0.000 000 000 036 797 991 301 939 2;
  • 33) 0.000 000 000 036 797 991 301 939 2 × 2 = 0 + 0.000 000 000 073 595 982 603 878 4;
  • 34) 0.000 000 000 073 595 982 603 878 4 × 2 = 0 + 0.000 000 000 147 191 965 207 756 8;
  • 35) 0.000 000 000 147 191 965 207 756 8 × 2 = 0 + 0.000 000 000 294 383 930 415 513 6;
  • 36) 0.000 000 000 294 383 930 415 513 6 × 2 = 0 + 0.000 000 000 588 767 860 831 027 2;
  • 37) 0.000 000 000 588 767 860 831 027 2 × 2 = 0 + 0.000 000 001 177 535 721 662 054 4;
  • 38) 0.000 000 001 177 535 721 662 054 4 × 2 = 0 + 0.000 000 002 355 071 443 324 108 8;
  • 39) 0.000 000 002 355 071 443 324 108 8 × 2 = 0 + 0.000 000 004 710 142 886 648 217 6;
  • 40) 0.000 000 004 710 142 886 648 217 6 × 2 = 0 + 0.000 000 009 420 285 773 296 435 2;
  • 41) 0.000 000 009 420 285 773 296 435 2 × 2 = 0 + 0.000 000 018 840 571 546 592 870 4;
  • 42) 0.000 000 018 840 571 546 592 870 4 × 2 = 0 + 0.000 000 037 681 143 093 185 740 8;
  • 43) 0.000 000 037 681 143 093 185 740 8 × 2 = 0 + 0.000 000 075 362 286 186 371 481 6;
  • 44) 0.000 000 075 362 286 186 371 481 6 × 2 = 0 + 0.000 000 150 724 572 372 742 963 2;
  • 45) 0.000 000 150 724 572 372 742 963 2 × 2 = 0 + 0.000 000 301 449 144 745 485 926 4;
  • 46) 0.000 000 301 449 144 745 485 926 4 × 2 = 0 + 0.000 000 602 898 289 490 971 852 8;
  • 47) 0.000 000 602 898 289 490 971 852 8 × 2 = 0 + 0.000 001 205 796 578 981 943 705 6;
  • 48) 0.000 001 205 796 578 981 943 705 6 × 2 = 0 + 0.000 002 411 593 157 963 887 411 2;
  • 49) 0.000 002 411 593 157 963 887 411 2 × 2 = 0 + 0.000 004 823 186 315 927 774 822 4;
  • 50) 0.000 004 823 186 315 927 774 822 4 × 2 = 0 + 0.000 009 646 372 631 855 549 644 8;
  • 51) 0.000 009 646 372 631 855 549 644 8 × 2 = 0 + 0.000 019 292 745 263 711 099 289 6;
  • 52) 0.000 019 292 745 263 711 099 289 6 × 2 = 0 + 0.000 038 585 490 527 422 198 579 2;
  • 53) 0.000 038 585 490 527 422 198 579 2 × 2 = 0 + 0.000 077 170 981 054 844 397 158 4;
  • 54) 0.000 077 170 981 054 844 397 158 4 × 2 = 0 + 0.000 154 341 962 109 688 794 316 8;
  • 55) 0.000 154 341 962 109 688 794 316 8 × 2 = 0 + 0.000 308 683 924 219 377 588 633 6;
  • 56) 0.000 308 683 924 219 377 588 633 6 × 2 = 0 + 0.000 617 367 848 438 755 177 267 2;
  • 57) 0.000 617 367 848 438 755 177 267 2 × 2 = 0 + 0.001 234 735 696 877 510 354 534 4;
  • 58) 0.001 234 735 696 877 510 354 534 4 × 2 = 0 + 0.002 469 471 393 755 020 709 068 8;
  • 59) 0.002 469 471 393 755 020 709 068 8 × 2 = 0 + 0.004 938 942 787 510 041 418 137 6;
  • 60) 0.004 938 942 787 510 041 418 137 6 × 2 = 0 + 0.009 877 885 575 020 082 836 275 2;
  • 61) 0.009 877 885 575 020 082 836 275 2 × 2 = 0 + 0.019 755 771 150 040 165 672 550 4;
  • 62) 0.019 755 771 150 040 165 672 550 4 × 2 = 0 + 0.039 511 542 300 080 331 345 100 8;
  • 63) 0.039 511 542 300 080 331 345 100 8 × 2 = 0 + 0.079 023 084 600 160 662 690 201 6;
  • 64) 0.079 023 084 600 160 662 690 201 6 × 2 = 0 + 0.158 046 169 200 321 325 380 403 2;
  • 65) 0.158 046 169 200 321 325 380 403 2 × 2 = 0 + 0.316 092 338 400 642 650 760 806 4;
  • 66) 0.316 092 338 400 642 650 760 806 4 × 2 = 0 + 0.632 184 676 801 285 301 521 612 8;
  • 67) 0.632 184 676 801 285 301 521 612 8 × 2 = 1 + 0.264 369 353 602 570 603 043 225 6;
  • 68) 0.264 369 353 602 570 603 043 225 6 × 2 = 0 + 0.528 738 707 205 141 206 086 451 2;
  • 69) 0.528 738 707 205 141 206 086 451 2 × 2 = 1 + 0.057 477 414 410 282 412 172 902 4;
  • 70) 0.057 477 414 410 282 412 172 902 4 × 2 = 0 + 0.114 954 828 820 564 824 345 804 8;
  • 71) 0.114 954 828 820 564 824 345 804 8 × 2 = 0 + 0.229 909 657 641 129 648 691 609 6;
  • 72) 0.229 909 657 641 129 648 691 609 6 × 2 = 0 + 0.459 819 315 282 259 297 383 219 2;
  • 73) 0.459 819 315 282 259 297 383 219 2 × 2 = 0 + 0.919 638 630 564 518 594 766 438 4;
  • 74) 0.919 638 630 564 518 594 766 438 4 × 2 = 1 + 0.839 277 261 129 037 189 532 876 8;
  • 75) 0.839 277 261 129 037 189 532 876 8 × 2 = 1 + 0.678 554 522 258 074 379 065 753 6;
  • 76) 0.678 554 522 258 074 379 065 753 6 × 2 = 1 + 0.357 109 044 516 148 758 131 507 2;
  • 77) 0.357 109 044 516 148 758 131 507 2 × 2 = 0 + 0.714 218 089 032 297 516 263 014 4;
  • 78) 0.714 218 089 032 297 516 263 014 4 × 2 = 1 + 0.428 436 178 064 595 032 526 028 8;
  • 79) 0.428 436 178 064 595 032 526 028 8 × 2 = 0 + 0.856 872 356 129 190 065 052 057 6;
  • 80) 0.856 872 356 129 190 065 052 057 6 × 2 = 1 + 0.713 744 712 258 380 130 104 115 2;
  • 81) 0.713 744 712 258 380 130 104 115 2 × 2 = 1 + 0.427 489 424 516 760 260 208 230 4;
  • 82) 0.427 489 424 516 760 260 208 230 4 × 2 = 0 + 0.854 978 849 033 520 520 416 460 8;
  • 83) 0.854 978 849 033 520 520 416 460 8 × 2 = 1 + 0.709 957 698 067 041 040 832 921 6;
  • 84) 0.709 957 698 067 041 040 832 921 6 × 2 = 1 + 0.419 915 396 134 082 081 665 843 2;
  • 85) 0.419 915 396 134 082 081 665 843 2 × 2 = 0 + 0.839 830 792 268 164 163 331 686 4;
  • 86) 0.839 830 792 268 164 163 331 686 4 × 2 = 1 + 0.679 661 584 536 328 326 663 372 8;
  • 87) 0.679 661 584 536 328 326 663 372 8 × 2 = 1 + 0.359 323 169 072 656 653 326 745 6;
  • 88) 0.359 323 169 072 656 653 326 745 6 × 2 = 0 + 0.718 646 338 145 313 306 653 491 2;
  • 89) 0.718 646 338 145 313 306 653 491 2 × 2 = 1 + 0.437 292 676 290 626 613 306 982 4;
  • 90) 0.437 292 676 290 626 613 306 982 4 × 2 = 0 + 0.874 585 352 581 253 226 613 964 8;
  • 91) 0.874 585 352 581 253 226 613 964 8 × 2 = 1 + 0.749 170 705 162 506 453 227 929 6;
  • 92) 0.749 170 705 162 506 453 227 929 6 × 2 = 1 + 0.498 341 410 325 012 906 455 859 2;
  • 93) 0.498 341 410 325 012 906 455 859 2 × 2 = 0 + 0.996 682 820 650 025 812 911 718 4;
  • 94) 0.996 682 820 650 025 812 911 718 4 × 2 = 1 + 0.993 365 641 300 051 625 823 436 8;
  • 95) 0.993 365 641 300 051 625 823 436 8 × 2 = 1 + 0.986 731 282 600 103 251 646 873 6;
  • 96) 0.986 731 282 600 103 251 646 873 6 × 2 = 1 + 0.973 462 565 200 206 503 293 747 2;
  • 97) 0.973 462 565 200 206 503 293 747 2 × 2 = 1 + 0.946 925 130 400 413 006 587 494 4;
  • 98) 0.946 925 130 400 413 006 587 494 4 × 2 = 1 + 0.893 850 260 800 826 013 174 988 8;
  • 99) 0.893 850 260 800 826 013 174 988 8 × 2 = 1 + 0.787 700 521 601 652 026 349 977 6;
  • 100) 0.787 700 521 601 652 026 349 977 6 × 2 = 1 + 0.575 401 043 203 304 052 699 955 2;
  • 101) 0.575 401 043 203 304 052 699 955 2 × 2 = 1 + 0.150 802 086 406 608 105 399 910 4;
  • 102) 0.150 802 086 406 608 105 399 910 4 × 2 = 0 + 0.301 604 172 813 216 210 799 820 8;
  • 103) 0.301 604 172 813 216 210 799 820 8 × 2 = 0 + 0.603 208 345 626 432 421 599 641 6;
  • 104) 0.603 208 345 626 432 421 599 641 6 × 2 = 1 + 0.206 416 691 252 864 843 199 283 2;
  • 105) 0.206 416 691 252 864 843 199 283 2 × 2 = 0 + 0.412 833 382 505 729 686 398 566 4;
  • 106) 0.412 833 382 505 729 686 398 566 4 × 2 = 0 + 0.825 666 765 011 459 372 797 132 8;
  • 107) 0.825 666 765 011 459 372 797 132 8 × 2 = 1 + 0.651 333 530 022 918 745 594 265 6;
  • 108) 0.651 333 530 022 918 745 594 265 6 × 2 = 1 + 0.302 667 060 045 837 491 188 531 2;
  • 109) 0.302 667 060 045 837 491 188 531 2 × 2 = 0 + 0.605 334 120 091 674 982 377 062 4;
  • 110) 0.605 334 120 091 674 982 377 062 4 × 2 = 1 + 0.210 668 240 183 349 964 754 124 8;
  • 111) 0.210 668 240 183 349 964 754 124 8 × 2 = 0 + 0.421 336 480 366 699 929 508 249 6;
  • 112) 0.421 336 480 366 699 929 508 249 6 × 2 = 0 + 0.842 672 960 733 399 859 016 499 2;
  • 113) 0.842 672 960 733 399 859 016 499 2 × 2 = 1 + 0.685 345 921 466 799 718 032 998 4;
  • 114) 0.685 345 921 466 799 718 032 998 4 × 2 = 1 + 0.370 691 842 933 599 436 065 996 8;
  • 115) 0.370 691 842 933 599 436 065 996 8 × 2 = 0 + 0.741 383 685 867 198 872 131 993 6;
  • 116) 0.741 383 685 867 198 872 131 993 6 × 2 = 1 + 0.482 767 371 734 397 744 263 987 2;
  • 117) 0.482 767 371 734 397 744 263 987 2 × 2 = 0 + 0.965 534 743 468 795 488 527 974 4;
  • 118) 0.965 534 743 468 795 488 527 974 4 × 2 = 1 + 0.931 069 486 937 590 977 055 948 8;
  • 119) 0.931 069 486 937 590 977 055 948 8 × 2 = 1 + 0.862 138 973 875 181 954 111 897 6;

We didn't get any fractional part that was equal to zero. But we have enough iterations: the first integer part different from zero appeared at step 67, and steps 68 to 119 supply the 52 mantissa bits that follow it => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).
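The repeated multiplication and its stop rule can be sketched in Python, assuming Python 3.9+ and exact rational arithmetic from the standard `fractions` module, so that no float rounding creeps into the intermediate products. `fraction_to_bits` is a hypothetical name; the stop rule encoded here (zero fractional part, or the first 1 plus 52 further bits) is our reading of the rule used on this page.

```python
from fractions import Fraction

MANTISSA_BITS = 52  # fraction bits stored in a double


def fraction_to_bits(frac: Fraction, mantissa_bits: int = MANTISSA_BITS) -> list[int]:
    """Repeatedly multiply a fraction in [0, 1) by 2, collecting the integer parts.

    Stops when the fractional part becomes zero, or once the first 1 and the
    `mantissa_bits` bits after it have been produced.
    """
    if not 0 <= frac < 1:
        raise ValueError("expected a fraction in [0, 1)")
    bits = []
    first_one = None  # index of the first non-zero integer part
    while frac != 0:
        frac *= 2
        integer_part = int(frac)   # 0 or 1
        frac -= integer_part       # keep only the fractional part
        bits.append(integer_part)
        if integer_part == 1 and first_one is None:
            first_one = len(bits) - 1
        if first_one is not None and len(bits) - first_one > mantissa_bits:
            break                  # enough bits: FULL STOP (precision is lost)
    return bits


# Fraction parses the decimal string exactly.
bits = fraction_to_bits(Fraction("8.5677e-21"))
print(len(bits), bits.index(1) + 1)  # 119 67
```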


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 567 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0101 1011 0110 1011 0111 1111 1001 0011 0100 1101 011(2)

5. Positive number before normalization:

0.000 000 000 000 000 000 008 567 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0101 1011 0110 1011 0111 1111 1001 0011 0100 1101 011(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 567 7(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0101 1011 0110 1011 0111 1111 1001 0011 0100 1101 011(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0101 1011 0110 1011 0111 1111 1001 0011 0100 1101 011(2) × 2^0 =


1.0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011(2) × 2^(-67)
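For a number whose integer part is 0, the normalization step only needs the position of the first 1 after the point. A minimal Python sketch, assuming the binary string from step 5 as input; `normalize_pure_fraction` is a hypothetical name and handles only the integer-part-0 case shown here.

```python
def normalize_pure_fraction(binary: str) -> tuple[int, str]:
    """Normalize '0.xxx' (integer part 0) to '1.yyy × 2^exponent'.

    Returns the unadjusted exponent and the significand string '1.yyy'.
    """
    integer_digits, fraction_digits = binary.replace(" ", "").split(".")
    if integer_digits.strip("0"):
        raise ValueError("this sketch only handles numbers with integer part 0")
    first_one = fraction_digits.index("1")   # 0-based position of the first 1
    shift = first_one + 1                    # move the mark this many places right
    return -shift, "1." + fraction_digits[shift:]


step5 = ("0." + "0000 " * 16 + "0010 1000 0111 0101 1011 0110 1011 0111 "
         "1111 1001 0011 0100 1101 011")
exponent, significand = normalize_pure_fraction(step5)
print(exponent)  # -67
```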


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
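The bias formula is a single addition; the Python sketch below, with the hypothetical helper name `adjust_exponent`, also checks that the result stays in the range used by normal numbers. That range check reflects the IEEE 754 binary64 encoding: a stored exponent of 0 is reserved for zero and subnormals, and 2047 (all ones) for infinities and NaN.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def adjust_exponent(unadjusted: int) -> int:
    """Add the bias; keep the result in the range used by normal numbers."""
    adjusted = unadjusted + BIAS
    # 0 and 2047 are reserved, so normal numbers use stored exponents 1..2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError(f"exponent {unadjusted} is outside the normal range")
    return adjusted


print(adjust_exponent(-67))  # 956
```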


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
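Repeated division normally stops at a zero quotient, which gives 10 digits for 956; the 11-bit field needs one leading zero. A minimal Python sketch (hypothetical helper `to_fixed_width_binary`) that simply divides exactly 11 times, so the padding falls out of the same technique:

```python
def to_fixed_width_binary(n: int, width: int = 11) -> str:
    """Repeated division by 2, done exactly `width` times.

    Running the loop a fixed number of times also yields the leading zeros
    needed to fill the 11-bit exponent field.
    """
    if not 0 <= n < 2 ** width:
        raise ValueError(f"{n} does not fit in {width} bits")
    remainders = []
    for _ in range(width):
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    return "".join(reversed(remainders))  # bottom of the list first


print(to_fixed_width_binary(956))  # 01110111100
```

Python's built-in `format(956, "011b")` gives the same string and can serve as a quick check.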


11. Normalize the mantissa.

a) Remove the leading (the leftmost) bit, since it's always 1, and the decimal point, if present.


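b) Adjust its length to 52 bits, only if necessary (not the case here: the multiplications stopped exactly 52 bits after the leading 1).

Stopping there truncates the mantissa. One more multiplication, 0.862 138 973 875 181 954 111 897 6 × 2 = 1 + 0.724 277 947 750 363 908 223 795 2, gives a next bit of 1 followed by a non-zero remainder, so a conversion using IEEE 754's default round-to-nearest mode would round the mantissa up, ending it in ...0110 1100 instead of ...0110 1011.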


Mantissa (normalized) =


1.0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011 =


0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011
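Cutting the mantissa at 52 bits is truncation, while IEEE 754's default rounding mode is round to nearest, ties to even. A minimal Python sketch comparing the two, assuming exact `Fraction` arithmetic and Python's `struct` module to read the bits of Python's own float (which CPython parses with correct rounding). `mantissa_fields` and `float_mantissa_field` are hypothetical names; the sketch ignores the carry that would occur if rounding produced 2^52.

```python
import math
import struct
from fractions import Fraction

MANTISSA_BITS = 52


def mantissa_fields(x: Fraction, exponent: int) -> tuple[int, int]:
    """Return the 52-bit mantissa field of x = 1.f × 2^exponent, truncated and rounded.

    round() on a Fraction rounds to nearest, ties to even.
    """
    significand = x / Fraction(2) ** exponent
    if not 1 <= significand < 2:
        raise ValueError("exponent does not normalize x")
    field = (significand - 1) * 2 ** MANTISSA_BITS  # exact, may have a fractional part
    return math.floor(field), round(field)


def float_mantissa_field(value: float) -> int:
    """The 52 low-order bits of Python's own double for `value`."""
    (word,) = struct.unpack(">Q", struct.pack(">d", value))
    return word & (2 ** MANTISSA_BITS - 1)


truncated, rounded = mantissa_fields(Fraction("8.5677e-21"), -67)
print(format(truncated, "052b"))  # the truncated mantissa from step 11
print(format(rounded, "052b"))    # ends in ...1100
print(rounded == float_mantissa_field(8.5677e-21))  # True
```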


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011


Decimal number 0.000 000 000 000 000 000 008 567 7 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011
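The three fields can be packed into a single 64-bit word and reinterpreted as a double. A minimal Python sketch using the standard `struct` module, whose `d` format is the IEEE 754 binary64 layout; `assemble_double` and `bits_to_float` are hypothetical names.

```python
import struct


def assemble_double(sign: str, exponent: str, mantissa: str) -> int:
    """Pack the three fields (given as bit strings) into one 64-bit integer."""
    sign, exponent, mantissa = (s.replace(" ", "") for s in (sign, exponent, mantissa))
    if (len(sign), len(exponent), len(mantissa)) != (1, 11, 52):
        raise ValueError("expected 1 + 11 + 52 bits")
    return int(sign + exponent + mantissa, 2)


def bits_to_float(word: int) -> float:
    """Reinterpret a 64-bit pattern as a Python float (an IEEE 754 double)."""
    return struct.unpack(">d", word.to_bytes(8, "big"))[0]


word = assemble_double("0", "011 1011 1100",
                       "0100 0011 1010 1101 1011 0101 1011 1111 1100 1001 1010 0110 1011")
print(hex(word))            # 0x3bc43adb5bfc9a6b
print(bits_to_float(word))  # one unit in the last place below float("8.5677e-21")
```

Because the mantissa above was truncated rather than rounded, the pattern Python itself stores for 8.5677e-21 is this word plus one.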


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until there are enough bits to fill the 52-bit mantissa (many decimal fractions, such as 0.1, never reach zero).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (here, the binary point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of it.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision; this truncates, while IEEE 754's default mode rounds to the nearest representable value) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
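The nine steps above can be chained into one function. This is a minimal Python sketch, not the page's own converter: `decimal_to_double_fields` and `integer_to_binary` are hypothetical names, exact arithmetic comes from the standard `fractions` module, and, like the worked examples, it truncates the mantissa and handles normal numbers only (no zero, subnormals, infinities or NaN).

```python
from fractions import Fraction

MANTISSA_BITS = 52
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023


def integer_to_binary(n: int) -> str:
    """Steps 2-3: repeated division by 2, remainders read bottom-up."""
    remainders = []
    while True:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
        if n == 0:
            return "".join(reversed(remainders))


def decimal_to_double_fields(text: str) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings, truncating the mantissa."""
    value = Fraction(text)                     # exact decimal value
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = "1" if value < 0 else "0"           # step 9
    value = abs(value)                         # step 1
    integer_part = int(value)
    fraction = value - integer_part
    integer_bits = integer_to_binary(integer_part)

    # Steps 4-5: multiply by 2 until the fraction is zero or there are
    # enough significant bits (the leading 1 plus 52 mantissa bits).
    significant = len(integer_bits) if integer_part > 0 else 0
    fraction_bits = []
    while fraction != 0 and significant < MANTISSA_BITS + 1:
        fraction *= 2
        bit = int(fraction)
        fraction -= bit
        fraction_bits.append(str(bit))
        if significant > 0 or bit == 1:
            significant += 1
    fraction_digits = "".join(fraction_bits)

    # Step 6: normalize to 1.xxx × 2^exponent.
    if integer_part > 0:
        exponent = len(integer_bits) - 1
        tail = integer_bits[1:] + fraction_digits
    else:
        first_one = fraction_digits.index("1")
        exponent = -(first_one + 1)
        tail = fraction_digits[first_one + 1:]

    # Step 7: bias the exponent and write it on 11 bits.
    adjusted = exponent + BIAS
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("outside the range of normal doubles")
    exponent_bits = format(adjusted, f"0{EXPONENT_BITS}b")  # same result as repeated division

    # Step 8: drop the leading 1, then cut or pad the rest to 52 bits.
    mantissa_bits = (tail + "0" * MANTISSA_BITS)[:MANTISSA_BITS]
    return sign, exponent_bits, mantissa_bits


print(decimal_to_double_fields("-31.640215"))
```

The input is taken as a string so that `Fraction` sees the exact decimal digits rather than an already-rounded float.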

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

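  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1', together with the binary point, and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). This truncates the value: the first removed bit is 1 and the removed bits after it are not all 0, so a conversion using IEEE 754's default round-to-nearest mode would end the mantissa in ...1001 1101 instead of ...1001 1100: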

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
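To see how much precision the truncation costs, the three fields can be decoded back into an exact value with (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). A minimal Python sketch, assuming normal numbers only (stored exponent 1 to 2046); `decode_double_fields` is a hypothetical name. With an unadjusted exponent of 4, one unit in the last place is 2^(4-52) = 2^(-48).

```python
from fractions import Fraction


def decode_double_fields(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Exact value of a normal double given its three fields as bit strings."""
    sign, exponent, mantissa = (s.replace(" ", "") for s in (sign, exponent, mantissa))
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 52)
    value = significand * Fraction(2) ** (int(exponent, 2) - 1023)
    return -value if sign == "1" else value


stored = decode_double_fields("1", "100 0000 0011",
                              "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100")
error = stored - Fraction("-31.640215")
print(float(error))  # representation error left by the truncation, below 2^(-48)
```

`Fraction(-31.640215)`, built from Python's correctly rounded float, lies exactly one unit in the last place further from zero than the truncated result.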