0.000 000 000 000 000 000 008 568 4 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 568 4(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 568 4(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 568 4.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 568 4 × 2 = 0 + 0.000 000 000 000 000 000 017 136 8;
  • 2) 0.000 000 000 000 000 000 017 136 8 × 2 = 0 + 0.000 000 000 000 000 000 034 273 6;
  • 3) 0.000 000 000 000 000 000 034 273 6 × 2 = 0 + 0.000 000 000 000 000 000 068 547 2;
  • 4) 0.000 000 000 000 000 000 068 547 2 × 2 = 0 + 0.000 000 000 000 000 000 137 094 4;
  • 5) 0.000 000 000 000 000 000 137 094 4 × 2 = 0 + 0.000 000 000 000 000 000 274 188 8;
  • 6) 0.000 000 000 000 000 000 274 188 8 × 2 = 0 + 0.000 000 000 000 000 000 548 377 6;
  • 7) 0.000 000 000 000 000 000 548 377 6 × 2 = 0 + 0.000 000 000 000 000 001 096 755 2;
  • 8) 0.000 000 000 000 000 001 096 755 2 × 2 = 0 + 0.000 000 000 000 000 002 193 510 4;
  • 9) 0.000 000 000 000 000 002 193 510 4 × 2 = 0 + 0.000 000 000 000 000 004 387 020 8;
  • 10) 0.000 000 000 000 000 004 387 020 8 × 2 = 0 + 0.000 000 000 000 000 008 774 041 6;
  • 11) 0.000 000 000 000 000 008 774 041 6 × 2 = 0 + 0.000 000 000 000 000 017 548 083 2;
  • 12) 0.000 000 000 000 000 017 548 083 2 × 2 = 0 + 0.000 000 000 000 000 035 096 166 4;
  • 13) 0.000 000 000 000 000 035 096 166 4 × 2 = 0 + 0.000 000 000 000 000 070 192 332 8;
  • 14) 0.000 000 000 000 000 070 192 332 8 × 2 = 0 + 0.000 000 000 000 000 140 384 665 6;
  • 15) 0.000 000 000 000 000 140 384 665 6 × 2 = 0 + 0.000 000 000 000 000 280 769 331 2;
  • 16) 0.000 000 000 000 000 280 769 331 2 × 2 = 0 + 0.000 000 000 000 000 561 538 662 4;
  • 17) 0.000 000 000 000 000 561 538 662 4 × 2 = 0 + 0.000 000 000 000 001 123 077 324 8;
  • 18) 0.000 000 000 000 001 123 077 324 8 × 2 = 0 + 0.000 000 000 000 002 246 154 649 6;
  • 19) 0.000 000 000 000 002 246 154 649 6 × 2 = 0 + 0.000 000 000 000 004 492 309 299 2;
  • 20) 0.000 000 000 000 004 492 309 299 2 × 2 = 0 + 0.000 000 000 000 008 984 618 598 4;
  • 21) 0.000 000 000 000 008 984 618 598 4 × 2 = 0 + 0.000 000 000 000 017 969 237 196 8;
  • 22) 0.000 000 000 000 017 969 237 196 8 × 2 = 0 + 0.000 000 000 000 035 938 474 393 6;
  • 23) 0.000 000 000 000 035 938 474 393 6 × 2 = 0 + 0.000 000 000 000 071 876 948 787 2;
  • 24) 0.000 000 000 000 071 876 948 787 2 × 2 = 0 + 0.000 000 000 000 143 753 897 574 4;
  • 25) 0.000 000 000 000 143 753 897 574 4 × 2 = 0 + 0.000 000 000 000 287 507 795 148 8;
  • 26) 0.000 000 000 000 287 507 795 148 8 × 2 = 0 + 0.000 000 000 000 575 015 590 297 6;
  • 27) 0.000 000 000 000 575 015 590 297 6 × 2 = 0 + 0.000 000 000 001 150 031 180 595 2;
  • 28) 0.000 000 000 001 150 031 180 595 2 × 2 = 0 + 0.000 000 000 002 300 062 361 190 4;
  • 29) 0.000 000 000 002 300 062 361 190 4 × 2 = 0 + 0.000 000 000 004 600 124 722 380 8;
  • 30) 0.000 000 000 004 600 124 722 380 8 × 2 = 0 + 0.000 000 000 009 200 249 444 761 6;
  • 31) 0.000 000 000 009 200 249 444 761 6 × 2 = 0 + 0.000 000 000 018 400 498 889 523 2;
  • 32) 0.000 000 000 018 400 498 889 523 2 × 2 = 0 + 0.000 000 000 036 800 997 779 046 4;
  • 33) 0.000 000 000 036 800 997 779 046 4 × 2 = 0 + 0.000 000 000 073 601 995 558 092 8;
  • 34) 0.000 000 000 073 601 995 558 092 8 × 2 = 0 + 0.000 000 000 147 203 991 116 185 6;
  • 35) 0.000 000 000 147 203 991 116 185 6 × 2 = 0 + 0.000 000 000 294 407 982 232 371 2;
  • 36) 0.000 000 000 294 407 982 232 371 2 × 2 = 0 + 0.000 000 000 588 815 964 464 742 4;
  • 37) 0.000 000 000 588 815 964 464 742 4 × 2 = 0 + 0.000 000 001 177 631 928 929 484 8;
  • 38) 0.000 000 001 177 631 928 929 484 8 × 2 = 0 + 0.000 000 002 355 263 857 858 969 6;
  • 39) 0.000 000 002 355 263 857 858 969 6 × 2 = 0 + 0.000 000 004 710 527 715 717 939 2;
  • 40) 0.000 000 004 710 527 715 717 939 2 × 2 = 0 + 0.000 000 009 421 055 431 435 878 4;
  • 41) 0.000 000 009 421 055 431 435 878 4 × 2 = 0 + 0.000 000 018 842 110 862 871 756 8;
  • 42) 0.000 000 018 842 110 862 871 756 8 × 2 = 0 + 0.000 000 037 684 221 725 743 513 6;
  • 43) 0.000 000 037 684 221 725 743 513 6 × 2 = 0 + 0.000 000 075 368 443 451 487 027 2;
  • 44) 0.000 000 075 368 443 451 487 027 2 × 2 = 0 + 0.000 000 150 736 886 902 974 054 4;
  • 45) 0.000 000 150 736 886 902 974 054 4 × 2 = 0 + 0.000 000 301 473 773 805 948 108 8;
  • 46) 0.000 000 301 473 773 805 948 108 8 × 2 = 0 + 0.000 000 602 947 547 611 896 217 6;
  • 47) 0.000 000 602 947 547 611 896 217 6 × 2 = 0 + 0.000 001 205 895 095 223 792 435 2;
  • 48) 0.000 001 205 895 095 223 792 435 2 × 2 = 0 + 0.000 002 411 790 190 447 584 870 4;
  • 49) 0.000 002 411 790 190 447 584 870 4 × 2 = 0 + 0.000 004 823 580 380 895 169 740 8;
  • 50) 0.000 004 823 580 380 895 169 740 8 × 2 = 0 + 0.000 009 647 160 761 790 339 481 6;
  • 51) 0.000 009 647 160 761 790 339 481 6 × 2 = 0 + 0.000 019 294 321 523 580 678 963 2;
  • 52) 0.000 019 294 321 523 580 678 963 2 × 2 = 0 + 0.000 038 588 643 047 161 357 926 4;
  • 53) 0.000 038 588 643 047 161 357 926 4 × 2 = 0 + 0.000 077 177 286 094 322 715 852 8;
  • 54) 0.000 077 177 286 094 322 715 852 8 × 2 = 0 + 0.000 154 354 572 188 645 431 705 6;
  • 55) 0.000 154 354 572 188 645 431 705 6 × 2 = 0 + 0.000 308 709 144 377 290 863 411 2;
  • 56) 0.000 308 709 144 377 290 863 411 2 × 2 = 0 + 0.000 617 418 288 754 581 726 822 4;
  • 57) 0.000 617 418 288 754 581 726 822 4 × 2 = 0 + 0.001 234 836 577 509 163 453 644 8;
  • 58) 0.001 234 836 577 509 163 453 644 8 × 2 = 0 + 0.002 469 673 155 018 326 907 289 6;
  • 59) 0.002 469 673 155 018 326 907 289 6 × 2 = 0 + 0.004 939 346 310 036 653 814 579 2;
  • 60) 0.004 939 346 310 036 653 814 579 2 × 2 = 0 + 0.009 878 692 620 073 307 629 158 4;
  • 61) 0.009 878 692 620 073 307 629 158 4 × 2 = 0 + 0.019 757 385 240 146 615 258 316 8;
  • 62) 0.019 757 385 240 146 615 258 316 8 × 2 = 0 + 0.039 514 770 480 293 230 516 633 6;
  • 63) 0.039 514 770 480 293 230 516 633 6 × 2 = 0 + 0.079 029 540 960 586 461 033 267 2;
  • 64) 0.079 029 540 960 586 461 033 267 2 × 2 = 0 + 0.158 059 081 921 172 922 066 534 4;
  • 65) 0.158 059 081 921 172 922 066 534 4 × 2 = 0 + 0.316 118 163 842 345 844 133 068 8;
  • 66) 0.316 118 163 842 345 844 133 068 8 × 2 = 0 + 0.632 236 327 684 691 688 266 137 6;
  • 67) 0.632 236 327 684 691 688 266 137 6 × 2 = 1 + 0.264 472 655 369 383 376 532 275 2;
  • 68) 0.264 472 655 369 383 376 532 275 2 × 2 = 0 + 0.528 945 310 738 766 753 064 550 4;
  • 69) 0.528 945 310 738 766 753 064 550 4 × 2 = 1 + 0.057 890 621 477 533 506 129 100 8;
  • 70) 0.057 890 621 477 533 506 129 100 8 × 2 = 0 + 0.115 781 242 955 067 012 258 201 6;
  • 71) 0.115 781 242 955 067 012 258 201 6 × 2 = 0 + 0.231 562 485 910 134 024 516 403 2;
  • 72) 0.231 562 485 910 134 024 516 403 2 × 2 = 0 + 0.463 124 971 820 268 049 032 806 4;
  • 73) 0.463 124 971 820 268 049 032 806 4 × 2 = 0 + 0.926 249 943 640 536 098 065 612 8;
  • 74) 0.926 249 943 640 536 098 065 612 8 × 2 = 1 + 0.852 499 887 281 072 196 131 225 6;
  • 75) 0.852 499 887 281 072 196 131 225 6 × 2 = 1 + 0.704 999 774 562 144 392 262 451 2;
  • 76) 0.704 999 774 562 144 392 262 451 2 × 2 = 1 + 0.409 999 549 124 288 784 524 902 4;
  • 77) 0.409 999 549 124 288 784 524 902 4 × 2 = 0 + 0.819 999 098 248 577 569 049 804 8;
  • 78) 0.819 999 098 248 577 569 049 804 8 × 2 = 1 + 0.639 998 196 497 155 138 099 609 6;
  • 79) 0.639 998 196 497 155 138 099 609 6 × 2 = 1 + 0.279 996 392 994 310 276 199 219 2;
  • 80) 0.279 996 392 994 310 276 199 219 2 × 2 = 0 + 0.559 992 785 988 620 552 398 438 4;
  • 81) 0.559 992 785 988 620 552 398 438 4 × 2 = 1 + 0.119 985 571 977 241 104 796 876 8;
  • 82) 0.119 985 571 977 241 104 796 876 8 × 2 = 0 + 0.239 971 143 954 482 209 593 753 6;
  • 83) 0.239 971 143 954 482 209 593 753 6 × 2 = 0 + 0.479 942 287 908 964 419 187 507 2;
  • 84) 0.479 942 287 908 964 419 187 507 2 × 2 = 0 + 0.959 884 575 817 928 838 375 014 4;
  • 85) 0.959 884 575 817 928 838 375 014 4 × 2 = 1 + 0.919 769 151 635 857 676 750 028 8;
  • 86) 0.919 769 151 635 857 676 750 028 8 × 2 = 1 + 0.839 538 303 271 715 353 500 057 6;
  • 87) 0.839 538 303 271 715 353 500 057 6 × 2 = 1 + 0.679 076 606 543 430 707 000 115 2;
  • 88) 0.679 076 606 543 430 707 000 115 2 × 2 = 1 + 0.358 153 213 086 861 414 000 230 4;
  • 89) 0.358 153 213 086 861 414 000 230 4 × 2 = 0 + 0.716 306 426 173 722 828 000 460 8;
  • 90) 0.716 306 426 173 722 828 000 460 8 × 2 = 1 + 0.432 612 852 347 445 656 000 921 6;
  • 91) 0.432 612 852 347 445 656 000 921 6 × 2 = 0 + 0.865 225 704 694 891 312 001 843 2;
  • 92) 0.865 225 704 694 891 312 001 843 2 × 2 = 1 + 0.730 451 409 389 782 624 003 686 4;
  • 93) 0.730 451 409 389 782 624 003 686 4 × 2 = 1 + 0.460 902 818 779 565 248 007 372 8;
  • 94) 0.460 902 818 779 565 248 007 372 8 × 2 = 0 + 0.921 805 637 559 130 496 014 745 6;
  • 95) 0.921 805 637 559 130 496 014 745 6 × 2 = 1 + 0.843 611 275 118 260 992 029 491 2;
  • 96) 0.843 611 275 118 260 992 029 491 2 × 2 = 1 + 0.687 222 550 236 521 984 058 982 4;
  • 97) 0.687 222 550 236 521 984 058 982 4 × 2 = 1 + 0.374 445 100 473 043 968 117 964 8;
  • 98) 0.374 445 100 473 043 968 117 964 8 × 2 = 0 + 0.748 890 200 946 087 936 235 929 6;
  • 99) 0.748 890 200 946 087 936 235 929 6 × 2 = 1 + 0.497 780 401 892 175 872 471 859 2;
  • 100) 0.497 780 401 892 175 872 471 859 2 × 2 = 0 + 0.995 560 803 784 351 744 943 718 4;
  • 101) 0.995 560 803 784 351 744 943 718 4 × 2 = 1 + 0.991 121 607 568 703 489 887 436 8;
  • 102) 0.991 121 607 568 703 489 887 436 8 × 2 = 1 + 0.982 243 215 137 406 979 774 873 6;
  • 103) 0.982 243 215 137 406 979 774 873 6 × 2 = 1 + 0.964 486 430 274 813 959 549 747 2;
  • 104) 0.964 486 430 274 813 959 549 747 2 × 2 = 1 + 0.928 972 860 549 627 919 099 494 4;
  • 105) 0.928 972 860 549 627 919 099 494 4 × 2 = 1 + 0.857 945 721 099 255 838 198 988 8;
  • 106) 0.857 945 721 099 255 838 198 988 8 × 2 = 1 + 0.715 891 442 198 511 676 397 977 6;
  • 107) 0.715 891 442 198 511 676 397 977 6 × 2 = 1 + 0.431 782 884 397 023 352 795 955 2;
  • 108) 0.431 782 884 397 023 352 795 955 2 × 2 = 0 + 0.863 565 768 794 046 705 591 910 4;
  • 109) 0.863 565 768 794 046 705 591 910 4 × 2 = 1 + 0.727 131 537 588 093 411 183 820 8;
  • 110) 0.727 131 537 588 093 411 183 820 8 × 2 = 1 + 0.454 263 075 176 186 822 367 641 6;
  • 111) 0.454 263 075 176 186 822 367 641 6 × 2 = 0 + 0.908 526 150 352 373 644 735 283 2;
  • 112) 0.908 526 150 352 373 644 735 283 2 × 2 = 1 + 0.817 052 300 704 747 289 470 566 4;
  • 113) 0.817 052 300 704 747 289 470 566 4 × 2 = 1 + 0.634 104 601 409 494 578 941 132 8;
  • 114) 0.634 104 601 409 494 578 941 132 8 × 2 = 1 + 0.268 209 202 818 989 157 882 265 6;
  • 115) 0.268 209 202 818 989 157 882 265 6 × 2 = 0 + 0.536 418 405 637 978 315 764 531 2;
  • 116) 0.536 418 405 637 978 315 764 531 2 × 2 = 1 + 0.072 836 811 275 956 631 529 062 4;
  • 117) 0.072 836 811 275 956 631 529 062 4 × 2 = 0 + 0.145 673 622 551 913 263 058 124 8;
  • 118) 0.145 673 622 551 913 263 058 124 8 × 2 = 0 + 0.291 347 245 103 826 526 116 249 6;
  • 119) 0.291 347 245 103 826 526 116 249 6 × 2 = 0 + 0.582 694 490 207 653 052 232 499 2;

We didn't get any fractional part that was equal to zero. But we had enough iterations (52 mantissa bits after the first bit equal to 1, which appeared at iteration 67) => FULL STOP (losing precision: the remaining bits are dropped, so the converted number will be a very good approximation of the initial one, not an exact copy).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 568 4(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0110 1000 1111 0101 1011 1010 1111 1110 1101 1101 000(2)
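
The repeated doubling of steps 3 and 4 can be reproduced with exact rational arithmetic. The sketch below is only an illustration: the function name `fraction_bits` and the `mantissa_bits` parameter are made up for this example, and Python's `fractions.Fraction` is assumed so that the doubling itself introduces no rounding.

```python
from fractions import Fraction

def fraction_bits(decimal_text, mantissa_bits=52):
    """Repeatedly double a fraction (0 <= x < 1) and collect the integer parts.

    Stops when the fractional part reaches zero, or once `mantissa_bits`
    bits have been collected after the first 1 (later bits are dropped).
    """
    x = Fraction(decimal_text)      # exact value of the decimal string
    bits = []
    first_one = None                # 1-based step at which the first 1 appears
    while x != 0:
        x *= 2
        bit = int(x)                # integer part of the product: 0 or 1
        x -= bit                    # keep only the fractional part
        bits.append(str(bit))
        if bit == 1 and first_one is None:
            first_one = len(bits)
        if first_one is not None and len(bits) == first_one + mantissa_bits:
            break
    return "".join(bits), first_one

bits, first_one = fraction_bits("8.5684e-21")
print(len(bits), first_one)   # 119 67
```

The returned string holds the digits after the binary point, in the order they were calculated.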

5. Positive number before normalization:

0.000 000 000 000 000 000 008 568 4(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0110 1000 1111 0101 1011 1010 1111 1110 1101 1101 000(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 568 4(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0110 1000 1111 0101 1011 1010 1111 1110 1101 1101 000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0111 0110 1000 1111 0101 1011 1010 1111 1110 1101 1101 000(2) × 2^0 =


1.0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000(2) × 2^(-67)
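
Normalization can also be expressed as a loop that halves or doubles the value until it lies between 1 and 2, counting the shifts. This is a minimal sketch under the assumption that the input is a positive number given as a decimal string; `normalize` is a hypothetical helper name, not something defined by the standard.

```python
from fractions import Fraction

def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2 and
    value == significand * 2**exponent, for a positive exact value."""
    x = Fraction(value)
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = 0
    while x >= 2:       # mark moves to the left: exponent goes up
        x /= 2
        exponent += 1
    while x < 1:        # mark moves to the right: exponent goes down
        x *= 2
        exponent -= 1
    return x, exponent

significand, exponent = normalize("8.5684e-21")
print(exponent)               # -67
print(float(significand))     # a value between 1 and 2
```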


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)
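
The bias arithmetic is short enough to check in a few lines. In this sketch `biased_exponent` and its `exponent_bits` parameter are illustrative names; the range check reflects the fact that the all-zeros and all-ones exponent patterns are reserved in IEEE 754 and are not used for normal numbers.

```python
def biased_exponent(unadjusted, exponent_bits=11):
    """Add the excess/bias 2**(exponent_bits - 1) - 1 to an unadjusted exponent."""
    bias = 2 ** (exponent_bits - 1) - 1    # 1023 for the 11 bit exponent field
    adjusted = unadjusted + bias
    # 0 and the all-ones pattern are reserved (zero/subnormals, infinity/NaN)
    if not 0 < adjusted < 2 ** exponent_bits - 1:
        raise ValueError("exponent outside the normal range")
    return adjusted

print(biased_exponent(-67))   # 956
```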


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
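
The repeated division used for the exponent (and for the integer part in step 1) can be sketched as follows. The name `to_binary` and the `width` parameter are assumptions made for this example; `width` only left-pads the result with zeros so that it fills the 11 bit field.

```python
def to_binary(n, width=0):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    remainders = []
    while True:
        n, remainder = divmod(n, 2)   # quotient and remainder of n ÷ 2
        remainders.append(str(remainder))
        if n == 0:                    # stop when the quotient reaches zero
            break
    # The last remainder is the leftmost bit, so read the list bottom-up,
    # then left-pad with zeros to the requested field width.
    return "".join(reversed(remainders)).rjust(width, "0")

print(to_binary(956, width=11))   # 01110111100
```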


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal mark, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000 =


0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000
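
Both sub-steps (dropping the leading 1 and adjusting the length) are plain string operations. The helper below is a hypothetical sketch, not a standard routine: it assumes the input is written as `1.xxx` with optional spaces, cuts excess bits on the right and pads short inputs with zeros.

```python
def mantissa_field(normalized, width=52):
    """Turn a normalized binary string such as '1.0100 0011' into the stored mantissa."""
    digits = normalized.replace(" ", "")
    if not digits.startswith("1."):
        raise ValueError("expected a normalized value of the form 1.xxx")
    fraction = digits[2:]                 # drop the leading 1 and the mark
    # Cut excess bits on the right (truncation) or pad with zeros on the right.
    return fraction[:width].ljust(width, "0")

field = mantissa_field(
    "1.0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000"
)
print(len(field))   # 52
```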


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000


Decimal number 0.000 000 000 000 000 000 008 568 4 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000
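
As a cross-check, the three fields can be joined into one 64 bit word and reinterpreted as a double. The sketch below uses Python's `struct` module and assumes that Python floats are IEEE 754 doubles, which holds on common platforms. One caveat: the steps above drop the excess bits, while Python rounds a literal to the nearest representable value, so the natively stored pattern may be larger by one in the last bit.

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0011 1011 0100 0111 1010 1101 1101 0111 1111 0110 1110 1000"

# Concatenate the three fields into one 64 bit pattern.
pattern = (sign + exponent + mantissa).replace(" ", "")
word = int(pattern, 2)
print(f"{word:016X}")   # 3BC43B47ADD7F6E8

# Reinterpret the same 64 bits as a double to see which value they encode.
decoded = struct.unpack(">d", word.to_bytes(8, "big"))[0]
print(decoded)

# Bit pattern that Python itself stores for the literal 8.5684e-21.
native = struct.unpack(">Q", struct.pack(">d", 8.5684e-21))[0]
print(native - word)    # 0 or 1: truncation versus round-to-nearest
```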


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
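
The nine steps above can be combined into one short routine. The following is a sketch under stated assumptions, not a full IEEE 754 encoder: `decimal_to_double_fields` is a made-up name, only normal non-zero numbers are handled, and excess bits are dropped rather than rounded, as in the steps above. It normalizes before generating the mantissa bits, which reorders the steps slightly but yields the same bits.

```python
from fractions import Fraction

def decimal_to_double_fields(text, exponent_bits=11, mantissa_bits=52):
    """Return (sign, exponent, mantissa) bit strings for a decimal string.

    Sketch only: handles normal, non-zero numbers and truncates excess bits
    (no rounding, no subnormals, no infinities or NaN).
    """
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if value < 0 else "0"
    x = abs(value)

    # Normalize: find the exponent such that 1 <= x < 2 after shifting.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1

    # Adjust the exponent with the bias and write it as a fixed-width field.
    biased = exponent + 2 ** (exponent_bits - 1) - 1
    if not 0 < biased < 2 ** exponent_bits - 1:
        raise ValueError("outside the normal range")
    exponent_field = format(biased, f"0{exponent_bits}b")

    # Drop the leading 1, then generate the mantissa bits by repeated doubling.
    x -= 1
    bits = []
    for _ in range(mantissa_bits):
        x *= 2
        bit = int(x)
        x -= bit
        bits.append(str(bit))
    return sign, exponent_field, "".join(bits)

print(decimal_to_double_fields("-31.640215"))
```

Exact fractions are used instead of floats so that the doubling steps match hand calculation digit for digit.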

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
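
To see how much precision was lost, the three fields can be decoded back into an exact value and compared with the original decimal. This is an illustrative sketch: `decode_fields` is a hypothetical helper, and it applies the usual formula for normal doubles, value = (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023).

```python
from fractions import Fraction

def decode_fields(sign, exponent, mantissa):
    """Exact value of a normal double given its three bit fields (spaces allowed)."""
    s = int(sign.replace(" ", ""), 2)
    e = int(exponent.replace(" ", ""), 2)
    m = int(mantissa.replace(" ", ""), 2)
    # value = (-1)^sign * (1 + mantissa / 2^52) * 2^(exponent - 1023)
    return (-1) ** s * (1 + Fraction(m, 2 ** 52)) * Fraction(2) ** (e - 1023)

exact = decode_fields(
    "1",
    "100 0000 0011",
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100",
)
error = abs(Fraction("-31.640215")) - abs(exact)
print(float(exact))
print(float(error))   # small and non-negative: the removed bits are lost
```

Because the excess bits were removed rather than rounded, the decoded magnitude is never larger than the original, and the difference stays below one unit in the last mantissa place (2^-48 for this exponent).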