0.000 000 000 000 000 000 008 63 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 63(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

Steps to convert the decimal number
0.000 000 000 000 000 000 008 63(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa):

1. First, convert the integer part, 0, to binary (base 2).
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
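Steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the repeated-division technique; the function name int_to_binary is a hypothetical helper, not something defined by the standard or by this page.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n < 0:
        raise ValueError("expects a non-negative integer")
    remainders = []
    while True:
        n, r = divmod(n, 2)        # quotient and remainder of the division by 2
        remainders.append(str(r))  # keep track of each remainder
        if n == 0:                 # stop when the quotient is zero
            break
    # Read the remainders starting from the bottom of the list.
    return "".join(reversed(remainders))


print(int_to_binary(0))   # integer part of the number converted here
print(int_to_binary(31))  # integer part used in the second example
```

For the integer part 0 the loop runs once and yields the single digit 0, which matches the result above.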


3. Convert the fractional part, 0.000 000 000 000 000 000 008 63, to binary (base 2).

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 63 × 2 = 0 + 0.000 000 000 000 000 000 017 26;
  • 2) 0.000 000 000 000 000 000 017 26 × 2 = 0 + 0.000 000 000 000 000 000 034 52;
  • 3) 0.000 000 000 000 000 000 034 52 × 2 = 0 + 0.000 000 000 000 000 000 069 04;
  • 4) 0.000 000 000 000 000 000 069 04 × 2 = 0 + 0.000 000 000 000 000 000 138 08;
  • 5) 0.000 000 000 000 000 000 138 08 × 2 = 0 + 0.000 000 000 000 000 000 276 16;
  • 6) 0.000 000 000 000 000 000 276 16 × 2 = 0 + 0.000 000 000 000 000 000 552 32;
  • 7) 0.000 000 000 000 000 000 552 32 × 2 = 0 + 0.000 000 000 000 000 001 104 64;
  • 8) 0.000 000 000 000 000 001 104 64 × 2 = 0 + 0.000 000 000 000 000 002 209 28;
  • 9) 0.000 000 000 000 000 002 209 28 × 2 = 0 + 0.000 000 000 000 000 004 418 56;
  • 10) 0.000 000 000 000 000 004 418 56 × 2 = 0 + 0.000 000 000 000 000 008 837 12;
  • 11) 0.000 000 000 000 000 008 837 12 × 2 = 0 + 0.000 000 000 000 000 017 674 24;
  • 12) 0.000 000 000 000 000 017 674 24 × 2 = 0 + 0.000 000 000 000 000 035 348 48;
  • 13) 0.000 000 000 000 000 035 348 48 × 2 = 0 + 0.000 000 000 000 000 070 696 96;
  • 14) 0.000 000 000 000 000 070 696 96 × 2 = 0 + 0.000 000 000 000 000 141 393 92;
  • 15) 0.000 000 000 000 000 141 393 92 × 2 = 0 + 0.000 000 000 000 000 282 787 84;
  • 16) 0.000 000 000 000 000 282 787 84 × 2 = 0 + 0.000 000 000 000 000 565 575 68;
  • 17) 0.000 000 000 000 000 565 575 68 × 2 = 0 + 0.000 000 000 000 001 131 151 36;
  • 18) 0.000 000 000 000 001 131 151 36 × 2 = 0 + 0.000 000 000 000 002 262 302 72;
  • 19) 0.000 000 000 000 002 262 302 72 × 2 = 0 + 0.000 000 000 000 004 524 605 44;
  • 20) 0.000 000 000 000 004 524 605 44 × 2 = 0 + 0.000 000 000 000 009 049 210 88;
  • 21) 0.000 000 000 000 009 049 210 88 × 2 = 0 + 0.000 000 000 000 018 098 421 76;
  • 22) 0.000 000 000 000 018 098 421 76 × 2 = 0 + 0.000 000 000 000 036 196 843 52;
  • 23) 0.000 000 000 000 036 196 843 52 × 2 = 0 + 0.000 000 000 000 072 393 687 04;
  • 24) 0.000 000 000 000 072 393 687 04 × 2 = 0 + 0.000 000 000 000 144 787 374 08;
  • 25) 0.000 000 000 000 144 787 374 08 × 2 = 0 + 0.000 000 000 000 289 574 748 16;
  • 26) 0.000 000 000 000 289 574 748 16 × 2 = 0 + 0.000 000 000 000 579 149 496 32;
  • 27) 0.000 000 000 000 579 149 496 32 × 2 = 0 + 0.000 000 000 001 158 298 992 64;
  • 28) 0.000 000 000 001 158 298 992 64 × 2 = 0 + 0.000 000 000 002 316 597 985 28;
  • 29) 0.000 000 000 002 316 597 985 28 × 2 = 0 + 0.000 000 000 004 633 195 970 56;
  • 30) 0.000 000 000 004 633 195 970 56 × 2 = 0 + 0.000 000 000 009 266 391 941 12;
  • 31) 0.000 000 000 009 266 391 941 12 × 2 = 0 + 0.000 000 000 018 532 783 882 24;
  • 32) 0.000 000 000 018 532 783 882 24 × 2 = 0 + 0.000 000 000 037 065 567 764 48;
  • 33) 0.000 000 000 037 065 567 764 48 × 2 = 0 + 0.000 000 000 074 131 135 528 96;
  • 34) 0.000 000 000 074 131 135 528 96 × 2 = 0 + 0.000 000 000 148 262 271 057 92;
  • 35) 0.000 000 000 148 262 271 057 92 × 2 = 0 + 0.000 000 000 296 524 542 115 84;
  • 36) 0.000 000 000 296 524 542 115 84 × 2 = 0 + 0.000 000 000 593 049 084 231 68;
  • 37) 0.000 000 000 593 049 084 231 68 × 2 = 0 + 0.000 000 001 186 098 168 463 36;
  • 38) 0.000 000 001 186 098 168 463 36 × 2 = 0 + 0.000 000 002 372 196 336 926 72;
  • 39) 0.000 000 002 372 196 336 926 72 × 2 = 0 + 0.000 000 004 744 392 673 853 44;
  • 40) 0.000 000 004 744 392 673 853 44 × 2 = 0 + 0.000 000 009 488 785 347 706 88;
  • 41) 0.000 000 009 488 785 347 706 88 × 2 = 0 + 0.000 000 018 977 570 695 413 76;
  • 42) 0.000 000 018 977 570 695 413 76 × 2 = 0 + 0.000 000 037 955 141 390 827 52;
  • 43) 0.000 000 037 955 141 390 827 52 × 2 = 0 + 0.000 000 075 910 282 781 655 04;
  • 44) 0.000 000 075 910 282 781 655 04 × 2 = 0 + 0.000 000 151 820 565 563 310 08;
  • 45) 0.000 000 151 820 565 563 310 08 × 2 = 0 + 0.000 000 303 641 131 126 620 16;
  • 46) 0.000 000 303 641 131 126 620 16 × 2 = 0 + 0.000 000 607 282 262 253 240 32;
  • 47) 0.000 000 607 282 262 253 240 32 × 2 = 0 + 0.000 001 214 564 524 506 480 64;
  • 48) 0.000 001 214 564 524 506 480 64 × 2 = 0 + 0.000 002 429 129 049 012 961 28;
  • 49) 0.000 002 429 129 049 012 961 28 × 2 = 0 + 0.000 004 858 258 098 025 922 56;
  • 50) 0.000 004 858 258 098 025 922 56 × 2 = 0 + 0.000 009 716 516 196 051 845 12;
  • 51) 0.000 009 716 516 196 051 845 12 × 2 = 0 + 0.000 019 433 032 392 103 690 24;
  • 52) 0.000 019 433 032 392 103 690 24 × 2 = 0 + 0.000 038 866 064 784 207 380 48;
  • 53) 0.000 038 866 064 784 207 380 48 × 2 = 0 + 0.000 077 732 129 568 414 760 96;
  • 54) 0.000 077 732 129 568 414 760 96 × 2 = 0 + 0.000 155 464 259 136 829 521 92;
  • 55) 0.000 155 464 259 136 829 521 92 × 2 = 0 + 0.000 310 928 518 273 659 043 84;
  • 56) 0.000 310 928 518 273 659 043 84 × 2 = 0 + 0.000 621 857 036 547 318 087 68;
  • 57) 0.000 621 857 036 547 318 087 68 × 2 = 0 + 0.001 243 714 073 094 636 175 36;
  • 58) 0.001 243 714 073 094 636 175 36 × 2 = 0 + 0.002 487 428 146 189 272 350 72;
  • 59) 0.002 487 428 146 189 272 350 72 × 2 = 0 + 0.004 974 856 292 378 544 701 44;
  • 60) 0.004 974 856 292 378 544 701 44 × 2 = 0 + 0.009 949 712 584 757 089 402 88;
  • 61) 0.009 949 712 584 757 089 402 88 × 2 = 0 + 0.019 899 425 169 514 178 805 76;
  • 62) 0.019 899 425 169 514 178 805 76 × 2 = 0 + 0.039 798 850 339 028 357 611 52;
  • 63) 0.039 798 850 339 028 357 611 52 × 2 = 0 + 0.079 597 700 678 056 715 223 04;
  • 64) 0.079 597 700 678 056 715 223 04 × 2 = 0 + 0.159 195 401 356 113 430 446 08;
  • 65) 0.159 195 401 356 113 430 446 08 × 2 = 0 + 0.318 390 802 712 226 860 892 16;
  • 66) 0.318 390 802 712 226 860 892 16 × 2 = 0 + 0.636 781 605 424 453 721 784 32;
  • 67) 0.636 781 605 424 453 721 784 32 × 2 = 1 + 0.273 563 210 848 907 443 568 64;
  • 68) 0.273 563 210 848 907 443 568 64 × 2 = 0 + 0.547 126 421 697 814 887 137 28;
  • 69) 0.547 126 421 697 814 887 137 28 × 2 = 1 + 0.094 252 843 395 629 774 274 56;
  • 70) 0.094 252 843 395 629 774 274 56 × 2 = 0 + 0.188 505 686 791 259 548 549 12;
  • 71) 0.188 505 686 791 259 548 549 12 × 2 = 0 + 0.377 011 373 582 519 097 098 24;
  • 72) 0.377 011 373 582 519 097 098 24 × 2 = 0 + 0.754 022 747 165 038 194 196 48;
  • 73) 0.754 022 747 165 038 194 196 48 × 2 = 1 + 0.508 045 494 330 076 388 392 96;
  • 74) 0.508 045 494 330 076 388 392 96 × 2 = 1 + 0.016 090 988 660 152 776 785 92;
  • 75) 0.016 090 988 660 152 776 785 92 × 2 = 0 + 0.032 181 977 320 305 553 571 84;
  • 76) 0.032 181 977 320 305 553 571 84 × 2 = 0 + 0.064 363 954 640 611 107 143 68;
  • 77) 0.064 363 954 640 611 107 143 68 × 2 = 0 + 0.128 727 909 281 222 214 287 36;
  • 78) 0.128 727 909 281 222 214 287 36 × 2 = 0 + 0.257 455 818 562 444 428 574 72;
  • 79) 0.257 455 818 562 444 428 574 72 × 2 = 0 + 0.514 911 637 124 888 857 149 44;
  • 80) 0.514 911 637 124 888 857 149 44 × 2 = 1 + 0.029 823 274 249 777 714 298 88;
  • 81) 0.029 823 274 249 777 714 298 88 × 2 = 0 + 0.059 646 548 499 555 428 597 76;
  • 82) 0.059 646 548 499 555 428 597 76 × 2 = 0 + 0.119 293 096 999 110 857 195 52;
  • 83) 0.119 293 096 999 110 857 195 52 × 2 = 0 + 0.238 586 193 998 221 714 391 04;
  • 84) 0.238 586 193 998 221 714 391 04 × 2 = 0 + 0.477 172 387 996 443 428 782 08;
  • 85) 0.477 172 387 996 443 428 782 08 × 2 = 0 + 0.954 344 775 992 886 857 564 16;
  • 86) 0.954 344 775 992 886 857 564 16 × 2 = 1 + 0.908 689 551 985 773 715 128 32;
  • 87) 0.908 689 551 985 773 715 128 32 × 2 = 1 + 0.817 379 103 971 547 430 256 64;
  • 88) 0.817 379 103 971 547 430 256 64 × 2 = 1 + 0.634 758 207 943 094 860 513 28;
  • 89) 0.634 758 207 943 094 860 513 28 × 2 = 1 + 0.269 516 415 886 189 721 026 56;
  • 90) 0.269 516 415 886 189 721 026 56 × 2 = 0 + 0.539 032 831 772 379 442 053 12;
  • 91) 0.539 032 831 772 379 442 053 12 × 2 = 1 + 0.078 065 663 544 758 884 106 24;
  • 92) 0.078 065 663 544 758 884 106 24 × 2 = 0 + 0.156 131 327 089 517 768 212 48;
  • 93) 0.156 131 327 089 517 768 212 48 × 2 = 0 + 0.312 262 654 179 035 536 424 96;
  • 94) 0.312 262 654 179 035 536 424 96 × 2 = 0 + 0.624 525 308 358 071 072 849 92;
  • 95) 0.624 525 308 358 071 072 849 92 × 2 = 1 + 0.249 050 616 716 142 145 699 84;
  • 96) 0.249 050 616 716 142 145 699 84 × 2 = 0 + 0.498 101 233 432 284 291 399 68;
  • 97) 0.498 101 233 432 284 291 399 68 × 2 = 0 + 0.996 202 466 864 568 582 799 36;
  • 98) 0.996 202 466 864 568 582 799 36 × 2 = 1 + 0.992 404 933 729 137 165 598 72;
  • 99) 0.992 404 933 729 137 165 598 72 × 2 = 1 + 0.984 809 867 458 274 331 197 44;
  • 100) 0.984 809 867 458 274 331 197 44 × 2 = 1 + 0.969 619 734 916 548 662 394 88;
  • 101) 0.969 619 734 916 548 662 394 88 × 2 = 1 + 0.939 239 469 833 097 324 789 76;
  • 102) 0.939 239 469 833 097 324 789 76 × 2 = 1 + 0.878 478 939 666 194 649 579 52;
  • 103) 0.878 478 939 666 194 649 579 52 × 2 = 1 + 0.756 957 879 332 389 299 159 04;
  • 104) 0.756 957 879 332 389 299 159 04 × 2 = 1 + 0.513 915 758 664 778 598 318 08;
  • 105) 0.513 915 758 664 778 598 318 08 × 2 = 1 + 0.027 831 517 329 557 196 636 16;
  • 106) 0.027 831 517 329 557 196 636 16 × 2 = 0 + 0.055 663 034 659 114 393 272 32;
  • 107) 0.055 663 034 659 114 393 272 32 × 2 = 0 + 0.111 326 069 318 228 786 544 64;
  • 108) 0.111 326 069 318 228 786 544 64 × 2 = 0 + 0.222 652 138 636 457 573 089 28;
  • 109) 0.222 652 138 636 457 573 089 28 × 2 = 0 + 0.445 304 277 272 915 146 178 56;
  • 110) 0.445 304 277 272 915 146 178 56 × 2 = 0 + 0.890 608 554 545 830 292 357 12;
  • 111) 0.890 608 554 545 830 292 357 12 × 2 = 1 + 0.781 217 109 091 660 584 714 24;
  • 112) 0.781 217 109 091 660 584 714 24 × 2 = 1 + 0.562 434 218 183 321 169 428 48;
  • 113) 0.562 434 218 183 321 169 428 48 × 2 = 1 + 0.124 868 436 366 642 338 856 96;
  • 114) 0.124 868 436 366 642 338 856 96 × 2 = 0 + 0.249 736 872 733 284 677 713 92;
  • 115) 0.249 736 872 733 284 677 713 92 × 2 = 0 + 0.499 473 745 466 569 355 427 84;
  • 116) 0.499 473 745 466 569 355 427 84 × 2 = 0 + 0.998 947 490 933 138 710 855 68;
  • 117) 0.998 947 490 933 138 710 855 68 × 2 = 1 + 0.997 894 981 866 277 421 711 36;
  • 118) 0.997 894 981 866 277 421 711 36 × 2 = 1 + 0.995 789 963 732 554 843 422 72;
  • 119) 0.995 789 963 732 554 843 422 72 × 2 = 1 + 0.991 579 927 465 109 686 845 44;

We didn't get any fractional part that was equal to zero. However, the first non-zero bit appeared at step 67, and steps 67 to 119 supply 53 significant bits (the leading 1 plus the 52 bits of the mantissa), which is enough => FULL STOP (losing precision: the converted number we get in the end will be a very close approximation of the initial one, not an exact match).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 63(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1100 0001 0000 0111 1010 0010 0111 1111 1000 0011 1000 111(2)
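The repeated multiplication in steps 3 and 4 can be reproduced with exact rational arithmetic. The sketch below assumes Python's fractions.Fraction to avoid any rounding in the intermediate products; the function name fraction_bits and the parameter significant_bits are illustrative choices.

```python
from fractions import Fraction


def fraction_bits(value: Fraction, significant_bits: int = 53) -> str:
    """Binary digits of 0 <= value < 1, by repeated multiplication by 2.

    Stops when the fractional part reaches zero, or once `significant_bits`
    digits have been collected, counting from the first 1.
    """
    bits = []
    collected = 0              # digits collected since (and including) the first 1
    while value != 0 and collected < significant_bits:
        value *= 2
        bit = int(value)       # integer part of the product: 0 or 1
        value -= bit           # keep only the fractional part
        bits.append(str(bit))
        if bit == 1 or collected > 0:
            collected += 1
    return "".join(bits)


# 0.000 000 000 000 000 000 008 63 written as an exact fraction: 863 / 10**23
bits = fraction_bits(Fraction(863, 10**23))
print(len(bits))             # 119 multiplications, as in the list above
print(bits.index("1") + 1)   # 67: the step at which the first 1 appears
```

The default of 53 significant bits is an assumption that mirrors the stopping rule used in the list: one leading bit plus the 52 bits of the mantissa.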

5. Positive number before normalization:

0.000 000 000 000 000 000 008 63(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1100 0001 0000 0111 1010 0010 0111 1111 1000 0011 1000 111(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 000 008 63(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1100 0001 0000 0111 1010 0010 0111 1111 1000 0011 1000 111(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 1100 0001 0000 0111 1010 0010 0111 1111 1000 0011 1000 111(2) × 2^0 =


1.0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111(2) × 2^-67
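The normalization step amounts to locating the first 1 in the fractional digits. A minimal sketch, assuming the integer part is zero as it is here; the function name normalize is hypothetical.

```python
def normalize(fraction_digits: str):
    """Normalize 0.<fraction_digits> (base 2) to 1.<rest> x 2**exponent.

    Only handles numbers whose integer part is zero.
    """
    first_one = fraction_digits.index("1")  # zero-based position of the first 1
    exponent = -(first_one + 1)             # shifting k places right gives 2**-k
    rest = fraction_digits[first_one + 1:]  # digits after the leading 1
    return exponent, rest


# The 119 fractional digits found above: 66 zeros, then the 53 significant bits.
digits = "0" * 66 + "1" + (
    "0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111"
).replace(" ", "")

exponent, rest = normalize(digits)
print(exponent)    # -67
print(len(rest))   # 52 digits remain to the right of the leading 1
```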


7. So far, we have the following elements that feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
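Steps 8 to 10 can be condensed into a few lines. The sketch below uses Python's built-in format() for the base 2 conversion instead of the manual division; the names EXPONENT_BITS, BIAS and biased_exponent_bits are illustrative.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Return the 11 bit excess/bias encoding of an unadjusted exponent."""
    adjusted = unadjusted + BIAS
    # Adjusted values 0 and 2047 are reserved (zero/subnormals, infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the range of normal numbers")
    return format(adjusted, f"0{EXPONENT_BITS}b")   # zero-padded to 11 digits


print(biased_exponent_bits(-67))   # 01110111100
```

The zero padding matters: 956 needs only 10 binary digits, so a leading 0 is added to fill the 11 bit field.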


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal point, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111 =


0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111


Decimal number 0.000 000 000 000 000 000 008 63 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111
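As a sanity check, the three fields can be packed into 8 bytes and decoded back to a number. This sketch assumes Python's standard struct module with the ">d" (big-endian double) format; the variable names are illustrative.

```python
import struct

sign = "0"
exponent = "011 1011 1100"
mantissa = "0100 0110 0000 1000 0011 1101 0001 0011 1111 1100 0001 1100 0111"

bits = (sign + exponent + mantissa).replace(" ", "")
assert len(bits) == 64                       # 1 + 11 + 52 bits

raw = int(bits, 2).to_bytes(8, "big")        # the 64 bits as 8 bytes
value = struct.unpack(">d", raw)[0]          # reinterpret the bytes as a double

print(raw.hex())   # 3bc46083d13fc1c7
print(value)
```

Because the procedure above cuts off the excess bits instead of rounding them, the decoded value may differ from the double that a compiler or interpreter produces for the same literal by one unit in the last place of the mantissa.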


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until enough bits have been collected to fill the mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision) or by appending '0' bits to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
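The nine steps above can be combined into one function. The following is a sketch under stated assumptions, not a reference implementation: it uses exact fractions.Fraction arithmetic, truncates excess mantissa bits exactly as the manual procedure does (no rounding), and does not handle zero, subnormal numbers, infinities or NaN. The name to_double_fields is hypothetical.

```python
from fractions import Fraction

EXPONENT_BITS = 11
MANTISSA_BITS = 52
BIAS = 2 ** (EXPONENT_BITS - 1) - 1


def to_double_fields(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"          # step 9
    value = abs(value)                        # step 1
    if value == 0:
        raise ValueError("zero is not handled in this sketch")

    integer_part = int(value)
    fraction = value - integer_part

    # Steps 2-3: integer part (bin() does the repeated division for us).
    int_digits = bin(integer_part)[2:] if integer_part else ""

    # Steps 4-5: fractional part by repeated multiplication by 2.
    frac_digits = []
    needed = MANTISSA_BITS + 1 - len(int_digits)  # significant bits still missing
    seen_one = bool(int_digits)
    while fraction != 0 and needed > 0:
        fraction *= 2
        bit = int(fraction)
        fraction -= bit
        frac_digits.append(str(bit))
        seen_one = seen_one or bit == 1
        if seen_one:
            needed -= 1
    frac_digits = "".join(frac_digits)

    # Step 6: normalize to 1.xxx * 2**exponent.
    if int_digits:
        exponent = len(int_digits) - 1
        significand = int_digits + frac_digits
    else:
        first_one = frac_digits.index("1")
        exponent = -(first_one + 1)
        significand = frac_digits[first_one:]

    # Step 7: excess/bias notation, 11 binary digits.
    adjusted = exponent + BIAS
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the range of normal numbers")
    exponent_bits = format(adjusted, f"0{EXPONENT_BITS}b")

    # Step 8: drop the leading 1, then truncate or pad to 52 bits.
    mantissa = significand[1:MANTISSA_BITS + 1].ljust(MANTISSA_BITS, "0")
    return sign, exponent_bits, mantissa


print(to_double_fields("-31.640215"))
```

Passing the number as a string rather than as a float keeps the decimal digits exact, which is what the manual procedure works with.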

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. So far, we have the following elements that feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
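The result for -31.640 215 can be compared with the encoding that the machine itself produces. The sketch below assumes Python's struct module; hardware_fields is a hypothetical helper that splits the 64 bits of a float into the three fields.

```python
import struct


def hardware_fields(x: float):
    """Split the 64 bit pattern of a Python float into sign, exponent, mantissa."""
    as_int = struct.unpack(">Q", struct.pack(">d", x))[0]  # same 8 bytes, read as an integer
    bits = format(as_int, "064b")
    return bits[0], bits[1:12], bits[12:]


manual_mantissa = (
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
).replace(" ", "")

sign, exponent, mantissa = hardware_fields(-31.640215)
print(sign, exponent)
# Difference between the two mantissas, in units of the last bit:
print(int(mantissa, 2) - int(manual_mantissa, 2))
```

The sign and exponent agree with the manual result. The mantissa difference is either 0 or 1: the manual procedure removes the excess bits, while the default IEEE 754 behavior rounds to the nearest representable value, so the last bit can be one higher.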