0.000 000 000 000 000 000 008 534 37 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 000 008 534 37(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 000 008 534 37(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 000 008 534 37.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 000 008 534 37 × 2 = 0 + 0.000 000 000 000 000 000 017 068 74;
  • 2) 0.000 000 000 000 000 000 017 068 74 × 2 = 0 + 0.000 000 000 000 000 000 034 137 48;
  • 3) 0.000 000 000 000 000 000 034 137 48 × 2 = 0 + 0.000 000 000 000 000 000 068 274 96;
  • 4) 0.000 000 000 000 000 000 068 274 96 × 2 = 0 + 0.000 000 000 000 000 000 136 549 92;
  • 5) 0.000 000 000 000 000 000 136 549 92 × 2 = 0 + 0.000 000 000 000 000 000 273 099 84;
  • 6) 0.000 000 000 000 000 000 273 099 84 × 2 = 0 + 0.000 000 000 000 000 000 546 199 68;
  • 7) 0.000 000 000 000 000 000 546 199 68 × 2 = 0 + 0.000 000 000 000 000 001 092 399 36;
  • 8) 0.000 000 000 000 000 001 092 399 36 × 2 = 0 + 0.000 000 000 000 000 002 184 798 72;
  • 9) 0.000 000 000 000 000 002 184 798 72 × 2 = 0 + 0.000 000 000 000 000 004 369 597 44;
  • 10) 0.000 000 000 000 000 004 369 597 44 × 2 = 0 + 0.000 000 000 000 000 008 739 194 88;
  • 11) 0.000 000 000 000 000 008 739 194 88 × 2 = 0 + 0.000 000 000 000 000 017 478 389 76;
  • 12) 0.000 000 000 000 000 017 478 389 76 × 2 = 0 + 0.000 000 000 000 000 034 956 779 52;
  • 13) 0.000 000 000 000 000 034 956 779 52 × 2 = 0 + 0.000 000 000 000 000 069 913 559 04;
  • 14) 0.000 000 000 000 000 069 913 559 04 × 2 = 0 + 0.000 000 000 000 000 139 827 118 08;
  • 15) 0.000 000 000 000 000 139 827 118 08 × 2 = 0 + 0.000 000 000 000 000 279 654 236 16;
  • 16) 0.000 000 000 000 000 279 654 236 16 × 2 = 0 + 0.000 000 000 000 000 559 308 472 32;
  • 17) 0.000 000 000 000 000 559 308 472 32 × 2 = 0 + 0.000 000 000 000 001 118 616 944 64;
  • 18) 0.000 000 000 000 001 118 616 944 64 × 2 = 0 + 0.000 000 000 000 002 237 233 889 28;
  • 19) 0.000 000 000 000 002 237 233 889 28 × 2 = 0 + 0.000 000 000 000 004 474 467 778 56;
  • 20) 0.000 000 000 000 004 474 467 778 56 × 2 = 0 + 0.000 000 000 000 008 948 935 557 12;
  • 21) 0.000 000 000 000 008 948 935 557 12 × 2 = 0 + 0.000 000 000 000 017 897 871 114 24;
  • 22) 0.000 000 000 000 017 897 871 114 24 × 2 = 0 + 0.000 000 000 000 035 795 742 228 48;
  • 23) 0.000 000 000 000 035 795 742 228 48 × 2 = 0 + 0.000 000 000 000 071 591 484 456 96;
  • 24) 0.000 000 000 000 071 591 484 456 96 × 2 = 0 + 0.000 000 000 000 143 182 968 913 92;
  • 25) 0.000 000 000 000 143 182 968 913 92 × 2 = 0 + 0.000 000 000 000 286 365 937 827 84;
  • 26) 0.000 000 000 000 286 365 937 827 84 × 2 = 0 + 0.000 000 000 000 572 731 875 655 68;
  • 27) 0.000 000 000 000 572 731 875 655 68 × 2 = 0 + 0.000 000 000 001 145 463 751 311 36;
  • 28) 0.000 000 000 001 145 463 751 311 36 × 2 = 0 + 0.000 000 000 002 290 927 502 622 72;
  • 29) 0.000 000 000 002 290 927 502 622 72 × 2 = 0 + 0.000 000 000 004 581 855 005 245 44;
  • 30) 0.000 000 000 004 581 855 005 245 44 × 2 = 0 + 0.000 000 000 009 163 710 010 490 88;
  • 31) 0.000 000 000 009 163 710 010 490 88 × 2 = 0 + 0.000 000 000 018 327 420 020 981 76;
  • 32) 0.000 000 000 018 327 420 020 981 76 × 2 = 0 + 0.000 000 000 036 654 840 041 963 52;
  • 33) 0.000 000 000 036 654 840 041 963 52 × 2 = 0 + 0.000 000 000 073 309 680 083 927 04;
  • 34) 0.000 000 000 073 309 680 083 927 04 × 2 = 0 + 0.000 000 000 146 619 360 167 854 08;
  • 35) 0.000 000 000 146 619 360 167 854 08 × 2 = 0 + 0.000 000 000 293 238 720 335 708 16;
  • 36) 0.000 000 000 293 238 720 335 708 16 × 2 = 0 + 0.000 000 000 586 477 440 671 416 32;
  • 37) 0.000 000 000 586 477 440 671 416 32 × 2 = 0 + 0.000 000 001 172 954 881 342 832 64;
  • 38) 0.000 000 001 172 954 881 342 832 64 × 2 = 0 + 0.000 000 002 345 909 762 685 665 28;
  • 39) 0.000 000 002 345 909 762 685 665 28 × 2 = 0 + 0.000 000 004 691 819 525 371 330 56;
  • 40) 0.000 000 004 691 819 525 371 330 56 × 2 = 0 + 0.000 000 009 383 639 050 742 661 12;
  • 41) 0.000 000 009 383 639 050 742 661 12 × 2 = 0 + 0.000 000 018 767 278 101 485 322 24;
  • 42) 0.000 000 018 767 278 101 485 322 24 × 2 = 0 + 0.000 000 037 534 556 202 970 644 48;
  • 43) 0.000 000 037 534 556 202 970 644 48 × 2 = 0 + 0.000 000 075 069 112 405 941 288 96;
  • 44) 0.000 000 075 069 112 405 941 288 96 × 2 = 0 + 0.000 000 150 138 224 811 882 577 92;
  • 45) 0.000 000 150 138 224 811 882 577 92 × 2 = 0 + 0.000 000 300 276 449 623 765 155 84;
  • 46) 0.000 000 300 276 449 623 765 155 84 × 2 = 0 + 0.000 000 600 552 899 247 530 311 68;
  • 47) 0.000 000 600 552 899 247 530 311 68 × 2 = 0 + 0.000 001 201 105 798 495 060 623 36;
  • 48) 0.000 001 201 105 798 495 060 623 36 × 2 = 0 + 0.000 002 402 211 596 990 121 246 72;
  • 49) 0.000 002 402 211 596 990 121 246 72 × 2 = 0 + 0.000 004 804 423 193 980 242 493 44;
  • 50) 0.000 004 804 423 193 980 242 493 44 × 2 = 0 + 0.000 009 608 846 387 960 484 986 88;
  • 51) 0.000 009 608 846 387 960 484 986 88 × 2 = 0 + 0.000 019 217 692 775 920 969 973 76;
  • 52) 0.000 019 217 692 775 920 969 973 76 × 2 = 0 + 0.000 038 435 385 551 841 939 947 52;
  • 53) 0.000 038 435 385 551 841 939 947 52 × 2 = 0 + 0.000 076 870 771 103 683 879 895 04;
  • 54) 0.000 076 870 771 103 683 879 895 04 × 2 = 0 + 0.000 153 741 542 207 367 759 790 08;
  • 55) 0.000 153 741 542 207 367 759 790 08 × 2 = 0 + 0.000 307 483 084 414 735 519 580 16;
  • 56) 0.000 307 483 084 414 735 519 580 16 × 2 = 0 + 0.000 614 966 168 829 471 039 160 32;
  • 57) 0.000 614 966 168 829 471 039 160 32 × 2 = 0 + 0.001 229 932 337 658 942 078 320 64;
  • 58) 0.001 229 932 337 658 942 078 320 64 × 2 = 0 + 0.002 459 864 675 317 884 156 641 28;
  • 59) 0.002 459 864 675 317 884 156 641 28 × 2 = 0 + 0.004 919 729 350 635 768 313 282 56;
  • 60) 0.004 919 729 350 635 768 313 282 56 × 2 = 0 + 0.009 839 458 701 271 536 626 565 12;
  • 61) 0.009 839 458 701 271 536 626 565 12 × 2 = 0 + 0.019 678 917 402 543 073 253 130 24;
  • 62) 0.019 678 917 402 543 073 253 130 24 × 2 = 0 + 0.039 357 834 805 086 146 506 260 48;
  • 63) 0.039 357 834 805 086 146 506 260 48 × 2 = 0 + 0.078 715 669 610 172 293 012 520 96;
  • 64) 0.078 715 669 610 172 293 012 520 96 × 2 = 0 + 0.157 431 339 220 344 586 025 041 92;
  • 65) 0.157 431 339 220 344 586 025 041 92 × 2 = 0 + 0.314 862 678 440 689 172 050 083 84;
  • 66) 0.314 862 678 440 689 172 050 083 84 × 2 = 0 + 0.629 725 356 881 378 344 100 167 68;
  • 67) 0.629 725 356 881 378 344 100 167 68 × 2 = 1 + 0.259 450 713 762 756 688 200 335 36;
  • 68) 0.259 450 713 762 756 688 200 335 36 × 2 = 0 + 0.518 901 427 525 513 376 400 670 72;
  • 69) 0.518 901 427 525 513 376 400 670 72 × 2 = 1 + 0.037 802 855 051 026 752 801 341 44;
  • 70) 0.037 802 855 051 026 752 801 341 44 × 2 = 0 + 0.075 605 710 102 053 505 602 682 88;
  • 71) 0.075 605 710 102 053 505 602 682 88 × 2 = 0 + 0.151 211 420 204 107 011 205 365 76;
  • 72) 0.151 211 420 204 107 011 205 365 76 × 2 = 0 + 0.302 422 840 408 214 022 410 731 52;
  • 73) 0.302 422 840 408 214 022 410 731 52 × 2 = 0 + 0.604 845 680 816 428 044 821 463 04;
  • 74) 0.604 845 680 816 428 044 821 463 04 × 2 = 1 + 0.209 691 361 632 856 089 642 926 08;
  • 75) 0.209 691 361 632 856 089 642 926 08 × 2 = 0 + 0.419 382 723 265 712 179 285 852 16;
  • 76) 0.419 382 723 265 712 179 285 852 16 × 2 = 0 + 0.838 765 446 531 424 358 571 704 32;
  • 77) 0.838 765 446 531 424 358 571 704 32 × 2 = 1 + 0.677 530 893 062 848 717 143 408 64;
  • 78) 0.677 530 893 062 848 717 143 408 64 × 2 = 1 + 0.355 061 786 125 697 434 286 817 28;
  • 79) 0.355 061 786 125 697 434 286 817 28 × 2 = 0 + 0.710 123 572 251 394 868 573 634 56;
  • 80) 0.710 123 572 251 394 868 573 634 56 × 2 = 1 + 0.420 247 144 502 789 737 147 269 12;
  • 81) 0.420 247 144 502 789 737 147 269 12 × 2 = 0 + 0.840 494 289 005 579 474 294 538 24;
  • 82) 0.840 494 289 005 579 474 294 538 24 × 2 = 1 + 0.680 988 578 011 158 948 589 076 48;
  • 83) 0.680 988 578 011 158 948 589 076 48 × 2 = 1 + 0.361 977 156 022 317 897 178 152 96;
  • 84) 0.361 977 156 022 317 897 178 152 96 × 2 = 0 + 0.723 954 312 044 635 794 356 305 92;
  • 85) 0.723 954 312 044 635 794 356 305 92 × 2 = 1 + 0.447 908 624 089 271 588 712 611 84;
  • 86) 0.447 908 624 089 271 588 712 611 84 × 2 = 0 + 0.895 817 248 178 543 177 425 223 68;
  • 87) 0.895 817 248 178 543 177 425 223 68 × 2 = 1 + 0.791 634 496 357 086 354 850 447 36;
  • 88) 0.791 634 496 357 086 354 850 447 36 × 2 = 1 + 0.583 268 992 714 172 709 700 894 72;
  • 89) 0.583 268 992 714 172 709 700 894 72 × 2 = 1 + 0.166 537 985 428 345 419 401 789 44;
  • 90) 0.166 537 985 428 345 419 401 789 44 × 2 = 0 + 0.333 075 970 856 690 838 803 578 88;
  • 91) 0.333 075 970 856 690 838 803 578 88 × 2 = 0 + 0.666 151 941 713 381 677 607 157 76;
  • 92) 0.666 151 941 713 381 677 607 157 76 × 2 = 1 + 0.332 303 883 426 763 355 214 315 52;
  • 93) 0.332 303 883 426 763 355 214 315 52 × 2 = 0 + 0.664 607 766 853 526 710 428 631 04;
  • 94) 0.664 607 766 853 526 710 428 631 04 × 2 = 1 + 0.329 215 533 707 053 420 857 262 08;
  • 95) 0.329 215 533 707 053 420 857 262 08 × 2 = 0 + 0.658 431 067 414 106 841 714 524 16;
  • 96) 0.658 431 067 414 106 841 714 524 16 × 2 = 1 + 0.316 862 134 828 213 683 429 048 32;
  • 97) 0.316 862 134 828 213 683 429 048 32 × 2 = 0 + 0.633 724 269 656 427 366 858 096 64;
  • 98) 0.633 724 269 656 427 366 858 096 64 × 2 = 1 + 0.267 448 539 312 854 733 716 193 28;
  • 99) 0.267 448 539 312 854 733 716 193 28 × 2 = 0 + 0.534 897 078 625 709 467 432 386 56;
  • 100) 0.534 897 078 625 709 467 432 386 56 × 2 = 1 + 0.069 794 157 251 418 934 864 773 12;
  • 101) 0.069 794 157 251 418 934 864 773 12 × 2 = 0 + 0.139 588 314 502 837 869 729 546 24;
  • 102) 0.139 588 314 502 837 869 729 546 24 × 2 = 0 + 0.279 176 629 005 675 739 459 092 48;
  • 103) 0.279 176 629 005 675 739 459 092 48 × 2 = 0 + 0.558 353 258 011 351 478 918 184 96;
  • 104) 0.558 353 258 011 351 478 918 184 96 × 2 = 1 + 0.116 706 516 022 702 957 836 369 92;
  • 105) 0.116 706 516 022 702 957 836 369 92 × 2 = 0 + 0.233 413 032 045 405 915 672 739 84;
  • 106) 0.233 413 032 045 405 915 672 739 84 × 2 = 0 + 0.466 826 064 090 811 831 345 479 68;
  • 107) 0.466 826 064 090 811 831 345 479 68 × 2 = 0 + 0.933 652 128 181 623 662 690 959 36;
  • 108) 0.933 652 128 181 623 662 690 959 36 × 2 = 1 + 0.867 304 256 363 247 325 381 918 72;
  • 109) 0.867 304 256 363 247 325 381 918 72 × 2 = 1 + 0.734 608 512 726 494 650 763 837 44;
  • 110) 0.734 608 512 726 494 650 763 837 44 × 2 = 1 + 0.469 217 025 452 989 301 527 674 88;
  • 111) 0.469 217 025 452 989 301 527 674 88 × 2 = 0 + 0.938 434 050 905 978 603 055 349 76;
  • 112) 0.938 434 050 905 978 603 055 349 76 × 2 = 1 + 0.876 868 101 811 957 206 110 699 52;
  • 113) 0.876 868 101 811 957 206 110 699 52 × 2 = 1 + 0.753 736 203 623 914 412 221 399 04;
  • 114) 0.753 736 203 623 914 412 221 399 04 × 2 = 1 + 0.507 472 407 247 828 824 442 798 08;
  • 115) 0.507 472 407 247 828 824 442 798 08 × 2 = 1 + 0.014 944 814 495 657 648 885 596 16;
  • 116) 0.014 944 814 495 657 648 885 596 16 × 2 = 0 + 0.029 889 628 991 315 297 771 192 32;
  • 117) 0.029 889 628 991 315 297 771 192 32 × 2 = 0 + 0.059 779 257 982 630 595 542 384 64;
  • 118) 0.059 779 257 982 630 595 542 384 64 × 2 = 0 + 0.119 558 515 965 261 191 084 769 28;
  • 119) 0.119 558 515 965 261 191 084 769 28 × 2 = 0 + 0.239 117 031 930 522 382 169 538 56;

We never reached a fractional part equal to zero, but we can stop here: the first non-zero bit appeared at step 67, and steps 68 to 119 supply the 52 bits that the mantissa can hold. Precision is lost at this point, so the converted number will be only a very close approximation of the initial one.


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 000 008 534 37(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1101 0110 1011 1001 0101 0101 0001 0001 1101 1110 000(2)
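
The repeated doubling in steps 3 and 4 can be reproduced with a short sketch. It assumes exact rational arithmetic through Python's `fractions.Fraction`, so that no rounding enters the 119 multiplications; the function name `fraction_bits` is made up for this illustration.

```python
from fractions import Fraction

def fraction_bits(decimal_text, count):
    """Return the first `count` binary digits after the point of a decimal number."""
    frac = Fraction(decimal_text)
    frac -= frac.numerator // frac.denominator  # keep only the fractional part
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)       # integer part of the product: 0 or 1
        bits.append(str(bit))
        frac -= bit           # carry the fractional part into the next step
    return "".join(bits)

# The number as written above, with the digit-group spaces removed.
text = "0.000 000 000 000 000 000 008 534 37".replace(" ", "")
bits = fraction_bits(text, 119)
print(bits.index("1") + 1)    # step at which the first 1 appears
print(bits[66:])              # that leading 1 followed by the next 52 bits
```

Grouping the printed digits in fours from the left gives the same string as the list above.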

5. Positive number before normalization:

0.000 000 000 000 000 000 008 534 37(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1101 0110 1011 1001 0101 0101 0001 0001 1101 1110 000(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 67 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 000 008 534 37(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1101 0110 1011 1001 0101 0101 0001 0001 1101 1110 000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 1000 0100 1101 0110 1011 1001 0101 0101 0001 0001 1101 1110 000(2) × 2^0 =


1.0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000(2) × 2^(-67)
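
As a cross-check on the shift count, the normalized exponent can be read off with `math.frexp` from Python's standard library. This is only a sketch, and it assumes the platform's `float` is an IEEE 754 double. `frexp` returns a fraction in the range 0.5 to 1, so the result is rescaled to the 1 to 2 range used in this step.

```python
import math

x = 8.53437e-21
m, e = math.frexp(x)       # x == m * 2**e with 0.5 <= m < 1
significand = m * 2        # rescale so that 1 <= significand < 2
exponent = e - 1           # compensate for the rescaling
print(exponent)            # -67
print(significand)
```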


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -67


Mantissa (not normalized):
1.0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-67 + 2^(11-1) - 1 =


(-67 + 1 023)(10) =


956(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 956 ÷ 2 = 478 + 0;
  • 478 ÷ 2 = 239 + 0;
  • 239 ÷ 2 = 119 + 1;
  • 119 ÷ 2 = 59 + 1;
  • 59 ÷ 2 = 29 + 1;
  • 29 ÷ 2 = 14 + 1;
  • 14 ÷ 2 = 7 + 0;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


956(10) =


011 1011 1100(2)
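
Steps 8 to 10 can be sketched in a few lines. The helper name `to_binary` and the constant `EXPONENT_BITS` are chosen for this illustration only; the loop mirrors the repeated division shown above and pads the result to the field width.

```python
def to_binary(n, width):
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))
    # Remainders come out least significant first, so read them bottom-up,
    # then left-pad with zeros to the field width.
    return "".join(reversed(remainders)).rjust(width, "0")

EXPONENT_BITS = 11
bias = 2 ** (EXPONENT_BITS - 1) - 1        # 1023
biased = -67 + bias                        # 956
print(to_binary(biased, EXPONENT_BITS))    # 01110111100
```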


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal point that follows it.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000 =


0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1011 1100


Mantissa (52 bits) =
0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000


Decimal number 0.000 000 000 000 000 000 008 534 37 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1011 1100 - 0100 0010 0110 1011 0101 1100 1010 1010 1000 1000 1110 1111 0000
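
The result can be compared with what a machine actually stores. The sketch below assumes a Python build whose `float` is an IEEE 754 double (the usual case) and uses the standard `struct` module to expose the raw 64 bits; `ieee754_fields` is a name invented for this example.

```python
import struct

def ieee754_fields(x):
    """Split a Python float into its sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret as a 64 bit integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = ieee754_fields(8.53437e-21)
print(sign, exponent, mantissa)
```

For this number the three printed fields should agree with the sign, exponent and mantissa derived by hand, because the first discarded bit (step 120) would be 0 and so rounding to nearest changes nothing.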


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or, if that never happens, until enough bits have been produced to fill the 52 bit mantissa.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision; this truncates, whereas IEEE 754's default round-to-nearest mode can produce a different last bit) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
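
The nine steps can be condensed into one hedged sketch. It takes a shortcut that should be named plainly: instead of converting the integer and fractional parts separately, it halves or doubles the exact value until it lies between 1 and 2, which yields the same normalized form. The function name `to_double_bits` is hypothetical, the input must be non-zero and within the normal double range, and the mantissa is truncated exactly as the steps describe.

```python
from fractions import Fraction

def to_double_bits(decimal_text):
    """Return (sign, exponent, mantissa) bit strings, truncating the mantissa."""
    value = Fraction(decimal_text)
    sign = "1" if value < 0 else "0"        # step 9
    value = abs(value)                       # step 1
    # Steps 2 to 6 in one go: find the exponent with 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1
    # Step 8: drop the leading 1, then produce 52 bits by repeated doubling.
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = int(value)
        mantissa.append(str(bit))
        value -= bit
    # Step 7: add the bias and write the exponent as 11 bits.
    biased = exponent + 2 ** (11 - 1) - 1
    return sign, format(biased, "011b"), "".join(mantissa)

print(to_double_bits("-31.640215"))
```

Because `Fraction` holds the decimal exactly, the bits produced are those of the true binary expansion, cut off after 52 places.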

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

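  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision). This truncates: the five discarded bits, 1010 0, amount to more than half a unit in the last place, so IEEE 754's default round-to-nearest mode would end the mantissa in 1101 instead of 1100: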

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
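
The mantissa above was obtained by cutting off the excess bits. A short check against the bits a machine stores shows how that differs from rounding; it assumes a Python `float` that is an IEEE 754 double converted with round-to-nearest, which is the usual behaviour but is still an assumption about the platform.

```python
import struct

# Mantissa from the worked example, obtained by truncation.
truncated = "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100".replace(" ", "")

(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
stored = format(raw, "064b")[12:]            # mantissa field as stored

print(truncated[-4:], stored[-4:])           # 1100 1101
print(int(stored, 2) - int(truncated, 2))    # 1, one unit in the last place
```

The sign and exponent fields are unaffected; only the last mantissa bit differs, because the discarded tail is larger than half a unit in the last place.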