Decimal to 64 Bit IEEE 754 Binary: Convert Number 0.000 000 000 000 000 222 044 604 925 031 308 084 726 36 to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard, From Base Ten Decimal System

Number 0.000 000 000 000 000 222 044 604 925 031 308 084 726 36(10) converted and written in 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 222 044 604 925 031 308 084 726 36.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 222 044 604 925 031 308 084 726 36 × 2 = 0 + 0.000 000 000 000 000 444 089 209 850 062 616 169 452 72;
  • 2) 0.000 000 000 000 000 444 089 209 850 062 616 169 452 72 × 2 = 0 + 0.000 000 000 000 000 888 178 419 700 125 232 338 905 44;
  • 3) 0.000 000 000 000 000 888 178 419 700 125 232 338 905 44 × 2 = 0 + 0.000 000 000 000 001 776 356 839 400 250 464 677 810 88;
  • 4) 0.000 000 000 000 001 776 356 839 400 250 464 677 810 88 × 2 = 0 + 0.000 000 000 000 003 552 713 678 800 500 929 355 621 76;
  • 5) 0.000 000 000 000 003 552 713 678 800 500 929 355 621 76 × 2 = 0 + 0.000 000 000 000 007 105 427 357 601 001 858 711 243 52;
  • 6) 0.000 000 000 000 007 105 427 357 601 001 858 711 243 52 × 2 = 0 + 0.000 000 000 000 014 210 854 715 202 003 717 422 487 04;
  • 7) 0.000 000 000 000 014 210 854 715 202 003 717 422 487 04 × 2 = 0 + 0.000 000 000 000 028 421 709 430 404 007 434 844 974 08;
  • 8) 0.000 000 000 000 028 421 709 430 404 007 434 844 974 08 × 2 = 0 + 0.000 000 000 000 056 843 418 860 808 014 869 689 948 16;
  • 9) 0.000 000 000 000 056 843 418 860 808 014 869 689 948 16 × 2 = 0 + 0.000 000 000 000 113 686 837 721 616 029 739 379 896 32;
  • 10) 0.000 000 000 000 113 686 837 721 616 029 739 379 896 32 × 2 = 0 + 0.000 000 000 000 227 373 675 443 232 059 478 759 792 64;
  • 11) 0.000 000 000 000 227 373 675 443 232 059 478 759 792 64 × 2 = 0 + 0.000 000 000 000 454 747 350 886 464 118 957 519 585 28;
  • 12) 0.000 000 000 000 454 747 350 886 464 118 957 519 585 28 × 2 = 0 + 0.000 000 000 000 909 494 701 772 928 237 915 039 170 56;
  • 13) 0.000 000 000 000 909 494 701 772 928 237 915 039 170 56 × 2 = 0 + 0.000 000 000 001 818 989 403 545 856 475 830 078 341 12;
  • 14) 0.000 000 000 001 818 989 403 545 856 475 830 078 341 12 × 2 = 0 + 0.000 000 000 003 637 978 807 091 712 951 660 156 682 24;
  • 15) 0.000 000 000 003 637 978 807 091 712 951 660 156 682 24 × 2 = 0 + 0.000 000 000 007 275 957 614 183 425 903 320 313 364 48;
  • 16) 0.000 000 000 007 275 957 614 183 425 903 320 313 364 48 × 2 = 0 + 0.000 000 000 014 551 915 228 366 851 806 640 626 728 96;
  • 17) 0.000 000 000 014 551 915 228 366 851 806 640 626 728 96 × 2 = 0 + 0.000 000 000 029 103 830 456 733 703 613 281 253 457 92;
  • 18) 0.000 000 000 029 103 830 456 733 703 613 281 253 457 92 × 2 = 0 + 0.000 000 000 058 207 660 913 467 407 226 562 506 915 84;
  • 19) 0.000 000 000 058 207 660 913 467 407 226 562 506 915 84 × 2 = 0 + 0.000 000 000 116 415 321 826 934 814 453 125 013 831 68;
  • 20) 0.000 000 000 116 415 321 826 934 814 453 125 013 831 68 × 2 = 0 + 0.000 000 000 232 830 643 653 869 628 906 250 027 663 36;
  • 21) 0.000 000 000 232 830 643 653 869 628 906 250 027 663 36 × 2 = 0 + 0.000 000 000 465 661 287 307 739 257 812 500 055 326 72;
  • 22) 0.000 000 000 465 661 287 307 739 257 812 500 055 326 72 × 2 = 0 + 0.000 000 000 931 322 574 615 478 515 625 000 110 653 44;
  • 23) 0.000 000 000 931 322 574 615 478 515 625 000 110 653 44 × 2 = 0 + 0.000 000 001 862 645 149 230 957 031 250 000 221 306 88;
  • 24) 0.000 000 001 862 645 149 230 957 031 250 000 221 306 88 × 2 = 0 + 0.000 000 003 725 290 298 461 914 062 500 000 442 613 76;
  • 25) 0.000 000 003 725 290 298 461 914 062 500 000 442 613 76 × 2 = 0 + 0.000 000 007 450 580 596 923 828 125 000 000 885 227 52;
  • 26) 0.000 000 007 450 580 596 923 828 125 000 000 885 227 52 × 2 = 0 + 0.000 000 014 901 161 193 847 656 250 000 001 770 455 04;
  • 27) 0.000 000 014 901 161 193 847 656 250 000 001 770 455 04 × 2 = 0 + 0.000 000 029 802 322 387 695 312 500 000 003 540 910 08;
  • 28) 0.000 000 029 802 322 387 695 312 500 000 003 540 910 08 × 2 = 0 + 0.000 000 059 604 644 775 390 625 000 000 007 081 820 16;
  • 29) 0.000 000 059 604 644 775 390 625 000 000 007 081 820 16 × 2 = 0 + 0.000 000 119 209 289 550 781 250 000 000 014 163 640 32;
  • 30) 0.000 000 119 209 289 550 781 250 000 000 014 163 640 32 × 2 = 0 + 0.000 000 238 418 579 101 562 500 000 000 028 327 280 64;
  • 31) 0.000 000 238 418 579 101 562 500 000 000 028 327 280 64 × 2 = 0 + 0.000 000 476 837 158 203 125 000 000 000 056 654 561 28;
  • 32) 0.000 000 476 837 158 203 125 000 000 000 056 654 561 28 × 2 = 0 + 0.000 000 953 674 316 406 250 000 000 000 113 309 122 56;
  • 33) 0.000 000 953 674 316 406 250 000 000 000 113 309 122 56 × 2 = 0 + 0.000 001 907 348 632 812 500 000 000 000 226 618 245 12;
  • 34) 0.000 001 907 348 632 812 500 000 000 000 226 618 245 12 × 2 = 0 + 0.000 003 814 697 265 625 000 000 000 000 453 236 490 24;
  • 35) 0.000 003 814 697 265 625 000 000 000 000 453 236 490 24 × 2 = 0 + 0.000 007 629 394 531 250 000 000 000 000 906 472 980 48;
  • 36) 0.000 007 629 394 531 250 000 000 000 000 906 472 980 48 × 2 = 0 + 0.000 015 258 789 062 500 000 000 000 001 812 945 960 96;
  • 37) 0.000 015 258 789 062 500 000 000 000 001 812 945 960 96 × 2 = 0 + 0.000 030 517 578 125 000 000 000 000 003 625 891 921 92;
  • 38) 0.000 030 517 578 125 000 000 000 000 003 625 891 921 92 × 2 = 0 + 0.000 061 035 156 250 000 000 000 000 007 251 783 843 84;
  • 39) 0.000 061 035 156 250 000 000 000 000 007 251 783 843 84 × 2 = 0 + 0.000 122 070 312 500 000 000 000 000 014 503 567 687 68;
  • 40) 0.000 122 070 312 500 000 000 000 000 014 503 567 687 68 × 2 = 0 + 0.000 244 140 625 000 000 000 000 000 029 007 135 375 36;
  • 41) 0.000 244 140 625 000 000 000 000 000 029 007 135 375 36 × 2 = 0 + 0.000 488 281 250 000 000 000 000 000 058 014 270 750 72;
  • 42) 0.000 488 281 250 000 000 000 000 000 058 014 270 750 72 × 2 = 0 + 0.000 976 562 500 000 000 000 000 000 116 028 541 501 44;
  • 43) 0.000 976 562 500 000 000 000 000 000 116 028 541 501 44 × 2 = 0 + 0.001 953 125 000 000 000 000 000 000 232 057 083 002 88;
  • 44) 0.001 953 125 000 000 000 000 000 000 232 057 083 002 88 × 2 = 0 + 0.003 906 250 000 000 000 000 000 000 464 114 166 005 76;
  • 45) 0.003 906 250 000 000 000 000 000 000 464 114 166 005 76 × 2 = 0 + 0.007 812 500 000 000 000 000 000 000 928 228 332 011 52;
  • 46) 0.007 812 500 000 000 000 000 000 000 928 228 332 011 52 × 2 = 0 + 0.015 625 000 000 000 000 000 000 001 856 456 664 023 04;
  • 47) 0.015 625 000 000 000 000 000 000 001 856 456 664 023 04 × 2 = 0 + 0.031 250 000 000 000 000 000 000 003 712 913 328 046 08;
  • 48) 0.031 250 000 000 000 000 000 000 003 712 913 328 046 08 × 2 = 0 + 0.062 500 000 000 000 000 000 000 007 425 826 656 092 16;
  • 49) 0.062 500 000 000 000 000 000 000 007 425 826 656 092 16 × 2 = 0 + 0.125 000 000 000 000 000 000 000 014 851 653 312 184 32;
  • 50) 0.125 000 000 000 000 000 000 000 014 851 653 312 184 32 × 2 = 0 + 0.250 000 000 000 000 000 000 000 029 703 306 624 368 64;
  • 51) 0.250 000 000 000 000 000 000 000 029 703 306 624 368 64 × 2 = 0 + 0.500 000 000 000 000 000 000 000 059 406 613 248 737 28;
  • 52) 0.500 000 000 000 000 000 000 000 059 406 613 248 737 28 × 2 = 1 + 0.000 000 000 000 000 000 000 000 118 813 226 497 474 56;
  • 53) 0.000 000 000 000 000 000 000 000 118 813 226 497 474 56 × 2 = 0 + 0.000 000 000 000 000 000 000 000 237 626 452 994 949 12;
  • 54) 0.000 000 000 000 000 000 000 000 237 626 452 994 949 12 × 2 = 0 + 0.000 000 000 000 000 000 000 000 475 252 905 989 898 24;
  • 55) 0.000 000 000 000 000 000 000 000 475 252 905 989 898 24 × 2 = 0 + 0.000 000 000 000 000 000 000 000 950 505 811 979 796 48;
  • 56) 0.000 000 000 000 000 000 000 000 950 505 811 979 796 48 × 2 = 0 + 0.000 000 000 000 000 000 000 001 901 011 623 959 592 96;
  • 57) 0.000 000 000 000 000 000 000 001 901 011 623 959 592 96 × 2 = 0 + 0.000 000 000 000 000 000 000 003 802 023 247 919 185 92;
  • 58) 0.000 000 000 000 000 000 000 003 802 023 247 919 185 92 × 2 = 0 + 0.000 000 000 000 000 000 000 007 604 046 495 838 371 84;
  • 59) 0.000 000 000 000 000 000 000 007 604 046 495 838 371 84 × 2 = 0 + 0.000 000 000 000 000 000 000 015 208 092 991 676 743 68;
  • 60) 0.000 000 000 000 000 000 000 015 208 092 991 676 743 68 × 2 = 0 + 0.000 000 000 000 000 000 000 030 416 185 983 353 487 36;
  • 61) 0.000 000 000 000 000 000 000 030 416 185 983 353 487 36 × 2 = 0 + 0.000 000 000 000 000 000 000 060 832 371 966 706 974 72;
  • 62) 0.000 000 000 000 000 000 000 060 832 371 966 706 974 72 × 2 = 0 + 0.000 000 000 000 000 000 000 121 664 743 933 413 949 44;
  • 63) 0.000 000 000 000 000 000 000 121 664 743 933 413 949 44 × 2 = 0 + 0.000 000 000 000 000 000 000 243 329 487 866 827 898 88;
  • 64) 0.000 000 000 000 000 000 000 243 329 487 866 827 898 88 × 2 = 0 + 0.000 000 000 000 000 000 000 486 658 975 733 655 797 76;
  • 65) 0.000 000 000 000 000 000 000 486 658 975 733 655 797 76 × 2 = 0 + 0.000 000 000 000 000 000 000 973 317 951 467 311 595 52;
  • 66) 0.000 000 000 000 000 000 000 973 317 951 467 311 595 52 × 2 = 0 + 0.000 000 000 000 000 000 001 946 635 902 934 623 191 04;
  • 67) 0.000 000 000 000 000 000 001 946 635 902 934 623 191 04 × 2 = 0 + 0.000 000 000 000 000 000 003 893 271 805 869 246 382 08;
  • 68) 0.000 000 000 000 000 000 003 893 271 805 869 246 382 08 × 2 = 0 + 0.000 000 000 000 000 000 007 786 543 611 738 492 764 16;
  • 69) 0.000 000 000 000 000 000 007 786 543 611 738 492 764 16 × 2 = 0 + 0.000 000 000 000 000 000 015 573 087 223 476 985 528 32;
  • 70) 0.000 000 000 000 000 000 015 573 087 223 476 985 528 32 × 2 = 0 + 0.000 000 000 000 000 000 031 146 174 446 953 971 056 64;
  • 71) 0.000 000 000 000 000 000 031 146 174 446 953 971 056 64 × 2 = 0 + 0.000 000 000 000 000 000 062 292 348 893 907 942 113 28;
  • 72) 0.000 000 000 000 000 000 062 292 348 893 907 942 113 28 × 2 = 0 + 0.000 000 000 000 000 000 124 584 697 787 815 884 226 56;
  • 73) 0.000 000 000 000 000 000 124 584 697 787 815 884 226 56 × 2 = 0 + 0.000 000 000 000 000 000 249 169 395 575 631 768 453 12;
  • 74) 0.000 000 000 000 000 000 249 169 395 575 631 768 453 12 × 2 = 0 + 0.000 000 000 000 000 000 498 338 791 151 263 536 906 24;
  • 75) 0.000 000 000 000 000 000 498 338 791 151 263 536 906 24 × 2 = 0 + 0.000 000 000 000 000 000 996 677 582 302 527 073 812 48;
  • 76) 0.000 000 000 000 000 000 996 677 582 302 527 073 812 48 × 2 = 0 + 0.000 000 000 000 000 001 993 355 164 605 054 147 624 96;
  • 77) 0.000 000 000 000 000 001 993 355 164 605 054 147 624 96 × 2 = 0 + 0.000 000 000 000 000 003 986 710 329 210 108 295 249 92;
  • 78) 0.000 000 000 000 000 003 986 710 329 210 108 295 249 92 × 2 = 0 + 0.000 000 000 000 000 007 973 420 658 420 216 590 499 84;
  • 79) 0.000 000 000 000 000 007 973 420 658 420 216 590 499 84 × 2 = 0 + 0.000 000 000 000 000 015 946 841 316 840 433 180 999 68;
  • 80) 0.000 000 000 000 000 015 946 841 316 840 433 180 999 68 × 2 = 0 + 0.000 000 000 000 000 031 893 682 633 680 866 361 999 36;
  • 81) 0.000 000 000 000 000 031 893 682 633 680 866 361 999 36 × 2 = 0 + 0.000 000 000 000 000 063 787 365 267 361 732 723 998 72;
  • 82) 0.000 000 000 000 000 063 787 365 267 361 732 723 998 72 × 2 = 0 + 0.000 000 000 000 000 127 574 730 534 723 465 447 997 44;
  • 83) 0.000 000 000 000 000 127 574 730 534 723 465 447 997 44 × 2 = 0 + 0.000 000 000 000 000 255 149 461 069 446 930 895 994 88;
  • 84) 0.000 000 000 000 000 255 149 461 069 446 930 895 994 88 × 2 = 0 + 0.000 000 000 000 000 510 298 922 138 893 861 791 989 76;
  • 85) 0.000 000 000 000 000 510 298 922 138 893 861 791 989 76 × 2 = 0 + 0.000 000 000 000 001 020 597 844 277 787 723 583 979 52;
  • 86) 0.000 000 000 000 001 020 597 844 277 787 723 583 979 52 × 2 = 0 + 0.000 000 000 000 002 041 195 688 555 575 447 167 959 04;
  • 87) 0.000 000 000 000 002 041 195 688 555 575 447 167 959 04 × 2 = 0 + 0.000 000 000 000 004 082 391 377 111 150 894 335 918 08;
  • 88) 0.000 000 000 000 004 082 391 377 111 150 894 335 918 08 × 2 = 0 + 0.000 000 000 000 008 164 782 754 222 301 788 671 836 16;
  • 89) 0.000 000 000 000 008 164 782 754 222 301 788 671 836 16 × 2 = 0 + 0.000 000 000 000 016 329 565 508 444 603 577 343 672 32;
  • 90) 0.000 000 000 000 016 329 565 508 444 603 577 343 672 32 × 2 = 0 + 0.000 000 000 000 032 659 131 016 889 207 154 687 344 64;
  • 91) 0.000 000 000 000 032 659 131 016 889 207 154 687 344 64 × 2 = 0 + 0.000 000 000 000 065 318 262 033 778 414 309 374 689 28;
  • 92) 0.000 000 000 000 065 318 262 033 778 414 309 374 689 28 × 2 = 0 + 0.000 000 000 000 130 636 524 067 556 828 618 749 378 56;
  • 93) 0.000 000 000 000 130 636 524 067 556 828 618 749 378 56 × 2 = 0 + 0.000 000 000 000 261 273 048 135 113 657 237 498 757 12;
  • 94) 0.000 000 000 000 261 273 048 135 113 657 237 498 757 12 × 2 = 0 + 0.000 000 000 000 522 546 096 270 227 314 474 997 514 24;
  • 95) 0.000 000 000 000 522 546 096 270 227 314 474 997 514 24 × 2 = 0 + 0.000 000 000 001 045 092 192 540 454 628 949 995 028 48;
  • 96) 0.000 000 000 001 045 092 192 540 454 628 949 995 028 48 × 2 = 0 + 0.000 000 000 002 090 184 385 080 909 257 899 990 056 96;
  • 97) 0.000 000 000 002 090 184 385 080 909 257 899 990 056 96 × 2 = 0 + 0.000 000 000 004 180 368 770 161 818 515 799 980 113 92;
  • 98) 0.000 000 000 004 180 368 770 161 818 515 799 980 113 92 × 2 = 0 + 0.000 000 000 008 360 737 540 323 637 031 599 960 227 84;
  • 99) 0.000 000 000 008 360 737 540 323 637 031 599 960 227 84 × 2 = 0 + 0.000 000 000 016 721 475 080 647 274 063 199 920 455 68;
  • 100) 0.000 000 000 016 721 475 080 647 274 063 199 920 455 68 × 2 = 0 + 0.000 000 000 033 442 950 161 294 548 126 399 840 911 36;
  • 101) 0.000 000 000 033 442 950 161 294 548 126 399 840 911 36 × 2 = 0 + 0.000 000 000 066 885 900 322 589 096 252 799 681 822 72;
  • 102) 0.000 000 000 066 885 900 322 589 096 252 799 681 822 72 × 2 = 0 + 0.000 000 000 133 771 800 645 178 192 505 599 363 645 44;
  • 103) 0.000 000 000 133 771 800 645 178 192 505 599 363 645 44 × 2 = 0 + 0.000 000 000 267 543 601 290 356 385 011 198 727 290 88;
  • 104) 0.000 000 000 267 543 601 290 356 385 011 198 727 290 88 × 2 = 0 + 0.000 000 000 535 087 202 580 712 770 022 397 454 581 76;

We did not reach a fractional part equal to zero. But we have run enough iterations (more than the mantissa limit of 52 bits) and obtained at least one integer part different from zero => FULL STOP (losing precision: the converted number will be only a very good approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 222 044 604 925 031 308 084 726 36(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000(2)
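The repeated multiplication above can be reproduced mechanically. The following is a minimal sketch, assuming the input is given as a plain decimal string without digit-group spaces; the helper name `fraction_bits` is hypothetical and exact rational arithmetic from Python's `fractions` module is used so that no rounding creeps into the intermediate products.

```python
from fractions import Fraction

def fraction_bits(decimal_text, count):
    """Return the first `count` binary digits of a decimal fraction in [0, 1)."""
    value = Fraction(decimal_text)  # exact rational value, no float rounding
    bits = []
    for _ in range(count):
        value *= 2
        bit = int(value)   # integer part of the product: 0 or 1
        bits.append(bit)
        value -= bit       # keep only the fractional part for the next step
    return bits

bits = fraction_bits("0.00000000000000022204460492503130808472636", 104)
print(bits.index(1) + 1)  # position of the first 1 bit -> 52
print(sum(bits))          # number of 1 bits among the first 104 -> 1
```

The single 1 bit at position 52 matches the list above, where only multiplication 52) produces an integer part of 1.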

5. Positive number before normalization:

0.000 000 000 000 000 222 044 604 925 031 308 084 726 36(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 52 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 222 044 604 925 031 308 084 726 36(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000(2) × 2^0 =


1.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000(2) × 2^-52
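The normalization step amounts to locating the first 1 bit and counting how far the binary point has to move. Here is a small sketch under the assumption that the integer and fractional parts are held as strings of '0' and '1'; the function name `normalize` is made up for illustration and it does not handle an all-zero input.

```python
def normalize(int_bits, frac_bits):
    """Return (exponent, significand_bits), significand starting at the leading 1."""
    digits = int_bits + frac_bits
    first_one = digits.index("1")  # raises ValueError if the number is zero
    # Positive exponent: point moved left; negative exponent: point moved right.
    exponent = len(int_bits) - 1 - first_one
    return exponent, digits[first_one:]

frac = "0" * 51 + "1" + "0" * 52   # the 104 fractional bits found above
exponent, significand = normalize("0", frac)
print(exponent)           # -> -52
print(len(significand))   # leading 1 plus 52 trailing bits -> 53
```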


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -52


Mantissa (not normalized):
1.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-52 + 2^(11-1) - 1 =


(-52 + 1 023)(10) =


971(10)
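As a quick check of the bias arithmetic, the adjustment can be written as a one-line function. This is only an illustrative sketch; `biased_exponent` is a hypothetical name and the default of 11 exponent bits is the double precision value used throughout this page.

```python
def biased_exponent(unadjusted, exponent_bits=11):
    """Add the excess/bias 2^(exponent_bits - 1) - 1 to an unadjusted exponent."""
    bias = 2 ** (exponent_bits - 1) - 1   # 1023 for 11 bits
    return unadjusted + bias

print(biased_exponent(-52))  # -> 971
```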


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 971 ÷ 2 = 485 + 1;
  • 485 ÷ 2 = 242 + 1;
  • 242 ÷ 2 = 121 + 0;
  • 121 ÷ 2 = 60 + 1;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


971(10) =


011 1100 1011(2)
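The repeated division by 2 used for the exponent can be sketched as follows. The function name `to_binary` and the `width` parameter (used to pad the result on the left to 11 bits) are assumptions made for this example.

```python
def to_binary(n, width):
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder in one step
        remainders.append(str(remainder))
    # Read the remainders from the bottom of the list up, then pad with zeros.
    return "".join(reversed(remainders)).rjust(width, "0")

print(to_binary(971, 11))  # -> 01111001011
```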


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, together with the decimal mark, if there is one.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 =


0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
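Steps a) and b) can be expressed as simple string operations. The sketch below assumes the significand is a string of bits that begins with the leading 1 and has no decimal mark; `mantissa_field` is a hypothetical helper, and excess bits are cut off (truncated), as this page does, rather than rounded.

```python
def mantissa_field(significand_bits, width=52):
    """Drop the implicit leading 1, then truncate or zero-pad to `width` bits."""
    body = significand_bits[1:]              # a) remove the leading bit
    return body[:width].ljust(width, "0")    # b) adjust the length to 52 bits

field = mantissa_field("1" + "0" * 52)
print(len(field), set(field))  # -> 52 {'0'}
```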


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1100 1011


Mantissa (52 bits) =
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000


The base ten decimal number 0.000 000 000 000 000 222 044 604 925 031 308 084 726 36 converted and written in 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1100 1011 - 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
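One way to cross-check this result is to ask Python for the raw bits of the corresponding double. This sketch relies on the standard `struct` module and assumes the platform uses IEEE 754 doubles for Python floats; the helper name `double_fields` is invented for the example.

```python
import struct

def double_fields(x):
    """Split a float into its sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # 8 bytes as one integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_fields(
    float("0.00000000000000022204460492503130808472636")
)
print(sign, exponent, mantissa)
```

On such a platform the three fields should match the sign, exponent and mantissa listed above.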

How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have produced more bits than the mantissa can hold.
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if there is one), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
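The nine steps above can be strung together in one function. What follows is a minimal sketch, not a full IEEE 754 encoder: the name `decimal_to_double_fields` is hypothetical, the input is assumed to be a decimal string without digit-group spaces, excess mantissa bits are truncated as in step 8, and zero, subnormal numbers, infinities and NaN are deliberately not handled.

```python
from fractions import Fraction

def decimal_to_double_fields(text, exp_bits=11, man_bits=52):
    """Return (sign, exponent, mantissa) bit strings, truncating excess bits."""
    value = Fraction(text)                    # exact decimal value
    sign = "1" if value < 0 else "0"          # step 9
    value = abs(value)                        # step 1
    if value == 0:
        raise ValueError("zero has no leading 1 bit to normalize")

    # Steps 2-3: integer part (format(..., 'b') equals repeated division by 2).
    integer = int(value)
    int_bits = format(integer, "b") if integer else ""

    # Steps 4-5: fractional part by repeated multiplication by 2, stopping once
    # the leading 1 plus man_bits further bits are available.
    frac = value - integer
    frac_bits = ""
    while frac != 0 and len((int_bits + frac_bits).lstrip("0")) < man_bits + 1:
        frac *= 2
        bit = int(frac)
        frac_bits += str(bit)
        frac -= bit

    # Step 6: normalize around the first 1 bit.
    digits = int_bits + frac_bits
    first_one = digits.index("1")
    exponent = len(int_bits) - 1 - first_one

    # Step 7: bias the exponent.
    biased = exponent + 2 ** (exp_bits - 1) - 1
    if not 0 < biased < 2 ** exp_bits - 1:
        raise ValueError("outside the normal range; not handled in this sketch")

    # Step 8: drop the leading 1, truncate or pad to man_bits.
    mantissa = digits[first_one + 1:][:man_bits].ljust(man_bits, "0")
    return sign, format(biased, "0{}b".format(exp_bits)), mantissa

print(decimal_to_double_fields("0.00000000000000022204460492503130808472636"))
```

Because the sketch truncates instead of rounding, its last mantissa bit can differ from what floating point hardware stores for the same decimal input.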

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
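The procedure above removes the excess mantissa bits by truncation, whereas the default IEEE 754 rounding mode is round to nearest, ties to even. Since the first discarded bit in this example is 1, the value stored by floating point hardware may differ in the last mantissa bit. The sketch below compares the bit pattern from this page with the one Python produces, assuming Python floats are IEEE 754 doubles and that decimal literals are parsed with correct rounding; the variable names are illustrative only.

```python
import struct

# Bit pattern obtained above by truncating the mantissa to 52 bits.
page_bits = (
    "1" + "10000000011"
    + "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
).replace(" ", "")

# Bit pattern stored by Python for the same decimal literal.
(raw,) = struct.unpack(">Q", struct.pack(">d", -31.640215))
hardware_bits = format(raw, "064b")

# Gap between the two patterns, in units of the last mantissa bit.
ulp_gap = int(hardware_bits, 2) - int(page_bits, 2)
print(hardware_bits[12:])
print(ulp_gap)
```

A gap of one unit in the last place would confirm that the two results agree on the sign, the exponent and all but the final mantissa bit.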