2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number
2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • dividend ÷ 2 = quotient + remainder;
  • 2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891 ÷ 2 = 1 130 640 770 970 885 975 813 624 097 088 597 089 285 970 922 445 + 1;
  • 1 130 640 770 970 885 975 813 624 097 088 597 089 285 970 922 445 ÷ 2 = 565 320 385 485 442 987 906 812 048 544 298 544 642 985 461 222 + 1;
  • 565 320 385 485 442 987 906 812 048 544 298 544 642 985 461 222 ÷ 2 = 282 660 192 742 721 493 953 406 024 272 149 272 321 492 730 611 + 0;
  • 282 660 192 742 721 493 953 406 024 272 149 272 321 492 730 611 ÷ 2 = 141 330 096 371 360 746 976 703 012 136 074 636 160 746 365 305 + 1;
  • 141 330 096 371 360 746 976 703 012 136 074 636 160 746 365 305 ÷ 2 = 70 665 048 185 680 373 488 351 506 068 037 318 080 373 182 652 + 1;
  • 70 665 048 185 680 373 488 351 506 068 037 318 080 373 182 652 ÷ 2 = 35 332 524 092 840 186 744 175 753 034 018 659 040 186 591 326 + 0;
  • 35 332 524 092 840 186 744 175 753 034 018 659 040 186 591 326 ÷ 2 = 17 666 262 046 420 093 372 087 876 517 009 329 520 093 295 663 + 0;
  • 17 666 262 046 420 093 372 087 876 517 009 329 520 093 295 663 ÷ 2 = 8 833 131 023 210 046 686 043 938 258 504 664 760 046 647 831 + 1;
  • 8 833 131 023 210 046 686 043 938 258 504 664 760 046 647 831 ÷ 2 = 4 416 565 511 605 023 343 021 969 129 252 332 380 023 323 915 + 1;
  • 4 416 565 511 605 023 343 021 969 129 252 332 380 023 323 915 ÷ 2 = 2 208 282 755 802 511 671 510 984 564 626 166 190 011 661 957 + 1;
  • 2 208 282 755 802 511 671 510 984 564 626 166 190 011 661 957 ÷ 2 = 1 104 141 377 901 255 835 755 492 282 313 083 095 005 830 978 + 1;
  • 1 104 141 377 901 255 835 755 492 282 313 083 095 005 830 978 ÷ 2 = 552 070 688 950 627 917 877 746 141 156 541 547 502 915 489 + 0;
  • 552 070 688 950 627 917 877 746 141 156 541 547 502 915 489 ÷ 2 = 276 035 344 475 313 958 938 873 070 578 270 773 751 457 744 + 1;
  • 276 035 344 475 313 958 938 873 070 578 270 773 751 457 744 ÷ 2 = 138 017 672 237 656 979 469 436 535 289 135 386 875 728 872 + 0;
  • 138 017 672 237 656 979 469 436 535 289 135 386 875 728 872 ÷ 2 = 69 008 836 118 828 489 734 718 267 644 567 693 437 864 436 + 0;
  • 69 008 836 118 828 489 734 718 267 644 567 693 437 864 436 ÷ 2 = 34 504 418 059 414 244 867 359 133 822 283 846 718 932 218 + 0;
  • 34 504 418 059 414 244 867 359 133 822 283 846 718 932 218 ÷ 2 = 17 252 209 029 707 122 433 679 566 911 141 923 359 466 109 + 0;
  • 17 252 209 029 707 122 433 679 566 911 141 923 359 466 109 ÷ 2 = 8 626 104 514 853 561 216 839 783 455 570 961 679 733 054 + 1;
  • 8 626 104 514 853 561 216 839 783 455 570 961 679 733 054 ÷ 2 = 4 313 052 257 426 780 608 419 891 727 785 480 839 866 527 + 0;
  • 4 313 052 257 426 780 608 419 891 727 785 480 839 866 527 ÷ 2 = 2 156 526 128 713 390 304 209 945 863 892 740 419 933 263 + 1;
  • 2 156 526 128 713 390 304 209 945 863 892 740 419 933 263 ÷ 2 = 1 078 263 064 356 695 152 104 972 931 946 370 209 966 631 + 1;
  • 1 078 263 064 356 695 152 104 972 931 946 370 209 966 631 ÷ 2 = 539 131 532 178 347 576 052 486 465 973 185 104 983 315 + 1;
  • 539 131 532 178 347 576 052 486 465 973 185 104 983 315 ÷ 2 = 269 565 766 089 173 788 026 243 232 986 592 552 491 657 + 1;
  • 269 565 766 089 173 788 026 243 232 986 592 552 491 657 ÷ 2 = 134 782 883 044 586 894 013 121 616 493 296 276 245 828 + 1;
  • 134 782 883 044 586 894 013 121 616 493 296 276 245 828 ÷ 2 = 67 391 441 522 293 447 006 560 808 246 648 138 122 914 + 0;
  • 67 391 441 522 293 447 006 560 808 246 648 138 122 914 ÷ 2 = 33 695 720 761 146 723 503 280 404 123 324 069 061 457 + 0;
  • 33 695 720 761 146 723 503 280 404 123 324 069 061 457 ÷ 2 = 16 847 860 380 573 361 751 640 202 061 662 034 530 728 + 1;
  • 16 847 860 380 573 361 751 640 202 061 662 034 530 728 ÷ 2 = 8 423 930 190 286 680 875 820 101 030 831 017 265 364 + 0;
  • 8 423 930 190 286 680 875 820 101 030 831 017 265 364 ÷ 2 = 4 211 965 095 143 340 437 910 050 515 415 508 632 682 + 0;
  • 4 211 965 095 143 340 437 910 050 515 415 508 632 682 ÷ 2 = 2 105 982 547 571 670 218 955 025 257 707 754 316 341 + 0;
  • 2 105 982 547 571 670 218 955 025 257 707 754 316 341 ÷ 2 = 1 052 991 273 785 835 109 477 512 628 853 877 158 170 + 1;
  • 1 052 991 273 785 835 109 477 512 628 853 877 158 170 ÷ 2 = 526 495 636 892 917 554 738 756 314 426 938 579 085 + 0;
  • 526 495 636 892 917 554 738 756 314 426 938 579 085 ÷ 2 = 263 247 818 446 458 777 369 378 157 213 469 289 542 + 1;
  • 263 247 818 446 458 777 369 378 157 213 469 289 542 ÷ 2 = 131 623 909 223 229 388 684 689 078 606 734 644 771 + 0;
  • 131 623 909 223 229 388 684 689 078 606 734 644 771 ÷ 2 = 65 811 954 611 614 694 342 344 539 303 367 322 385 + 1;
  • 65 811 954 611 614 694 342 344 539 303 367 322 385 ÷ 2 = 32 905 977 305 807 347 171 172 269 651 683 661 192 + 1;
  • 32 905 977 305 807 347 171 172 269 651 683 661 192 ÷ 2 = 16 452 988 652 903 673 585 586 134 825 841 830 596 + 0;
  • 16 452 988 652 903 673 585 586 134 825 841 830 596 ÷ 2 = 8 226 494 326 451 836 792 793 067 412 920 915 298 + 0;
  • 8 226 494 326 451 836 792 793 067 412 920 915 298 ÷ 2 = 4 113 247 163 225 918 396 396 533 706 460 457 649 + 0;
  • 4 113 247 163 225 918 396 396 533 706 460 457 649 ÷ 2 = 2 056 623 581 612 959 198 198 266 853 230 228 824 + 1;
  • 2 056 623 581 612 959 198 198 266 853 230 228 824 ÷ 2 = 1 028 311 790 806 479 599 099 133 426 615 114 412 + 0;
  • 1 028 311 790 806 479 599 099 133 426 615 114 412 ÷ 2 = 514 155 895 403 239 799 549 566 713 307 557 206 + 0;
  • 514 155 895 403 239 799 549 566 713 307 557 206 ÷ 2 = 257 077 947 701 619 899 774 783 356 653 778 603 + 0;
  • 257 077 947 701 619 899 774 783 356 653 778 603 ÷ 2 = 128 538 973 850 809 949 887 391 678 326 889 301 + 1;
  • 128 538 973 850 809 949 887 391 678 326 889 301 ÷ 2 = 64 269 486 925 404 974 943 695 839 163 444 650 + 1;
  • 64 269 486 925 404 974 943 695 839 163 444 650 ÷ 2 = 32 134 743 462 702 487 471 847 919 581 722 325 + 0;
  • 32 134 743 462 702 487 471 847 919 581 722 325 ÷ 2 = 16 067 371 731 351 243 735 923 959 790 861 162 + 1;
  • 16 067 371 731 351 243 735 923 959 790 861 162 ÷ 2 = 8 033 685 865 675 621 867 961 979 895 430 581 + 0;
  • 8 033 685 865 675 621 867 961 979 895 430 581 ÷ 2 = 4 016 842 932 837 810 933 980 989 947 715 290 + 1;
  • 4 016 842 932 837 810 933 980 989 947 715 290 ÷ 2 = 2 008 421 466 418 905 466 990 494 973 857 645 + 0;
  • 2 008 421 466 418 905 466 990 494 973 857 645 ÷ 2 = 1 004 210 733 209 452 733 495 247 486 928 822 + 1;
  • 1 004 210 733 209 452 733 495 247 486 928 822 ÷ 2 = 502 105 366 604 726 366 747 623 743 464 411 + 0;
  • 502 105 366 604 726 366 747 623 743 464 411 ÷ 2 = 251 052 683 302 363 183 373 811 871 732 205 + 1;
  • 251 052 683 302 363 183 373 811 871 732 205 ÷ 2 = 125 526 341 651 181 591 686 905 935 866 102 + 1;
  • 125 526 341 651 181 591 686 905 935 866 102 ÷ 2 = 62 763 170 825 590 795 843 452 967 933 051 + 0;
  • 62 763 170 825 590 795 843 452 967 933 051 ÷ 2 = 31 381 585 412 795 397 921 726 483 966 525 + 1;
  • 31 381 585 412 795 397 921 726 483 966 525 ÷ 2 = 15 690 792 706 397 698 960 863 241 983 262 + 1;
  • 15 690 792 706 397 698 960 863 241 983 262 ÷ 2 = 7 845 396 353 198 849 480 431 620 991 631 + 0;
  • 7 845 396 353 198 849 480 431 620 991 631 ÷ 2 = 3 922 698 176 599 424 740 215 810 495 815 + 1;
  • 3 922 698 176 599 424 740 215 810 495 815 ÷ 2 = 1 961 349 088 299 712 370 107 905 247 907 + 1;
  • 1 961 349 088 299 712 370 107 905 247 907 ÷ 2 = 980 674 544 149 856 185 053 952 623 953 + 1;
  • 980 674 544 149 856 185 053 952 623 953 ÷ 2 = 490 337 272 074 928 092 526 976 311 976 + 1;
  • 490 337 272 074 928 092 526 976 311 976 ÷ 2 = 245 168 636 037 464 046 263 488 155 988 + 0;
  • 245 168 636 037 464 046 263 488 155 988 ÷ 2 = 122 584 318 018 732 023 131 744 077 994 + 0;
  • 122 584 318 018 732 023 131 744 077 994 ÷ 2 = 61 292 159 009 366 011 565 872 038 997 + 0;
  • 61 292 159 009 366 011 565 872 038 997 ÷ 2 = 30 646 079 504 683 005 782 936 019 498 + 1;
  • 30 646 079 504 683 005 782 936 019 498 ÷ 2 = 15 323 039 752 341 502 891 468 009 749 + 0;
  • 15 323 039 752 341 502 891 468 009 749 ÷ 2 = 7 661 519 876 170 751 445 734 004 874 + 1;
  • 7 661 519 876 170 751 445 734 004 874 ÷ 2 = 3 830 759 938 085 375 722 867 002 437 + 0;
  • 3 830 759 938 085 375 722 867 002 437 ÷ 2 = 1 915 379 969 042 687 861 433 501 218 + 1;
  • 1 915 379 969 042 687 861 433 501 218 ÷ 2 = 957 689 984 521 343 930 716 750 609 + 0;
  • 957 689 984 521 343 930 716 750 609 ÷ 2 = 478 844 992 260 671 965 358 375 304 + 1;
  • 478 844 992 260 671 965 358 375 304 ÷ 2 = 239 422 496 130 335 982 679 187 652 + 0;
  • 239 422 496 130 335 982 679 187 652 ÷ 2 = 119 711 248 065 167 991 339 593 826 + 0;
  • 119 711 248 065 167 991 339 593 826 ÷ 2 = 59 855 624 032 583 995 669 796 913 + 0;
  • 59 855 624 032 583 995 669 796 913 ÷ 2 = 29 927 812 016 291 997 834 898 456 + 1;
  • 29 927 812 016 291 997 834 898 456 ÷ 2 = 14 963 906 008 145 998 917 449 228 + 0;
  • 14 963 906 008 145 998 917 449 228 ÷ 2 = 7 481 953 004 072 999 458 724 614 + 0;
  • 7 481 953 004 072 999 458 724 614 ÷ 2 = 3 740 976 502 036 499 729 362 307 + 0;
  • 3 740 976 502 036 499 729 362 307 ÷ 2 = 1 870 488 251 018 249 864 681 153 + 1;
  • 1 870 488 251 018 249 864 681 153 ÷ 2 = 935 244 125 509 124 932 340 576 + 1;
  • 935 244 125 509 124 932 340 576 ÷ 2 = 467 622 062 754 562 466 170 288 + 0;
  • 467 622 062 754 562 466 170 288 ÷ 2 = 233 811 031 377 281 233 085 144 + 0;
  • 233 811 031 377 281 233 085 144 ÷ 2 = 116 905 515 688 640 616 542 572 + 0;
  • 116 905 515 688 640 616 542 572 ÷ 2 = 58 452 757 844 320 308 271 286 + 0;
  • 58 452 757 844 320 308 271 286 ÷ 2 = 29 226 378 922 160 154 135 643 + 0;
  • 29 226 378 922 160 154 135 643 ÷ 2 = 14 613 189 461 080 077 067 821 + 1;
  • 14 613 189 461 080 077 067 821 ÷ 2 = 7 306 594 730 540 038 533 910 + 1;
  • 7 306 594 730 540 038 533 910 ÷ 2 = 3 653 297 365 270 019 266 955 + 0;
  • 3 653 297 365 270 019 266 955 ÷ 2 = 1 826 648 682 635 009 633 477 + 1;
  • 1 826 648 682 635 009 633 477 ÷ 2 = 913 324 341 317 504 816 738 + 1;
  • 913 324 341 317 504 816 738 ÷ 2 = 456 662 170 658 752 408 369 + 0;
  • 456 662 170 658 752 408 369 ÷ 2 = 228 331 085 329 376 204 184 + 1;
  • 228 331 085 329 376 204 184 ÷ 2 = 114 165 542 664 688 102 092 + 0;
  • 114 165 542 664 688 102 092 ÷ 2 = 57 082 771 332 344 051 046 + 0;
  • 57 082 771 332 344 051 046 ÷ 2 = 28 541 385 666 172 025 523 + 0;
  • 28 541 385 666 172 025 523 ÷ 2 = 14 270 692 833 086 012 761 + 1;
  • 14 270 692 833 086 012 761 ÷ 2 = 7 135 346 416 543 006 380 + 1;
  • 7 135 346 416 543 006 380 ÷ 2 = 3 567 673 208 271 503 190 + 0;
  • 3 567 673 208 271 503 190 ÷ 2 = 1 783 836 604 135 751 595 + 0;
  • 1 783 836 604 135 751 595 ÷ 2 = 891 918 302 067 875 797 + 1;
  • 891 918 302 067 875 797 ÷ 2 = 445 959 151 033 937 898 + 1;
  • 445 959 151 033 937 898 ÷ 2 = 222 979 575 516 968 949 + 0;
  • 222 979 575 516 968 949 ÷ 2 = 111 489 787 758 484 474 + 1;
  • 111 489 787 758 484 474 ÷ 2 = 55 744 893 879 242 237 + 0;
  • 55 744 893 879 242 237 ÷ 2 = 27 872 446 939 621 118 + 1;
  • 27 872 446 939 621 118 ÷ 2 = 13 936 223 469 810 559 + 0;
  • 13 936 223 469 810 559 ÷ 2 = 6 968 111 734 905 279 + 1;
  • 6 968 111 734 905 279 ÷ 2 = 3 484 055 867 452 639 + 1;
  • 3 484 055 867 452 639 ÷ 2 = 1 742 027 933 726 319 + 1;
  • 1 742 027 933 726 319 ÷ 2 = 871 013 966 863 159 + 1;
  • 871 013 966 863 159 ÷ 2 = 435 506 983 431 579 + 1;
  • 435 506 983 431 579 ÷ 2 = 217 753 491 715 789 + 1;
  • 217 753 491 715 789 ÷ 2 = 108 876 745 857 894 + 1;
  • 108 876 745 857 894 ÷ 2 = 54 438 372 928 947 + 0;
  • 54 438 372 928 947 ÷ 2 = 27 219 186 464 473 + 1;
  • 27 219 186 464 473 ÷ 2 = 13 609 593 232 236 + 1;
  • 13 609 593 232 236 ÷ 2 = 6 804 796 616 118 + 0;
  • 6 804 796 616 118 ÷ 2 = 3 402 398 308 059 + 0;
  • 3 402 398 308 059 ÷ 2 = 1 701 199 154 029 + 1;
  • 1 701 199 154 029 ÷ 2 = 850 599 577 014 + 1;
  • 850 599 577 014 ÷ 2 = 425 299 788 507 + 0;
  • 425 299 788 507 ÷ 2 = 212 649 894 253 + 1;
  • 212 649 894 253 ÷ 2 = 106 324 947 126 + 1;
  • 106 324 947 126 ÷ 2 = 53 162 473 563 + 0;
  • 53 162 473 563 ÷ 2 = 26 581 236 781 + 1;
  • 26 581 236 781 ÷ 2 = 13 290 618 390 + 1;
  • 13 290 618 390 ÷ 2 = 6 645 309 195 + 0;
  • 6 645 309 195 ÷ 2 = 3 322 654 597 + 1;
  • 3 322 654 597 ÷ 2 = 1 661 327 298 + 1;
  • 1 661 327 298 ÷ 2 = 830 663 649 + 0;
  • 830 663 649 ÷ 2 = 415 331 824 + 1;
  • 415 331 824 ÷ 2 = 207 665 912 + 0;
  • 207 665 912 ÷ 2 = 103 832 956 + 0;
  • 103 832 956 ÷ 2 = 51 916 478 + 0;
  • 51 916 478 ÷ 2 = 25 958 239 + 0;
  • 25 958 239 ÷ 2 = 12 979 119 + 1;
  • 12 979 119 ÷ 2 = 6 489 559 + 1;
  • 6 489 559 ÷ 2 = 3 244 779 + 1;
  • 3 244 779 ÷ 2 = 1 622 389 + 1;
  • 1 622 389 ÷ 2 = 811 194 + 1;
  • 811 194 ÷ 2 = 405 597 + 0;
  • 405 597 ÷ 2 = 202 798 + 1;
  • 202 798 ÷ 2 = 101 399 + 0;
  • 101 399 ÷ 2 = 50 699 + 1;
  • 50 699 ÷ 2 = 25 349 + 1;
  • 25 349 ÷ 2 = 12 674 + 1;
  • 12 674 ÷ 2 = 6 337 + 0;
  • 6 337 ÷ 2 = 3 168 + 1;
  • 3 168 ÷ 2 = 1 584 + 0;
  • 1 584 ÷ 2 = 792 + 0;
  • 792 ÷ 2 = 396 + 0;
  • 396 ÷ 2 = 198 + 0;
  • 198 ÷ 2 = 99 + 0;
  • 99 ÷ 2 = 49 + 1;
  • 49 ÷ 2 = 24 + 1;
  • 24 ÷ 2 = 12 + 0;
  • 12 ÷ 2 = 6 + 0;
  • 6 ÷ 2 = 3 + 0;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the positive number.

Take all the remainders starting from the bottom of the list constructed above.

2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891(10) =


1 1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011(2)
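Steps 1 and 2 (repeated division by 2, then reading the remainders from the bottom up) can be sketched in Python, whose integers have arbitrary precision, so the 49-digit number needs no special handling. This is a minimal illustration; the function name `binary_by_division` is our own, not part of any library.

```python
def binary_by_division(n: int) -> str:
    """Divide n repeatedly by 2, keep each remainder, stop at a zero quotient,
    then read the remainders from the last one back to the first one."""
    remainders = []
    while True:
        n, remainder = divmod(n, 2)  # quotient and remainder of n ÷ 2
        remainders.append(remainder)
        if n == 0:                   # quotient equal to zero => stop
            break
    return "".join(str(r) for r in reversed(remainders))


N = 2261281541941771951627248194177194178571941844891
bits = binary_by_division(N)
print(len(bits))   # 161 binary digits
print(bits)
```

Python's built-in `bin(N)` (or `format(N, "b")`) returns the same digit string and is a convenient cross-check.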


3. Normalize the binary representation of the number.

Shift the decimal mark 160 positions to the left so that only one non-zero digit remains to the left of it:


2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891(10) =


1 1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011(2) =


1 1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011(2) × $2^{0}$ =


1.1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011(2) × $2^{160}$
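Normalization only moves the binary point, so for a positive integer the unadjusted exponent is simply the number of binary digits minus one. A short sketch (variable names are illustrative):

```python
N = 2261281541941771951627248194177194178571941844891

bits = format(N, "b")        # base 2 digits, most significant first
shift = len(bits) - 1        # positions the point moves to the left
normalized = bits[0] + "." + bits[1:]

print(shift)                 # 160, the unadjusted exponent
print(normalized[:22])       # first digits of 1.1000 1100 0001 0111 0101 ...
```

`N.bit_length() - 1` gives the same shift without building the string.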


4. So far, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 160


Mantissa (not normalized):
1.1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011


5. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + $2^{11-1} - 1$ =


160 + $2^{11-1} - 1$ =


(160 + 1 023)(10) =


1 183(10)
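The bias step is a single addition. The sketch below also adds a range check that the steps above do not mention: in the 64 bit format the adjusted exponents 0 and 2047 are reserved (for zero/subnormals and for infinities/NaN), so a normal number must land in 1..2046. The names `BIAS` and `adjust_exponent` are illustrative.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 1023 for the 64 bit format


def adjust_exponent(unadjusted: int) -> int:
    """Add the bias; normal doubles use adjusted exponents 1..2046."""
    adjusted = unadjusted + BIAS
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent out of range for a normal double")
    return adjusted


print(adjust_exponent(160))   # 1183
```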


6. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • dividend ÷ 2 = quotient + remainder;
  • 1 183 ÷ 2 = 591 + 1;
  • 591 ÷ 2 = 295 + 1;
  • 295 ÷ 2 = 147 + 1;
  • 147 ÷ 2 = 73 + 1;
  • 73 ÷ 2 = 36 + 1;
  • 36 ÷ 2 = 18 + 0;
  • 18 ÷ 2 = 9 + 0;
  • 9 ÷ 2 = 4 + 1;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

7. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


1 183(10) =


100 1001 1111(2)
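Here 1 183 already needs all 11 bits, but smaller adjusted exponents must be padded with zeros on the left to keep the field 11 bits wide. A sketch of steps 6 and 7 with that padding (the function name is our own):

```python
def to_fixed_width_binary(value: int, width: int = 11) -> str:
    """Repeated division by 2, remainders read bottom-up, zero-padded on the left."""
    remainders = []
    while value > 0:
        value, remainder = divmod(value, 2)
        remainders.append(str(remainder))
    bits = "".join(reversed(remainders)).rjust(width, "0")
    if len(bits) > width:
        raise ValueError(f"does not fit in {width} bits")
    return bits


print(to_fixed_width_binary(1183))   # 10010011111
```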


8. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal mark, if present.


b) Adjust its length to 52 bits by removing the excess bits from the right (if any of the excess bits is set to 1, we are losing precision...).


Mantissa (normalized) =


1.1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111 1010 1011 0011 0001 0110 1100 0001 1000 1000 1010 1010 0011 1101 1011 0101 0101 1000 1000 1101 0100 0100 1111 1010 0001 0111 1001 1011 =


1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111
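Step 8 can be sketched as a slice that keeps 52 bits after the leading 1, pads short mantissas with zeros, and reports whether any dropped bit was 1 (the "losing precision" case). The helper name is illustrative.

```python
N = 2261281541941771951627248194177194178571941844891


def truncate_mantissa(fraction_bits: str, width: int = 52) -> tuple[str, bool]:
    """Keep the first `width` bits after the point (pad with zeros if short).
    Also report whether any dropped bit was 1, i.e. whether precision is lost."""
    kept = fraction_bits[:width].ljust(width, "0")
    precision_lost = "1" in fraction_bits[width:]
    return kept, precision_lost


bits = format(N, "b")
mantissa, lost = truncate_mantissa(bits[1:])   # bits[1:] drops the leading 1
print(" ".join(mantissa[i:i + 4] for i in range(0, 52, 4)))
print(lost)                                    # True: the dropped bits are not all 0
```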


9. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
100 1001 1111


Mantissa (52 bits) =
1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111


Decimal number 2 261 281 541 941 771 951 627 248 194 177 194 178 571 941 844 891 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 100 1001 1111 - 1000 1100 0001 0111 0101 1111 0000 1011 0110 1101 1001 1011 1111
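To read the finished pattern back as an actual double, the three fields can be joined into one 64-bit word and reinterpreted with the standard `struct` module. One caveat worth stating: the steps above truncate the mantissa (round toward zero), whereas IEEE 754's default rounding mode, round to nearest with ties to even, also looks at the dropped bits, and Python's `float()` uses it when converting an int. Here the first dropped bit is 1 and later dropped bits are nonzero, so `float(N)` is expected to end in 1001 1100 0000 rather than 1001 1011 1111, one unit in the last place higher. The helper names below are illustrative.

```python
import struct

N = 2261281541941771951627248194177194178571941844891


def word_to_float(word: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double (big-endian)."""
    return struct.unpack(">d", word.to_bytes(8, "big"))[0]


def float_to_word(x: float) -> int:
    """The 64 bits stored for x, as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


bits = format(N, "b")
sign = "0"
exponent = format(len(bits) - 1 + 1023, "011b")
mantissa = bits[1:53]                      # truncated, as in step 8

truncated_word = int(sign + exponent + mantissa, 2)
rounded_word = float_to_word(float(N))     # Python rounds to nearest

print(hex(truncated_word), hex(rounded_word))
print(word_to_float(truncated_word) <= N)  # truncation never overshoots
```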


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero (or until there are enough bits to fill the mantissa).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number by shifting the decimal mark (the decimal point) "n" positions either to the left or to the right, so that only one non-zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation, then convert it from decimal (base 10) to 11 bit binary using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + $2^{11-1} - 1$
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
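The nine steps can be combined into one function. This is a sketch under stated assumptions: the input is a plain decimal literal, its value is a nonzero normal double (no zero, subnormals, infinities or NaN), and the mantissa is truncated exactly as described, with no rounding. `fractions.Fraction` holds the decimal value exactly, so the repeated doubling produces the true binary digits; the doubling stops once 53 significant bits (the leading 1 plus 52 mantissa bits) are known, which is our reading of the "enough iterations" rule. Function names are illustrative.

```python
from fractions import Fraction


def significant_count(bits: str) -> int:
    """Number of digits from the first 1 onward (0 if there is no 1 yet)."""
    first_one = bits.find("1")
    return 0 if first_one < 0 else len(bits) - first_one


def to_double_fields(text: str) -> tuple[str, str, str]:
    """Steps 1-9 for a decimal literal such as '-31.640215'.
    Returns (sign, exponent, mantissa) bit strings; the mantissa is truncated."""
    value = Fraction(text)                     # exact, no binary rounding yet
    sign = "1" if value < 0 else "0"           # step 9
    value = abs(value)                         # step 1
    if value == 0:
        raise ValueError("zero has no normalized form")

    integer_part, fractional_part = divmod(value, 1)
    integer_part = int(integer_part)
    int_bits = format(integer_part, "b") if integer_part else ""   # steps 2-3

    bits = int_bits                            # steps 4-5: repeated doubling
    while fractional_part and significant_count(bits) < 53:
        fractional_part *= 2
        digit, fractional_part = divmod(fractional_part, 1)
        bits += str(int(digit))

    lead = bits.index("1")                     # step 6: position of the leading 1
    unadjusted = len(int_bits) - 1 - lead
    adjusted = unadjusted + 2 ** (11 - 1) - 1  # step 7
    if not 1 <= adjusted <= 2046:
        raise ValueError("not representable as a normal double")

    mantissa = bits[lead + 1:lead + 53].ljust(52, "0")   # step 8
    return sign, format(adjusted, "011b"), mantissa


print(to_double_fields("-31.640215"))
```

For numbers below 1 the leading 1 appears after some zeros, so the shift goes to the right and the unadjusted exponent is negative (for example 0.25 becomes 1.0 × $2^{-2}$).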

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • dividend ÷ 2 = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) number × 2 = integer part + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • No fractional part equal to zero was reached, but we have performed enough iterations (more than the mantissa limit of 52) and at least one integer part was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number by taking all the integer parts of the previous multiplying operations, starting from the top of the list constructed above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × $2^{0}$ =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × $2^{4}$

  • 8. So far, we have the following elements that will feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + $2^{11-1} - 1$ = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
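Two parts of this example can be checked in Python. The fractional-part doubling can be reproduced exactly with `fractions.Fraction`, which keeps 0.640 215 as the ratio 128043/200000 so no binary rounding creeps in (a float would already be an approximation); the `limit` parameter stands in for the "enough iterations" rule. The result can also be compared with the double Python actually stores, via the standard `struct` module. The sign and exponent agree with the steps above, but the last mantissa bits differ: Python parses the literal with round to nearest, and because the first dropped bit (multiplication 49) is 1 and a later one (multiplication 51) is also 1, the stored mantissa ends in 1001 1101 rather than the truncated 1001 1100. Helper names are illustrative.

```python
import struct
from fractions import Fraction


def fraction_bits(frac: Fraction, limit: int) -> str:
    """Double the fractional part repeatedly, collecting the integer parts,
    until it becomes 0 or `limit` bits have been produced."""
    bits = []
    while frac != 0 and len(bits) < limit:
        frac *= 2
        integer_part = int(frac)      # 0 or 1, since 0 <= frac < 2
        bits.append(str(integer_part))
        frac -= integer_part
    return "".join(bits)


def stored_bits(x: float) -> str:
    """Sign - exponent - mantissa of the double Python actually stores for x."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(word, "064b")
    return f"{bits[0]} - {bits[1:12]} - {bits[12:]}"


# The 53 multiplications of the example, done exactly.
print(fraction_bits(Fraction("0.640215"), 53))
print(stored_bits(-31.640215))
print((-31.640215).hex())
```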