Base ten decimal number 1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111 converted to 64 bit double precision IEEE 754 binary floating point standard

How to convert the decimal number 1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111(10)
to
64 bit double precision IEEE 754 binary floating point
(1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. Divide the number repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:

  • division = quotient + remainder;
  • 1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111 ÷ 2 = 500 555 505 505 555 055 505 005 055 505 555 050 555 000 050 500 055 + 1;
  • 500 555 505 505 555 055 505 005 055 505 555 050 555 000 050 500 055 ÷ 2 = 250 277 752 752 777 527 752 502 527 752 777 525 277 500 025 250 027 + 1;
  • 250 277 752 752 777 527 752 502 527 752 777 525 277 500 025 250 027 ÷ 2 = 125 138 876 376 388 763 876 251 263 876 388 762 638 750 012 625 013 + 1;
  • 125 138 876 376 388 763 876 251 263 876 388 762 638 750 012 625 013 ÷ 2 = 62 569 438 188 194 381 938 125 631 938 194 381 319 375 006 312 506 + 1;
  • 62 569 438 188 194 381 938 125 631 938 194 381 319 375 006 312 506 ÷ 2 = 31 284 719 094 097 190 969 062 815 969 097 190 659 687 503 156 253 + 0;
  • 31 284 719 094 097 190 969 062 815 969 097 190 659 687 503 156 253 ÷ 2 = 15 642 359 547 048 595 484 531 407 984 548 595 329 843 751 578 126 + 1;
  • 15 642 359 547 048 595 484 531 407 984 548 595 329 843 751 578 126 ÷ 2 = 7 821 179 773 524 297 742 265 703 992 274 297 664 921 875 789 063 + 0;
  • 7 821 179 773 524 297 742 265 703 992 274 297 664 921 875 789 063 ÷ 2 = 3 910 589 886 762 148 871 132 851 996 137 148 832 460 937 894 531 + 1;
  • 3 910 589 886 762 148 871 132 851 996 137 148 832 460 937 894 531 ÷ 2 = 1 955 294 943 381 074 435 566 425 998 068 574 416 230 468 947 265 + 1;
  • 1 955 294 943 381 074 435 566 425 998 068 574 416 230 468 947 265 ÷ 2 = 977 647 471 690 537 217 783 212 999 034 287 208 115 234 473 632 + 1;
  • 977 647 471 690 537 217 783 212 999 034 287 208 115 234 473 632 ÷ 2 = 488 823 735 845 268 608 891 606 499 517 143 604 057 617 236 816 + 0;
  • 488 823 735 845 268 608 891 606 499 517 143 604 057 617 236 816 ÷ 2 = 244 411 867 922 634 304 445 803 249 758 571 802 028 808 618 408 + 0;
  • 244 411 867 922 634 304 445 803 249 758 571 802 028 808 618 408 ÷ 2 = 122 205 933 961 317 152 222 901 624 879 285 901 014 404 309 204 + 0;
  • 122 205 933 961 317 152 222 901 624 879 285 901 014 404 309 204 ÷ 2 = 61 102 966 980 658 576 111 450 812 439 642 950 507 202 154 602 + 0;
  • 61 102 966 980 658 576 111 450 812 439 642 950 507 202 154 602 ÷ 2 = 30 551 483 490 329 288 055 725 406 219 821 475 253 601 077 301 + 0;
  • 30 551 483 490 329 288 055 725 406 219 821 475 253 601 077 301 ÷ 2 = 15 275 741 745 164 644 027 862 703 109 910 737 626 800 538 650 + 1;
  • 15 275 741 745 164 644 027 862 703 109 910 737 626 800 538 650 ÷ 2 = 7 637 870 872 582 322 013 931 351 554 955 368 813 400 269 325 + 0;
  • 7 637 870 872 582 322 013 931 351 554 955 368 813 400 269 325 ÷ 2 = 3 818 935 436 291 161 006 965 675 777 477 684 406 700 134 662 + 1;
  • 3 818 935 436 291 161 006 965 675 777 477 684 406 700 134 662 ÷ 2 = 1 909 467 718 145 580 503 482 837 888 738 842 203 350 067 331 + 0;
  • 1 909 467 718 145 580 503 482 837 888 738 842 203 350 067 331 ÷ 2 = 954 733 859 072 790 251 741 418 944 369 421 101 675 033 665 + 1;
  • 954 733 859 072 790 251 741 418 944 369 421 101 675 033 665 ÷ 2 = 477 366 929 536 395 125 870 709 472 184 710 550 837 516 832 + 1;
  • 477 366 929 536 395 125 870 709 472 184 710 550 837 516 832 ÷ 2 = 238 683 464 768 197 562 935 354 736 092 355 275 418 758 416 + 0;
  • 238 683 464 768 197 562 935 354 736 092 355 275 418 758 416 ÷ 2 = 119 341 732 384 098 781 467 677 368 046 177 637 709 379 208 + 0;
  • 119 341 732 384 098 781 467 677 368 046 177 637 709 379 208 ÷ 2 = 59 670 866 192 049 390 733 838 684 023 088 818 854 689 604 + 0;
  • 59 670 866 192 049 390 733 838 684 023 088 818 854 689 604 ÷ 2 = 29 835 433 096 024 695 366 919 342 011 544 409 427 344 802 + 0;
  • 29 835 433 096 024 695 366 919 342 011 544 409 427 344 802 ÷ 2 = 14 917 716 548 012 347 683 459 671 005 772 204 713 672 401 + 0;
  • 14 917 716 548 012 347 683 459 671 005 772 204 713 672 401 ÷ 2 = 7 458 858 274 006 173 841 729 835 502 886 102 356 836 200 + 1;
  • 7 458 858 274 006 173 841 729 835 502 886 102 356 836 200 ÷ 2 = 3 729 429 137 003 086 920 864 917 751 443 051 178 418 100 + 0;
  • 3 729 429 137 003 086 920 864 917 751 443 051 178 418 100 ÷ 2 = 1 864 714 568 501 543 460 432 458 875 721 525 589 209 050 + 0;
  • 1 864 714 568 501 543 460 432 458 875 721 525 589 209 050 ÷ 2 = 932 357 284 250 771 730 216 229 437 860 762 794 604 525 + 0;
  • 932 357 284 250 771 730 216 229 437 860 762 794 604 525 ÷ 2 = 466 178 642 125 385 865 108 114 718 930 381 397 302 262 + 1;
  • 466 178 642 125 385 865 108 114 718 930 381 397 302 262 ÷ 2 = 233 089 321 062 692 932 554 057 359 465 190 698 651 131 + 0;
  • 233 089 321 062 692 932 554 057 359 465 190 698 651 131 ÷ 2 = 116 544 660 531 346 466 277 028 679 732 595 349 325 565 + 1;
  • 116 544 660 531 346 466 277 028 679 732 595 349 325 565 ÷ 2 = 58 272 330 265 673 233 138 514 339 866 297 674 662 782 + 1;
  • 58 272 330 265 673 233 138 514 339 866 297 674 662 782 ÷ 2 = 29 136 165 132 836 616 569 257 169 933 148 837 331 391 + 0;
  • 29 136 165 132 836 616 569 257 169 933 148 837 331 391 ÷ 2 = 14 568 082 566 418 308 284 628 584 966 574 418 665 695 + 1;
  • 14 568 082 566 418 308 284 628 584 966 574 418 665 695 ÷ 2 = 7 284 041 283 209 154 142 314 292 483 287 209 332 847 + 1;
  • 7 284 041 283 209 154 142 314 292 483 287 209 332 847 ÷ 2 = 3 642 020 641 604 577 071 157 146 241 643 604 666 423 + 1;
  • 3 642 020 641 604 577 071 157 146 241 643 604 666 423 ÷ 2 = 1 821 010 320 802 288 535 578 573 120 821 802 333 211 + 1;
  • 1 821 010 320 802 288 535 578 573 120 821 802 333 211 ÷ 2 = 910 505 160 401 144 267 789 286 560 410 901 166 605 + 1;
  • 910 505 160 401 144 267 789 286 560 410 901 166 605 ÷ 2 = 455 252 580 200 572 133 894 643 280 205 450 583 302 + 1;
  • 455 252 580 200 572 133 894 643 280 205 450 583 302 ÷ 2 = 227 626 290 100 286 066 947 321 640 102 725 291 651 + 0;
  • 227 626 290 100 286 066 947 321 640 102 725 291 651 ÷ 2 = 113 813 145 050 143 033 473 660 820 051 362 645 825 + 1;
  • 113 813 145 050 143 033 473 660 820 051 362 645 825 ÷ 2 = 56 906 572 525 071 516 736 830 410 025 681 322 912 + 1;
  • 56 906 572 525 071 516 736 830 410 025 681 322 912 ÷ 2 = 28 453 286 262 535 758 368 415 205 012 840 661 456 + 0;
  • 28 453 286 262 535 758 368 415 205 012 840 661 456 ÷ 2 = 14 226 643 131 267 879 184 207 602 506 420 330 728 + 0;
  • 14 226 643 131 267 879 184 207 602 506 420 330 728 ÷ 2 = 7 113 321 565 633 939 592 103 801 253 210 165 364 + 0;
  • 7 113 321 565 633 939 592 103 801 253 210 165 364 ÷ 2 = 3 556 660 782 816 969 796 051 900 626 605 082 682 + 0;
  • 3 556 660 782 816 969 796 051 900 626 605 082 682 ÷ 2 = 1 778 330 391 408 484 898 025 950 313 302 541 341 + 0;
  • 1 778 330 391 408 484 898 025 950 313 302 541 341 ÷ 2 = 889 165 195 704 242 449 012 975 156 651 270 670 + 1;
  • 889 165 195 704 242 449 012 975 156 651 270 670 ÷ 2 = 444 582 597 852 121 224 506 487 578 325 635 335 + 0;
  • 444 582 597 852 121 224 506 487 578 325 635 335 ÷ 2 = 222 291 298 926 060 612 253 243 789 162 817 667 + 1;
  • 222 291 298 926 060 612 253 243 789 162 817 667 ÷ 2 = 111 145 649 463 030 306 126 621 894 581 408 833 + 1;
  • 111 145 649 463 030 306 126 621 894 581 408 833 ÷ 2 = 55 572 824 731 515 153 063 310 947 290 704 416 + 1;
  • 55 572 824 731 515 153 063 310 947 290 704 416 ÷ 2 = 27 786 412 365 757 576 531 655 473 645 352 208 + 0;
  • 27 786 412 365 757 576 531 655 473 645 352 208 ÷ 2 = 13 893 206 182 878 788 265 827 736 822 676 104 + 0;
  • 13 893 206 182 878 788 265 827 736 822 676 104 ÷ 2 = 6 946 603 091 439 394 132 913 868 411 338 052 + 0;
  • 6 946 603 091 439 394 132 913 868 411 338 052 ÷ 2 = 3 473 301 545 719 697 066 456 934 205 669 026 + 0;
  • 3 473 301 545 719 697 066 456 934 205 669 026 ÷ 2 = 1 736 650 772 859 848 533 228 467 102 834 513 + 0;
  • 1 736 650 772 859 848 533 228 467 102 834 513 ÷ 2 = 868 325 386 429 924 266 614 233 551 417 256 + 1;
  • 868 325 386 429 924 266 614 233 551 417 256 ÷ 2 = 434 162 693 214 962 133 307 116 775 708 628 + 0;
  • 434 162 693 214 962 133 307 116 775 708 628 ÷ 2 = 217 081 346 607 481 066 653 558 387 854 314 + 0;
  • 217 081 346 607 481 066 653 558 387 854 314 ÷ 2 = 108 540 673 303 740 533 326 779 193 927 157 + 0;
  • 108 540 673 303 740 533 326 779 193 927 157 ÷ 2 = 54 270 336 651 870 266 663 389 596 963 578 + 1;
  • 54 270 336 651 870 266 663 389 596 963 578 ÷ 2 = 27 135 168 325 935 133 331 694 798 481 789 + 0;
  • 27 135 168 325 935 133 331 694 798 481 789 ÷ 2 = 13 567 584 162 967 566 665 847 399 240 894 + 1;
  • 13 567 584 162 967 566 665 847 399 240 894 ÷ 2 = 6 783 792 081 483 783 332 923 699 620 447 + 0;
  • 6 783 792 081 483 783 332 923 699 620 447 ÷ 2 = 3 391 896 040 741 891 666 461 849 810 223 + 1;
  • 3 391 896 040 741 891 666 461 849 810 223 ÷ 2 = 1 695 948 020 370 945 833 230 924 905 111 + 1;
  • 1 695 948 020 370 945 833 230 924 905 111 ÷ 2 = 847 974 010 185 472 916 615 462 452 555 + 1;
  • 847 974 010 185 472 916 615 462 452 555 ÷ 2 = 423 987 005 092 736 458 307 731 226 277 + 1;
  • 423 987 005 092 736 458 307 731 226 277 ÷ 2 = 211 993 502 546 368 229 153 865 613 138 + 1;
  • 211 993 502 546 368 229 153 865 613 138 ÷ 2 = 105 996 751 273 184 114 576 932 806 569 + 0;
  • 105 996 751 273 184 114 576 932 806 569 ÷ 2 = 52 998 375 636 592 057 288 466 403 284 + 1;
  • 52 998 375 636 592 057 288 466 403 284 ÷ 2 = 26 499 187 818 296 028 644 233 201 642 + 0;
  • 26 499 187 818 296 028 644 233 201 642 ÷ 2 = 13 249 593 909 148 014 322 116 600 821 + 0;
  • 13 249 593 909 148 014 322 116 600 821 ÷ 2 = 6 624 796 954 574 007 161 058 300 410 + 1;
  • 6 624 796 954 574 007 161 058 300 410 ÷ 2 = 3 312 398 477 287 003 580 529 150 205 + 0;
  • 3 312 398 477 287 003 580 529 150 205 ÷ 2 = 1 656 199 238 643 501 790 264 575 102 + 1;
  • 1 656 199 238 643 501 790 264 575 102 ÷ 2 = 828 099 619 321 750 895 132 287 551 + 0;
  • 828 099 619 321 750 895 132 287 551 ÷ 2 = 414 049 809 660 875 447 566 143 775 + 1;
  • 414 049 809 660 875 447 566 143 775 ÷ 2 = 207 024 904 830 437 723 783 071 887 + 1;
  • 207 024 904 830 437 723 783 071 887 ÷ 2 = 103 512 452 415 218 861 891 535 943 + 1;
  • 103 512 452 415 218 861 891 535 943 ÷ 2 = 51 756 226 207 609 430 945 767 971 + 1;
  • 51 756 226 207 609 430 945 767 971 ÷ 2 = 25 878 113 103 804 715 472 883 985 + 1;
  • 25 878 113 103 804 715 472 883 985 ÷ 2 = 12 939 056 551 902 357 736 441 992 + 1;
  • 12 939 056 551 902 357 736 441 992 ÷ 2 = 6 469 528 275 951 178 868 220 996 + 0;
  • 6 469 528 275 951 178 868 220 996 ÷ 2 = 3 234 764 137 975 589 434 110 498 + 0;
  • 3 234 764 137 975 589 434 110 498 ÷ 2 = 1 617 382 068 987 794 717 055 249 + 0;
  • 1 617 382 068 987 794 717 055 249 ÷ 2 = 808 691 034 493 897 358 527 624 + 1;
  • 808 691 034 493 897 358 527 624 ÷ 2 = 404 345 517 246 948 679 263 812 + 0;
  • 404 345 517 246 948 679 263 812 ÷ 2 = 202 172 758 623 474 339 631 906 + 0;
  • 202 172 758 623 474 339 631 906 ÷ 2 = 101 086 379 311 737 169 815 953 + 0;
  • 101 086 379 311 737 169 815 953 ÷ 2 = 50 543 189 655 868 584 907 976 + 1;
  • 50 543 189 655 868 584 907 976 ÷ 2 = 25 271 594 827 934 292 453 988 + 0;
  • 25 271 594 827 934 292 453 988 ÷ 2 = 12 635 797 413 967 146 226 994 + 0;
  • 12 635 797 413 967 146 226 994 ÷ 2 = 6 317 898 706 983 573 113 497 + 0;
  • 6 317 898 706 983 573 113 497 ÷ 2 = 3 158 949 353 491 786 556 748 + 1;
  • 3 158 949 353 491 786 556 748 ÷ 2 = 1 579 474 676 745 893 278 374 + 0;
  • 1 579 474 676 745 893 278 374 ÷ 2 = 789 737 338 372 946 639 187 + 0;
  • 789 737 338 372 946 639 187 ÷ 2 = 394 868 669 186 473 319 593 + 1;
  • 394 868 669 186 473 319 593 ÷ 2 = 197 434 334 593 236 659 796 + 1;
  • 197 434 334 593 236 659 796 ÷ 2 = 98 717 167 296 618 329 898 + 0;
  • 98 717 167 296 618 329 898 ÷ 2 = 49 358 583 648 309 164 949 + 0;
  • 49 358 583 648 309 164 949 ÷ 2 = 24 679 291 824 154 582 474 + 1;
  • 24 679 291 824 154 582 474 ÷ 2 = 12 339 645 912 077 291 237 + 0;
  • 12 339 645 912 077 291 237 ÷ 2 = 6 169 822 956 038 645 618 + 1;
  • 6 169 822 956 038 645 618 ÷ 2 = 3 084 911 478 019 322 809 + 0;
  • 3 084 911 478 019 322 809 ÷ 2 = 1 542 455 739 009 661 404 + 1;
  • 1 542 455 739 009 661 404 ÷ 2 = 771 227 869 504 830 702 + 0;
  • 771 227 869 504 830 702 ÷ 2 = 385 613 934 752 415 351 + 0;
  • 385 613 934 752 415 351 ÷ 2 = 192 806 967 376 207 675 + 1;
  • 192 806 967 376 207 675 ÷ 2 = 96 403 483 688 103 837 + 1;
  • 96 403 483 688 103 837 ÷ 2 = 48 201 741 844 051 918 + 1;
  • 48 201 741 844 051 918 ÷ 2 = 24 100 870 922 025 959 + 0;
  • 24 100 870 922 025 959 ÷ 2 = 12 050 435 461 012 979 + 1;
  • 12 050 435 461 012 979 ÷ 2 = 6 025 217 730 506 489 + 1;
  • 6 025 217 730 506 489 ÷ 2 = 3 012 608 865 253 244 + 1;
  • 3 012 608 865 253 244 ÷ 2 = 1 506 304 432 626 622 + 0;
  • 1 506 304 432 626 622 ÷ 2 = 753 152 216 313 311 + 0;
  • 753 152 216 313 311 ÷ 2 = 376 576 108 156 655 + 1;
  • 376 576 108 156 655 ÷ 2 = 188 288 054 078 327 + 1;
  • 188 288 054 078 327 ÷ 2 = 94 144 027 039 163 + 1;
  • 94 144 027 039 163 ÷ 2 = 47 072 013 519 581 + 1;
  • 47 072 013 519 581 ÷ 2 = 23 536 006 759 790 + 1;
  • 23 536 006 759 790 ÷ 2 = 11 768 003 379 895 + 0;
  • 11 768 003 379 895 ÷ 2 = 5 884 001 689 947 + 1;
  • 5 884 001 689 947 ÷ 2 = 2 942 000 844 973 + 1;
  • 2 942 000 844 973 ÷ 2 = 1 471 000 422 486 + 1;
  • 1 471 000 422 486 ÷ 2 = 735 500 211 243 + 0;
  • 735 500 211 243 ÷ 2 = 367 750 105 621 + 1;
  • 367 750 105 621 ÷ 2 = 183 875 052 810 + 1;
  • 183 875 052 810 ÷ 2 = 91 937 526 405 + 0;
  • 91 937 526 405 ÷ 2 = 45 968 763 202 + 1;
  • 45 968 763 202 ÷ 2 = 22 984 381 601 + 0;
  • 22 984 381 601 ÷ 2 = 11 492 190 800 + 1;
  • 11 492 190 800 ÷ 2 = 5 746 095 400 + 0;
  • 5 746 095 400 ÷ 2 = 2 873 047 700 + 0;
  • 2 873 047 700 ÷ 2 = 1 436 523 850 + 0;
  • 1 436 523 850 ÷ 2 = 718 261 925 + 0;
  • 718 261 925 ÷ 2 = 359 130 962 + 1;
  • 359 130 962 ÷ 2 = 179 565 481 + 0;
  • 179 565 481 ÷ 2 = 89 782 740 + 1;
  • 89 782 740 ÷ 2 = 44 891 370 + 0;
  • 44 891 370 ÷ 2 = 22 445 685 + 0;
  • 22 445 685 ÷ 2 = 11 222 842 + 1;
  • 11 222 842 ÷ 2 = 5 611 421 + 0;
  • 5 611 421 ÷ 2 = 2 805 710 + 1;
  • 2 805 710 ÷ 2 = 1 402 855 + 0;
  • 1 402 855 ÷ 2 = 701 427 + 1;
  • 701 427 ÷ 2 = 350 713 + 1;
  • 350 713 ÷ 2 = 175 356 + 1;
  • 175 356 ÷ 2 = 87 678 + 0;
  • 87 678 ÷ 2 = 43 839 + 0;
  • 43 839 ÷ 2 = 21 919 + 1;
  • 21 919 ÷ 2 = 10 959 + 1;
  • 10 959 ÷ 2 = 5 479 + 1;
  • 5 479 ÷ 2 = 2 739 + 1;
  • 2 739 ÷ 2 = 1 369 + 1;
  • 1 369 ÷ 2 = 684 + 1;
  • 684 ÷ 2 = 342 + 0;
  • 342 ÷ 2 = 171 + 0;
  • 171 ÷ 2 = 85 + 1;
  • 85 ÷ 2 = 42 + 1;
  • 42 ÷ 2 = 21 + 0;
  • 21 ÷ 2 = 10 + 1;
  • 10 ÷ 2 = 5 + 0;
  • 5 ÷ 2 = 2 + 1;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the integer part of the number, by taking all the remainders starting from the bottom of the list constructed above:

1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111(10) =


10 1010 1100 1111 1100 1110 1010 0101 0000 1010 1101 1101 1111 0011 1011 1001 0101 0011 0010 0010 0010 0011 1111 0101 0010 1111 1010 1000 1000 0011 1010 0000 1101 1111 1011 0100 0100 0001 1010 1000 0011 1010 1111(2)
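The repeated division of steps 1 and 2 can be sketched in Python. Python's built-in integers have arbitrary precision, so the 52-digit number needs no special library. The function name `int_to_binary_by_division` is a hypothetical helper for illustration, and Python's built-in `bin()` is used only as a cross-check.

```python
def int_to_binary_by_division(n: int) -> str:
    """Base-2 digits of a non-negative integer, found by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # one line of the division list
        remainders.append(str(remainder))
    # Read the remainders from the bottom of the list upward:
    # the last remainder becomes the leftmost bit.
    return "".join(reversed(remainders))


N = 1_001_111_011_011_110_111_010_010_111_011_110_101_110_000_101_000_111
bits = int_to_binary_by_division(N)
assert bits == bin(N)[2:]  # agrees with Python's own conversion
print(len(bits), "bits")
```

The loop runs once per binary digit, which is why the list above has one line per bit.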

3. Normalize the binary representation of the number, shifting the decimal mark 169 positions to the left so that only one non-zero digit remains to the left of it:

1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111(10) =


10 1010 1100 1111 1100 1110 1010 0101 0000 1010 1101 1101 1111 0011 1011 1001 0101 0011 0010 0010 0010 0011 1111 0101 0010 1111 1010 1000 1000 0011 1010 0000 1101 1111 1011 0100 0100 0001 1010 1000 0011 1010 1111(2) =


10 1010 1100 1111 1100 1110 1010 0101 0000 1010 1101 1101 1111 0011 1011 1001 0101 0011 0010 0010 0010 0011 1111 0101 0010 1111 1010 1000 1000 0011 1010 0000 1101 1111 1011 0100 0100 0001 1010 1000 0011 1010 1111(2) × 2^0 =


1.0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001 1101 1100 1010 1001 1001 0001 0001 0001 1111 1010 1001 0111 1101 0100 0100 0001 1101 0000 0110 1111 1101 1010 0010 0000 1101 0100 0001 1101 0111 1(2) × 2^169
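For an integer, normalization amounts to counting how many bits follow the leading 1: that count is the unadjusted exponent. A minimal sketch, using Python's `bin()` for the base-2 digits:

```python
N = 1_001_111_011_011_110_111_010_010_111_011_110_101_110_000_101_000_111

bits = bin(N)[2:]          # base-2 digits, leading 1 first
exponent = len(bits) - 1   # positions the decimal mark moves to the left
fraction = bits[1:]        # everything after "1."

assert exponent == N.bit_length() - 1
grouped = " ".join(fraction[i:i + 4] for i in range(0, len(fraction), 4))
print(f"1.{grouped} x 2^{exponent}")
```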

Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): 169


Mantissa (not normalized): 1.0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001 1101 1100 1010 1001 1001 0001 0001 0001 1111 1010 1001 0111 1101 0100 0100 0001 1101 0000 0110 1111 1101 1010 0010 0000 1101 0100 0001 1101 0111 1

4. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2:

Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


169 + 2^(11-1) - 1 =


(169 + 1 023)(10) =


1 192(10)


  • division = quotient + remainder;
  • 1 192 ÷ 2 = 596 + 0;
  • 596 ÷ 2 = 298 + 0;
  • 298 ÷ 2 = 149 + 0;
  • 149 ÷ 2 = 74 + 1;
  • 74 ÷ 2 = 37 + 0;
  • 37 ÷ 2 = 18 + 1;
  • 18 ÷ 2 = 9 + 0;
  • 9 ÷ 2 = 4 + 1;
  • 4 ÷ 2 = 2 + 0;
  • 2 ÷ 2 = 1 + 0;
  • 1 ÷ 2 = 0 + 1;

Exponent (adjusted) =


1192(10) =


100 1010 1000(2)
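The same step 4 in code: the bias 2^(11-1) - 1 = 1023 is added, and Python's `format()` with a zero-padded `b` spec yields the 11-bit string directly, giving the same digits as the division list above. The range check is an addition of this sketch: in IEEE 754 the biased values 0 and 2047 are reserved (for zero/subnormal numbers and for infinities/NaN). `biased_exponent_bits` is a hypothetical helper name.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and write the result as an 11-bit binary string."""
    adjusted = unadjusted + BIAS
    # Biased values 0 and 2047 are reserved, so normal numbers use 1..2046.
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError(f"exponent {unadjusted} is outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")


print(biased_exponent_bits(169))
```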

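5. Normalize the mantissa: remove the leading (leftmost) bit, since it's always 1 (together with the decimal mark), then adjust its length to 52 bits by removing the excess bits from the right (if any of the excess bits is set to 1, we are losing precision...). Removing bits this way is truncation, i.e. rounding toward zero. Under the IEEE 754 default rounding mode (round to nearest, ties to even) the 52-bit result would be rounded up instead, because the first discarded bit is 1 and further discarded bits are also set to 1; the last four mantissa bits would then be 1010 instead of 1001: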

Mantissa (normalized) =


1.0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001 1101 1100 1010 1001 1001 0001 0001 0001 1111 1010 1001 0111 1101 0100 0100 0001 1101 0000 0110 1111 1101 1010 0010 0000 1101 0100 0001 1101 0111 1 =


0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001
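The sketch below contrasts the truncation used in step 5 with IEEE 754's default rounding (round to nearest, ties to even). Rounding looks at the first discarded bit (often called the guard bit) and at whether any later discarded bit is set (the sticky bit). The helper names `truncate`, `round_nearest_even` and `nibbles` are hypothetical.

```python
N = 1_001_111_011_011_110_111_010_010_111_011_110_101_110_000_101_000_111
fraction = bin(N)[3:]  # bits after the leading "1."


def truncate(frac: str, width: int = 52) -> int:
    """Keep the first `width` bits and drop the rest (rounding toward zero)."""
    return int(frac.ljust(width, "0")[:width], 2)


def round_nearest_even(frac: str, width: int = 52):
    """Round to `width` bits, ties to even.

    Returns (mantissa, carry). carry is True when rounding overflowed the
    mantissa, in which case the exponent must be increased by one.
    """
    kept = truncate(frac, width)
    guard = frac[width:width + 1] == "1"  # first discarded bit
    sticky = "1" in frac[width + 1:]      # any later discarded bit set
    if guard and (sticky or kept & 1):   # above half, or exactly half and odd
        kept += 1
    if kept == 1 << width:
        return 0, True
    return kept, False


def nibbles(mantissa: int) -> str:
    """Format a 52-bit mantissa in groups of four bits."""
    bits = format(mantissa, "052b")
    return " ".join(bits[i:i + 4] for i in range(0, 52, 4))


print("truncated:", nibbles(truncate(fraction)))
print("rounded:  ", nibbles(round_nearest_even(fraction)[0]))
```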

Conclusion:

The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
100 1010 1000


Mantissa (52 bits) =
0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001

Number 1 001 111 011 011 110 111 010 010 111 011 110 101 110 000 101 000 111(10), converted from the decimal system (base 10)
to
64 bit double precision IEEE 754 binary floating point:


0 - 100 1010 1000 - 0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001

(64 bits IEEE 754)
  • Sign (1 bit, position 63): 0
  • Exponent (11 bits, positions 62 down to 52): 100 1010 1000
  • Mantissa (52 bits, positions 51 down to 0): 0101 0110 0111 1110 0111 0101 0010 1000 0101 0110 1110 1111 1001
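To double-check the final bit pattern, the three fields can be packed into one 64-bit word and compared with the double that Python itself builds, read back through the standard `struct` module. CPython converts an `int` to `float` with correct rounding (round to nearest, ties to even), so the two words may differ by one in the last bit; the truncated word is the one computed on this page. `pack_fields` is a hypothetical helper name.

```python
import struct

N = 1_001_111_011_011_110_111_010_010_111_011_110_101_110_000_101_000_111

bits = bin(N)[2:]
biased_exponent = (len(bits) - 1) + 1023
mantissa = int(bits[1:53].ljust(52, "0"), 2)  # truncated to 52 bits, as in step 5


def pack_fields(sign: int, exponent: int, mantissa: int) -> int:
    """Sign at bit 63, exponent at bits 62-52, mantissa at bits 51-0."""
    return (sign << 63) | (exponent << 52) | mantissa


truncated_word = pack_fields(0, biased_exponent, mantissa)
python_word = struct.unpack(">Q", struct.pack(">d", float(N)))[0]

print(f"truncated (this page): {truncated_word:016X}")
print(f"Python float(N):       {python_word:016X}")
```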

Convert decimal numbers from base ten to 64 bit double precision IEEE 754 binary floating point standard

A number in 64 bit double precision IEEE 754 binary floating point standard representation requires three building elements: sign (1 bit, either 0 for positive or 1 for negative numbers), exponent (11 bits) and mantissa (52 bits).


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version (its absolute value).
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
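  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of the integer part of each result and continuing with the remaining fractional part, until we get a fractional part that is equal to zero or until enough bits have been produced to fill the 52-bit mantissa (many decimal fractions, 0.1 for example, never reach zero in base 2).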
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non-zero digit remains to the left of the decimal mark. The unadjusted exponent is n for a shift to the left and -n for a shift to the right.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
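    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = Exponent (unadjusted) + 1023
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right. Removing excess bits is truncation (rounding toward zero); the IEEE 754 default rounding mode is round to nearest, ties to even, which adds one unit to the 52-bit result whenever the discarded bits are worth more than half a unit in the last place (or exactly half, with the last kept bit equal to 1).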
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
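The nine steps can be collected into one function. This is a sketch under stated assumptions: it uses Python's `fractions.Fraction` so that the decimal input is processed exactly (a `float` would already be rounded before the steps start), it truncates the mantissa exactly as step 8 describes, and it rejects zero and values outside the normal range instead of encoding subnormals, infinities or NaN. The names `group` and `decimal_to_double_truncated` are hypothetical.

```python
from fractions import Fraction


def group(bits: str) -> str:
    """Space-separate a bit string in groups of four, counted from the right."""
    head = len(bits) % 4
    parts = [bits[:head]] if head else []
    parts += [bits[i:i + 4] for i in range(head, len(bits), 4)]
    return " ".join(parts)


def decimal_to_double_truncated(text: str) -> str:
    """Follow steps 1-9 for a non-zero decimal string in the normal double range."""
    value = Fraction(text)             # exact decimal value, no float rounding
    if value == 0:
        raise ValueError("zero has a special encoding")
    sign = "1" if value < 0 else "0"   # step 9
    value = abs(value)                 # step 1

    integer = value.numerator // value.denominator
    frac = value - integer

    int_bits = ""                      # steps 2-3: divide by 2, prepend remainders
    while integer > 0:
        integer, remainder = divmod(integer, 2)
        int_bits = str(remainder) + int_bits

    frac_bits = ""                     # steps 4-5: multiply by 2, append integer parts
    while frac and len((int_bits + frac_bits).lstrip("0")) < 53:
        frac *= 2
        bit = 1 if frac >= 1 else 0
        frac_bits += str(bit)
        frac -= bit

    if int_bits:                       # step 6: move the mark left...
        exponent = len(int_bits) - 1
        significant = int_bits + frac_bits
    else:                              # ...or right, past the leading zeros
        lead = frac_bits.index("1")
        exponent = -(lead + 1)
        significant = frac_bits[lead:]

    biased = exponent + 1023           # step 7
    if not 1 <= biased <= 2046:
        raise ValueError("outside the normal double range")

    mantissa = significant[1:53].ljust(52, "0")  # step 8: drop the leading 1, fit to 52 bits
    return f"{sign} - {group(format(biased, '011b'))} - {group(mantissa)}"


print(decimal_to_double_truncated("-31.640215"))
```

Because it truncates, its output can differ in the last bit from the double a language runtime stores, since runtimes normally round to nearest.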

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get a fractional part equal to zero, but we have done enough iterations (more than the 52-bit mantissa limit) and got at least one integer part different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =


    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
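As a cross-check on this example, the double that Python stores for -31.640215 can be read back bit by bit with the standard `struct` module. Python parses decimal literals with correct rounding (round to nearest, ties to even). Here the discarded bits 1010 0... start with a 1 and contain another 1, so the stored mantissa ends in 1101 rather than the truncated 1100 above. `double_fields` is a hypothetical helper name.

```python
import struct


def double_fields(x: float) -> str:
    """Sign, exponent and mantissa bits of the IEEE 754 double that stores x."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))  # the 8 bytes as one integer
    bits = format(word, "064b")
    sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]
    exponent_text = exponent[:3] + " " + exponent[3:7] + " " + exponent[7:]
    mantissa_text = " ".join(mantissa[i:i + 4] for i in range(0, 52, 4))
    return f"{sign} - {exponent_text} - {mantissa_text}"


print(double_fields(-31.640215))
```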