64bit IEEE 754: Decimal ↗ Double Precision Floating Point Binary: 1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057 Convert the Number to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard, From a Base Ten Decimal System Number

Number 1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057₍₁₀₎ converted and written in 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

1. Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.

division = quotient + remainder;
1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057 ÷ 2 = 550 050 005 500 550 055 050 000 550 000 005 500 000 055 028 + 1;
550 050 005 500 550 055 050 000 550 000 005 500 000 055 028 ÷ 2 = 275 025 002 750 275 027 525 000 275 000 002 750 000 027 514 + 0;
275 025 002 750 275 027 525 000 275 000 002 750 000 027 514 ÷ 2 = 137 512 501 375 137 513 762 500 137 500 001 375 000 013 757 + 0;
137 512 501 375 137 513 762 500 137 500 001 375 000 013 757 ÷ 2 = 68 756 250 687 568 756 881 250 068 750 000 687 500 006 878 + 1;
68 756 250 687 568 756 881 250 068 750 000 687 500 006 878 ÷ 2 = 34 378 125 343 784 378 440 625 034 375 000 343 750 003 439 + 0;
34 378 125 343 784 378 440 625 034 375 000 343 750 003 439 ÷ 2 = 17 189 062 671 892 189 220 312 517 187 500 171 875 001 719 + 1;
17 189 062 671 892 189 220 312 517 187 500 171 875 001 719 ÷ 2 = 8 594 531 335 946 094 610 156 258 593 750 085 937 500 859 + 1;
8 594 531 335 946 094 610 156 258 593 750 085 937 500 859 ÷ 2 = 4 297 265 667 973 047 305 078 129 296 875 042 968 750 429 + 1;
4 297 265 667 973 047 305 078 129 296 875 042 968 750 429 ÷ 2 = 2 148 632 833 986 523 652 539 064 648 437 521 484 375 214 + 1;
2 148 632 833 986 523 652 539 064 648 437 521 484 375 214 ÷ 2 = 1 074 316 416 993 261 826 269 532 324 218 760 742 187 607 + 0;
1 074 316 416 993 261 826 269 532 324 218 760 742 187 607 ÷ 2 = 537 158 208 496 630 913 134 766 162 109 380 371 093 803 + 1;
537 158 208 496 630 913 134 766 162 109 380 371 093 803 ÷ 2 = 268 579 104 248 315 456 567 383 081 054 690 185 546 901 + 1;
268 579 104 248 315 456 567 383 081 054 690 185 546 901 ÷ 2 = 134 289 552 124 157 728 283 691 540 527 345 092 773 450 + 1;
134 289 552 124 157 728 283 691 540 527 345 092 773 450 ÷ 2 = 67 144 776 062 078 864 141 845 770 263 672 546 386 725 + 0;
67 144 776 062 078 864 141 845 770 263 672 546 386 725 ÷ 2 = 33 572 388 031 039 432 070 922 885 131 836 273 193 362 + 1;
33 572 388 031 039 432 070 922 885 131 836 273 193 362 ÷ 2 = 16 786 194 015 519 716 035 461 442 565 918 136 596 681 + 0;
16 786 194 015 519 716 035 461 442 565 918 136 596 681 ÷ 2 = 8 393 097 007 759 858 017 730 721 282 959 068 298 340 + 1;
8 393 097 007 759 858 017 730 721 282 959 068 298 340 ÷ 2 = 4 196 548 503 879 929 008 865 360 641 479 534 149 170 + 0;
4 196 548 503 879 929 008 865 360 641 479 534 149 170 ÷ 2 = 2 098 274 251 939 964 504 432 680 320 739 767 074 585 + 0;
2 098 274 251 939 964 504 432 680 320 739 767 074 585 ÷ 2 = 1 049 137 125 969 982 252 216 340 160 369 883 537 292 + 1;
1 049 137 125 969 982 252 216 340 160 369 883 537 292 ÷ 2 = 524 568 562 984 991 126 108 170 080 184 941 768 646 + 0;
524 568 562 984 991 126 108 170 080 184 941 768 646 ÷ 2 = 262 284 281 492 495 563 054 085 040 092 470 884 323 + 0;
262 284 281 492 495 563 054 085 040 092 470 884 323 ÷ 2 = 131 142 140 746 247 781 527 042 520 046 235 442 161 + 1;
131 142 140 746 247 781 527 042 520 046 235 442 161 ÷ 2 = 65 571 070 373 123 890 763 521 260 023 117 721 080 + 1;
65 571 070 373 123 890 763 521 260 023 117 721 080 ÷ 2 = 32 785 535 186 561 945 381 760 630 011 558 860 540 + 0;
32 785 535 186 561 945 381 760 630 011 558 860 540 ÷ 2 = 16 392 767 593 280 972 690 880 315 005 779 430 270 + 0;
16 392 767 593 280 972 690 880 315 005 779 430 270 ÷ 2 = 8 196 383 796 640 486 345 440 157 502 889 715 135 + 0;
8 196 383 796 640 486 345 440 157 502 889 715 135 ÷ 2 = 4 098 191 898 320 243 172 720 078 751 444 857 567 + 1;
4 098 191 898 320 243 172 720 078 751 444 857 567 ÷ 2 = 2 049 095 949 160 121 586 360 039 375 722 428 783 + 1;
2 049 095 949 160 121 586 360 039 375 722 428 783 ÷ 2 = 1 024 547 974 580 060 793 180 019 687 861 214 391 + 1;
1 024 547 974 580 060 793 180 019 687 861 214 391 ÷ 2 = 512 273 987 290 030 396 590 009 843 930 607 195 + 1;
512 273 987 290 030 396 590 009 843 930 607 195 ÷ 2 = 256 136 993 645 015 198 295 004 921 965 303 597 + 1;
256 136 993 645 015 198 295 004 921 965 303 597 ÷ 2 = 128 068 496 822 507 599 147 502 460 982 651 798 + 1;
128 068 496 822 507 599 147 502 460 982 651 798 ÷ 2 = 64 034 248 411 253 799 573 751 230 491 325 899 + 0;
64 034 248 411 253 799 573 751 230 491 325 899 ÷ 2 = 32 017 124 205 626 899 786 875 615 245 662 949 + 1;
32 017 124 205 626 899 786 875 615 245 662 949 ÷ 2 = 16 008 562 102 813 449 893 437 807 622 831 474 + 1;
16 008 562 102 813 449 893 437 807 622 831 474 ÷ 2 = 8 004 281 051 406 724 946 718 903 811 415 737 + 0;
8 004 281 051 406 724 946 718 903 811 415 737 ÷ 2 = 4 002 140 525 703 362 473 359 451 905 707 868 + 1;
4 002 140 525 703 362 473 359 451 905 707 868 ÷ 2 = 2 001 070 262 851 681 236 679 725 952 853 934 + 0;
2 001 070 262 851 681 236 679 725 952 853 934 ÷ 2 = 1 000 535 131 425 840 618 339 862 976 426 967 + 0;
1 000 535 131 425 840 618 339 862 976 426 967 ÷ 2 = 500 267 565 712 920 309 169 931 488 213 483 + 1;
500 267 565 712 920 309 169 931 488 213 483 ÷ 2 = 250 133 782 856 460 154 584 965 744 106 741 + 1;
250 133 782 856 460 154 584 965 744 106 741 ÷ 2 = 125 066 891 428 230 077 292 482 872 053 370 + 1;
125 066 891 428 230 077 292 482 872 053 370 ÷ 2 = 62 533 445 714 115 038 646 241 436 026 685 + 0;
62 533 445 714 115 038 646 241 436 026 685 ÷ 2 = 31 266 722 857 057 519 323 120 718 013 342 + 1;
31 266 722 857 057 519 323 120 718 013 342 ÷ 2 = 15 633 361 428 528 759 661 560 359 006 671 + 0;
15 633 361 428 528 759 661 560 359 006 671 ÷ 2 = 7 816 680 714 264 379 830 780 179 503 335 + 1;
7 816 680 714 264 379 830 780 179 503 335 ÷ 2 = 3 908 340 357 132 189 915 390 089 751 667 + 1;
3 908 340 357 132 189 915 390 089 751 667 ÷ 2 = 1 954 170 178 566 094 957 695 044 875 833 + 1;
1 954 170 178 566 094 957 695 044 875 833 ÷ 2 = 977 085 089 283 047 478 847 522 437 916 + 1;
977 085 089 283 047 478 847 522 437 916 ÷ 2 = 488 542 544 641 523 739 423 761 218 958 + 0;
488 542 544 641 523 739 423 761 218 958 ÷ 2 = 244 271 272 320 761 869 711 880 609 479 + 0;
244 271 272 320 761 869 711 880 609 479 ÷ 2 = 122 135 636 160 380 934 855 940 304 739 + 1;
122 135 636 160 380 934 855 940 304 739 ÷ 2 = 61 067 818 080 190 467 427 970 152 369 + 1;
61 067 818 080 190 467 427 970 152 369 ÷ 2 = 30 533 909 040 095 233 713 985 076 184 + 1;
30 533 909 040 095 233 713 985 076 184 ÷ 2 = 15 266 954 520 047 616 856 992 538 092 + 0;
15 266 954 520 047 616 856 992 538 092 ÷ 2 = 7 633 477 260 023 808 428 496 269 046 + 0;
7 633 477 260 023 808 428 496 269 046 ÷ 2 = 3 816 738 630 011 904 214 248 134 523 + 0;
3 816 738 630 011 904 214 248 134 523 ÷ 2 = 1 908 369 315 005 952 107 124 067 261 + 1;
1 908 369 315 005 952 107 124 067 261 ÷ 2 = 954 184 657 502 976 053 562 033 630 + 1;
954 184 657 502 976 053 562 033 630 ÷ 2 = 477 092 328 751 488 026 781 016 815 + 0;
477 092 328 751 488 026 781 016 815 ÷ 2 = 238 546 164 375 744 013 390 508 407 + 1;
238 546 164 375 744 013 390 508 407 ÷ 2 = 119 273 082 187 872 006 695 254 203 + 1;
119 273 082 187 872 006 695 254 203 ÷ 2 = 59 636 541 093 936 003 347 627 101 + 1;
59 636 541 093 936 003 347 627 101 ÷ 2 = 29 818 270 546 968 001 673 813 550 + 1;
29 818 270 546 968 001 673 813 550 ÷ 2 = 14 909 135 273 484 000 836 906 775 + 0;
14 909 135 273 484 000 836 906 775 ÷ 2 = 7 454 567 636 742 000 418 453 387 + 1;
7 454 567 636 742 000 418 453 387 ÷ 2 = 3 727 283 818 371 000 209 226 693 + 1;
3 727 283 818 371 000 209 226 693 ÷ 2 = 1 863 641 909 185 500 104 613 346 + 1;
1 863 641 909 185 500 104 613 346 ÷ 2 = 931 820 954 592 750 052 306 673 + 0;
931 820 954 592 750 052 306 673 ÷ 2 = 465 910 477 296 375 026 153 336 + 1;
465 910 477 296 375 026 153 336 ÷ 2 = 232 955 238 648 187 513 076 668 + 0;
232 955 238 648 187 513 076 668 ÷ 2 = 116 477 619 324 093 756 538 334 + 0;
116 477 619 324 093 756 538 334 ÷ 2 = 58 238 809 662 046 878 269 167 + 0;
58 238 809 662 046 878 269 167 ÷ 2 = 29 119 404 831 023 439 134 583 + 1;
29 119 404 831 023 439 134 583 ÷ 2 = 14 559 702 415 511 719 567 291 + 1;
14 559 702 415 511 719 567 291 ÷ 2 = 7 279 851 207 755 859 783 645 + 1;
7 279 851 207 755 859 783 645 ÷ 2 = 3 639 925 603 877 929 891 822 + 1;
3 639 925 603 877 929 891 822 ÷ 2 = 1 819 962 801 938 964 945 911 + 0;
1 819 962 801 938 964 945 911 ÷ 2 = 909 981 400 969 482 472 955 + 1;
909 981 400 969 482 472 955 ÷ 2 = 454 990 700 484 741 236 477 + 1;
454 990 700 484 741 236 477 ÷ 2 = 227 495 350 242 370 618 238 + 1;
227 495 350 242 370 618 238 ÷ 2 = 113 747 675 121 185 309 119 + 0;
113 747 675 121 185 309 119 ÷ 2 = 56 873 837 560 592 654 559 + 1;
56 873 837 560 592 654 559 ÷ 2 = 28 436 918 780 296 327 279 + 1;
28 436 918 780 296 327 279 ÷ 2 = 14 218 459 390 148 163 639 + 1;
14 218 459 390 148 163 639 ÷ 2 = 7 109 229 695 074 081 819 + 1;
7 109 229 695 074 081 819 ÷ 2 = 3 554 614 847 537 040 909 + 1;
3 554 614 847 537 040 909 ÷ 2 = 1 777 307 423 768 520 454 + 1;
1 777 307 423 768 520 454 ÷ 2 = 888 653 711 884 260 227 + 0;
888 653 711 884 260 227 ÷ 2 = 444 326 855 942 130 113 + 1;
444 326 855 942 130 113 ÷ 2 = 222 163 427 971 065 056 + 1;
222 163 427 971 065 056 ÷ 2 = 111 081 713 985 532 528 + 0;
111 081 713 985 532 528 ÷ 2 = 55 540 856 992 766 264 + 0;
55 540 856 992 766 264 ÷ 2 = 27 770 428 496 383 132 + 0;
27 770 428 496 383 132 ÷ 2 = 13 885 214 248 191 566 + 0;
13 885 214 248 191 566 ÷ 2 = 6 942 607 124 095 783 + 0;
6 942 607 124 095 783 ÷ 2 = 3 471 303 562 047 891 + 1;
3 471 303 562 047 891 ÷ 2 = 1 735 651 781 023 945 + 1;
1 735 651 781 023 945 ÷ 2 = 867 825 890 511 972 + 1;
867 825 890 511 972 ÷ 2 = 433 912 945 255 986 + 0;
433 912 945 255 986 ÷ 2 = 216 956 472 627 993 + 0;
216 956 472 627 993 ÷ 2 = 108 478 236 313 996 + 1;
108 478 236 313 996 ÷ 2 = 54 239 118 156 998 + 0;
54 239 118 156 998 ÷ 2 = 27 119 559 078 499 + 0;
27 119 559 078 499 ÷ 2 = 13 559 779 539 249 + 1;
13 559 779 539 249 ÷ 2 = 6 779 889 769 624 + 1;
6 779 889 769 624 ÷ 2 = 3 389 944 884 812 + 0;
3 389 944 884 812 ÷ 2 = 1 694 972 442 406 + 0;
1 694 972 442 406 ÷ 2 = 847 486 221 203 + 0;
847 486 221 203 ÷ 2 = 423 743 110 601 + 1;
423 743 110 601 ÷ 2 = 211 871 555 300 + 1;
211 871 555 300 ÷ 2 = 105 935 777 650 + 0;
105 935 777 650 ÷ 2 = 52 967 888 825 + 0;
52 967 888 825 ÷ 2 = 26 483 944 412 + 1;
26 483 944 412 ÷ 2 = 13 241 972 206 + 0;
13 241 972 206 ÷ 2 = 6 620 986 103 + 0;
6 620 986 103 ÷ 2 = 3 310 493 051 + 1;
3 310 493 051 ÷ 2 = 1 655 246 525 + 1;
1 655 246 525 ÷ 2 = 827 623 262 + 1;
827 623 262 ÷ 2 = 413 811 631 + 0;
413 811 631 ÷ 2 = 206 905 815 + 1;
206 905 815 ÷ 2 = 103 452 907 + 1;
103 452 907 ÷ 2 = 51 726 453 + 1;
51 726 453 ÷ 2 = 25 863 226 + 1;
25 863 226 ÷ 2 = 12 931 613 + 0;
12 931 613 ÷ 2 = 6 465 806 + 1;
6 465 806 ÷ 2 = 3 232 903 + 0;
3 232 903 ÷ 2 = 1 616 451 + 1;
1 616 451 ÷ 2 = 808 225 + 1;
808 225 ÷ 2 = 404 112 + 1;
404 112 ÷ 2 = 202 056 + 0;
202 056 ÷ 2 = 101 028 + 0;
101 028 ÷ 2 = 50 514 + 0;
50 514 ÷ 2 = 25 257 + 0;
25 257 ÷ 2 = 12 628 + 1;
12 628 ÷ 2 = 6 314 + 0;
6 314 ÷ 2 = 3 157 + 0;
3 157 ÷ 2 = 1 578 + 1;
1 578 ÷ 2 = 789 + 0;
789 ÷ 2 = 394 + 1;
394 ÷ 2 = 197 + 0;
197 ÷ 2 = 98 + 1;
98 ÷ 2 = 49 + 0;
49 ÷ 2 = 24 + 1;
24 ÷ 2 = 12 + 0;
12 ÷ 2 = 6 + 0;
6 ÷ 2 = 3 + 0;
3 ÷ 2 = 1 + 1;
1 ÷ 2 = 0 + 1;

2. Construct the base 2 representation of the positive number.

Take all the remainders starting from the bottom of the list constructed above.

1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057₍₁₀₎ =

11 0001 0101 0100 1000 0111 0101 1110 1110 0100 1100 0110 0100 1110 0000 1101 1111 1011 1011 1100 0101 1101 1110 1100 0111 0011 1101 0111 0010 1101 1111 1000 1100 1001 0101 1101 1110 1001₍₂₎
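
Steps 1 and 2 can be sketched in Python, whose integers have arbitrary precision. The function name `to_binary_by_division` is an illustrative choice and not part of the original procedure; this is a minimal sketch under that assumption.

```python
def to_binary_by_division(n: int) -> str:
    """Base-2 digits of a non-negative integer, found by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of one division step
        remainders.append(str(r))
    # The last remainder is the leftmost binary digit, so read the list bottom-up.
    return "".join(reversed(remainders))


N = 1100100011001100110100001100000011000000110057
bits = to_binary_by_division(N)
print(len(bits))   # number of binary digits (one per division step)
print(bits[:18])   # the leading digits
```

Reading the remainders in reverse order is what "starting from the bottom of the list" means in step 2.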

3. Normalize the binary representation of the number.

Shift the decimal mark 149 positions to the left, so that only one non zero digit remains to the left of it:

1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057₍₁₀₎=

11 0001 0101 0100 1000 0111 0101 1110 1110 0100 1100 0110 0100 1110 0000 1101 1111 1011 1011 1100 0101 1101 1110 1100 0111 0011 1101 0111 0010 1101 1111 1000 1100 1001 0101 1101 1110 1001₍₂₎=

11 0001 0101 0100 1000 0111 0101 1110 1110 0100 1100 0110 0100 1110 0000 1101 1111 1011 1011 1100 0101 1101 1110 1100 0111 0011 1101 0111 0010 1101 1111 1000 1100 1001 0101 1101 1110 1001₍₂₎ × 2⁰=

1.1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111 0000 0110 1111 1101 1101 1110 0010 1110 1111 0110 0011 1001 1110 1011 1001 0110 1111 1100 0110 0100 1010 1110 1111 0100 1₍₂₎ × 2¹⁴⁹
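
For a positive integer, the normalization in step 3 amounts to counting the digits after the leading 1. The sketch below assumes a hypothetical helper `normalize`; it only handles positive integers, as in this example.

```python
def normalize(n: int):
    """Return (exponent, fraction_bits) with n == 1.fraction_bits (base 2) * 2**exponent."""
    if n <= 0:
        raise ValueError("this sketch expects a positive integer")
    exponent = n.bit_length() - 1   # how far the mark moves to the left
    fraction_bits = bin(n)[3:]      # drop the '0b' prefix and the leading 1
    return exponent, fraction_bits


N = 1100100011001100110100001100000011000000110057
exponent, fraction_bits = normalize(N)
print(exponent)             # 149
print(len(fraction_bits))   # 149
```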

4. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)

Exponent (unadjusted): 149

Mantissa (not normalized):
1.1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111 0000 0110 1111 1101 1101 1110 0010 1110 1111 0110 0011 1001 1110 1011 1001 0110 1111 1100 0110 0100 1010 1110 1111 0100 1

5. Adjust the exponent.

Use the 11 bit excess/bias notation:

Exponent (adjusted) =

Exponent (unadjusted) + 2^(11-1) - 1 =

149 + 2^(11-1) - 1 =

(149 + 1 023)₍₁₀₎ =

1 172₍₁₀₎

6. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:

division = quotient + remainder;
1 172 ÷ 2 = 586 + 0;
586 ÷ 2 = 293 + 0;
293 ÷ 2 = 146 + 1;
146 ÷ 2 = 73 + 0;
73 ÷ 2 = 36 + 1;
36 ÷ 2 = 18 + 0;
18 ÷ 2 = 9 + 0;
9 ÷ 2 = 4 + 1;
4 ÷ 2 = 2 + 0;
2 ÷ 2 = 1 + 0;
1 ÷ 2 = 0 + 1;

7. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.

Exponent (adjusted) =

1172₍₁₀₎ =

100 1001 0100₍₂₎
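
Steps 5 to 7 can be sketched as follows. The names `EXP_BITS`, `BIAS` and `biased_exponent_bits` are assumptions made for illustration, and the range check covers normal numbers only.

```python
EXP_BITS = 11
BIAS = 2 ** (EXP_BITS - 1) - 1   # 1023 for double precision


def biased_exponent_bits(unadjusted: int) -> str:
    """Add the bias and write the result as an 11 bit binary string."""
    adjusted = unadjusted + BIAS
    # The all-zeros and all-ones patterns are reserved (zero/subnormal, infinity/NaN).
    if not 0 < adjusted < 2 ** EXP_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXP_BITS}b")


print(biased_exponent_bits(149))   # 10010010100
```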

8. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, and the decimal point, if present.

b) Adjust its length to 52 bits by removing the excess bits from the right. If any excess bit is set to 1, precision is lost. Here the first excess bit is 0, so this truncation gives the same result as rounding to nearest, the default rounding mode in IEEE 754.

Mantissa (normalized) =

1.1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111 [dropped: 0000 0110 1111 1101 1101 1110 0010 1110 1111 0110 0011 1001 1110 1011 1001 0110 1111 1100 0110 0100 1010 1110 1111 0100 1] =

1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111
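
The trimming in step 8 is a truncation. The sketch below splits the bits into the kept field and the dropped tail, and also reports whether round-to-nearest (ties to even) would have bumped the last kept bit. `split_mantissa` and `rounds_up` are hypothetical helper names; a carry out of the 52 bit field is not handled here.

```python
def split_mantissa(fraction_bits: str, width: int = 52):
    """Split the bits after the leading 1 into (kept, dropped)."""
    kept = fraction_bits[:width].ljust(width, "0")   # pad short inputs with zeros
    dropped = fraction_bits[width:]
    return kept, dropped


def rounds_up(kept: str, dropped: str) -> bool:
    """True if round-to-nearest, ties-to-even would increment the kept field."""
    if not dropped or dropped[0] == "0":
        return False              # less than half a unit in the last place
    if "1" in dropped[1:]:
        return True               # more than half a unit in the last place
    return kept[-1] == "1"        # exact tie: round towards an even last bit


N = 1100100011001100110100001100000011000000110057
fraction_bits = bin(N)[3:]        # digits after the leading 1
kept, dropped = split_mantissa(fraction_bits)
print(kept)
print(len(dropped))               # 97
print(rounds_up(kept, dropped))
```

When `rounds_up` returns `False`, truncating and rounding to nearest agree, which is the case whenever the first dropped bit is 0.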

9. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)

Exponent (11 bits) =
100 1001 0100

Mantissa (52 bits) =
1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111

The base ten decimal number 1 100 100 011 001 100 110 100 001 100 000 011 000 000 110 057 converted and written in 64 bit double precision IEEE 754 binary floating point representation:
0 - 100 1001 0100 - 1000 1010 1010 0100 0011 1010 1111 0111 0010 0110 0011 0010 0111
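
One way to cross-check a result like this is to ask Python for the hexadecimal form of the nearest double. This is only a sanity check under the assumption that the platform uses IEEE 754 doubles (true for CPython on common hardware); the variable names are illustrative.

```python
N = 1100100011001100110100001100000011000000110057

# float.hex() has the form '0x1.<13 hex digits>p<signed exponent>' for normal numbers.
h = float(N).hex()
significand, exponent = h.split("p")

# The 13 hex digits after '0x1.' are exactly the 52 mantissa bits.
mantissa_bits = format(int(significand[4:], 16), "052b")

print(h)
print(int(exponent) + 1023)   # adjusted (biased) exponent: 1172
print(mantissa_bits)
```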


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

1. If the number to be converted is negative, start with its positive version.
2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or we have collected more bits than the 52 bit mantissa can hold.
5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' to the right.
9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
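
The nine steps above can be sketched end to end in Python. This is a minimal illustration, not a definitive implementation: the function names are assumptions, exact rational arithmetic (`fractions.Fraction`) stands in for the hand calculation, only normal numbers are handled, and excess mantissa bits are truncated exactly as in step 8.

```python
from fractions import Fraction

BIAS = 1023
MAX_FRACTION_BITS = 1100   # enough for the smallest normal double (1022 zeros + 53 bits)


def integer_bits(n: int) -> str:
    """Steps 2-3: repeated division by 2, remainders read bottom-up ('' for 0)."""
    bits = ""
    while n > 0:
        n, r = divmod(n, 2)
        bits = str(r) + bits
    return bits


def fraction_bits(frac: Fraction, count: int) -> str:
    """Steps 4-5: repeated multiplication by 2, integer parts read top-down."""
    bits = ""
    while frac != 0 and len(bits) < count:
        frac *= 2
        digit = int(frac)          # 0 or 1
        bits += str(digit)
        frac -= digit
    return bits


def encode_truncating(text: str):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"                       # step 9
    value = abs(value)                                     # step 1
    if value == 0:
        return sign, "0" * 11, "0" * 52
    whole = value.numerator // value.denominator
    ib = integer_bits(whole)
    fb = fraction_bits(value - whole, MAX_FRACTION_BITS)
    digits = ib + fb                                       # the mark sits after len(ib) digits
    first_one = digits.find("1")
    exponent = len(ib) - 1 - first_one                     # step 6
    if first_one < 0 or not -1022 <= exponent <= 1023:
        raise ValueError("outside the normal double range")
    mantissa = digits[first_one + 1:][:52].ljust(52, "0")  # step 8 (truncation)
    return sign, format(exponent + BIAS, "011b"), mantissa # step 7


print(encode_truncating("-31.640215"))
```

Because the sketch truncates, its last mantissa bit can differ from what a hardware conversion produces under the default round-to-nearest mode.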

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

1. Start with the positive version of the number:
|-31.640 215| = 31.640 215
2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
- division = quotient + remainder;
- 31 ÷ 2 = 15 + 1;
- 15 ÷ 2 = 7 + 1;
- 7 ÷ 2 = 3 + 1;
- 3 ÷ 2 = 1 + 1;
- 1 ÷ 2 = 0 + 1;
- We have encountered a quotient that is ZERO => FULL STOP
3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:
31₍₁₀₎ = 1 1111₍₂₎
4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
- #) multiplying = integer + fractional part;
- 1) 0.640 215 × 2 = 1 + 0.280 43;
- 2) 0.280 43 × 2 = 0 + 0.560 86;
- 3) 0.560 86 × 2 = 1 + 0.121 72;
- 4) 0.121 72 × 2 = 0 + 0.243 44;
- 5) 0.243 44 × 2 = 0 + 0.486 88;
- 6) 0.486 88 × 2 = 0 + 0.973 76;
- 7) 0.973 76 × 2 = 1 + 0.947 52;
- 8) 0.947 52 × 2 = 1 + 0.895 04;
- 9) 0.895 04 × 2 = 1 + 0.790 08;
- 10) 0.790 08 × 2 = 1 + 0.580 16;
- 11) 0.580 16 × 2 = 1 + 0.160 32;
- 12) 0.160 32 × 2 = 0 + 0.320 64;
- 13) 0.320 64 × 2 = 0 + 0.641 28;
- 14) 0.641 28 × 2 = 1 + 0.282 56;
- 15) 0.282 56 × 2 = 0 + 0.565 12;
- 16) 0.565 12 × 2 = 1 + 0.130 24;
- 17) 0.130 24 × 2 = 0 + 0.260 48;
- 18) 0.260 48 × 2 = 0 + 0.520 96;
- 19) 0.520 96 × 2 = 1 + 0.041 92;
- 20) 0.041 92 × 2 = 0 + 0.083 84;
- 21) 0.083 84 × 2 = 0 + 0.167 68;
- 22) 0.167 68 × 2 = 0 + 0.335 36;
- 23) 0.335 36 × 2 = 0 + 0.670 72;
- 24) 0.670 72 × 2 = 1 + 0.341 44;
- 25) 0.341 44 × 2 = 0 + 0.682 88;
- 26) 0.682 88 × 2 = 1 + 0.365 76;
- 27) 0.365 76 × 2 = 0 + 0.731 52;
- 28) 0.731 52 × 2 = 1 + 0.463 04;
- 29) 0.463 04 × 2 = 0 + 0.926 08;
- 30) 0.926 08 × 2 = 1 + 0.852 16;
- 31) 0.852 16 × 2 = 1 + 0.704 32;
- 32) 0.704 32 × 2 = 1 + 0.408 64;
- 33) 0.408 64 × 2 = 0 + 0.817 28;
- 34) 0.817 28 × 2 = 1 + 0.634 56;
- 35) 0.634 56 × 2 = 1 + 0.269 12;
- 36) 0.269 12 × 2 = 0 + 0.538 24;
- 37) 0.538 24 × 2 = 1 + 0.076 48;
- 38) 0.076 48 × 2 = 0 + 0.152 96;
- 39) 0.152 96 × 2 = 0 + 0.305 92;
- 40) 0.305 92 × 2 = 0 + 0.611 84;
- 41) 0.611 84 × 2 = 1 + 0.223 68;
- 42) 0.223 68 × 2 = 0 + 0.447 36;
- 43) 0.447 36 × 2 = 0 + 0.894 72;
- 44) 0.894 72 × 2 = 1 + 0.789 44;
- 45) 0.789 44 × 2 = 1 + 0.578 88;
- 46) 0.578 88 × 2 = 1 + 0.157 76;
- 47) 0.157 76 × 2 = 0 + 0.315 52;
- 48) 0.315 52 × 2 = 0 + 0.631 04;
- 49) 0.631 04 × 2 = 1 + 0.262 08;
- 50) 0.262 08 × 2 = 0 + 0.524 16;
- 51) 0.524 16 × 2 = 1 + 0.048 32;
- 52) 0.048 32 × 2 = 0 + 0.096 64;
- 53) 0.096 64 × 2 = 0 + 0.193 28;
- We did not reach a fractional part equal to zero, but we have enough iterations (more than the 52 bit mantissa limit) and at least one integer part different from zero => FULL STOP (losing precision...).
5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:
0.640 215₍₁₀₎ = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0₍₂₎
6. Summarizing - the positive number before normalization:
31.640 215₍₁₀₎ = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0₍₂₎
7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:
31.640 215₍₁₀₎ =
1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0₍₂₎ =
1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0₍₂₎ × 2⁰ =
1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0₍₂₎ × 2⁴
8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:
Sign: 1 (a negative number)
Exponent (unadjusted): 4
Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0
9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:
Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)₍₁₀₎ = 1027₍₁₀₎ =
100 0000 0011₍₂₎
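10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...). Note that this truncates: the first dropped bit here is 1, so IEEE 754's default round-to-nearest mode would instead round the last kept bit up, giving a mantissa that ends in 1101 rather than 1100: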
Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0
Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
Conclusion:
Sign (1 bit) = 1 (a negative number)
Exponent (11 bits) = 100 0000 0011
Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
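
The result above can be compared with the bit pattern that Python itself stores for the same literal. The sketch assumes IEEE 754 doubles on the host platform; `hardware_fields` is a hypothetical helper name, and `truncated` is the 52 bit mantissa from the worked example.

```python
import struct


def hardware_fields(x: float):
    """Split the stored 64 bit pattern of x into (sign, exponent, mantissa) strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # reinterpret the 8 bytes as an integer
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


truncated = "1111101000111110010100100001010101110110100010011100"
sign, exponent, mantissa = hardware_fields(-31.640215)

print(sign, exponent)
print(mantissa)
# Difference in units of the last mantissa bit between the stored value and the truncated one.
print(int(mantissa, 2) - int(truncated, 2))
```

The sign and exponent agree with the worked example. The stored mantissa is one unit larger in the last place, because the dropped bits (1010 0...) amount to more than half a unit and the default rounding mode rounds to nearest rather than truncating.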

