0.000 000 000 000 000 012 345 687 894 564 589 304 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 012 345 687 894 564 589 304(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 012 345 687 894 564 589 304(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
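The repeated division by 2 in steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the converter's own code; the function name `int_to_binary` is hypothetical.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to base 2 by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: a single remainder, 0
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(r))
    # The last remainder is the leftmost digit, so read the list bottom-up.
    return "".join(reversed(remainders))


print(int_to_binary(0))    # 0
print(int_to_binary(31))   # 11111
print(int_to_binary(966))  # 1111000110
```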


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 012 345 687 894 564 589 304.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 012 345 687 894 564 589 304 × 2 = 0 + 0.000 000 000 000 000 024 691 375 789 129 178 608;
  • 2) 0.000 000 000 000 000 024 691 375 789 129 178 608 × 2 = 0 + 0.000 000 000 000 000 049 382 751 578 258 357 216;
  • 3) 0.000 000 000 000 000 049 382 751 578 258 357 216 × 2 = 0 + 0.000 000 000 000 000 098 765 503 156 516 714 432;
  • 4) 0.000 000 000 000 000 098 765 503 156 516 714 432 × 2 = 0 + 0.000 000 000 000 000 197 531 006 313 033 428 864;
  • 5) 0.000 000 000 000 000 197 531 006 313 033 428 864 × 2 = 0 + 0.000 000 000 000 000 395 062 012 626 066 857 728;
  • 6) 0.000 000 000 000 000 395 062 012 626 066 857 728 × 2 = 0 + 0.000 000 000 000 000 790 124 025 252 133 715 456;
  • 7) 0.000 000 000 000 000 790 124 025 252 133 715 456 × 2 = 0 + 0.000 000 000 000 001 580 248 050 504 267 430 912;
  • 8) 0.000 000 000 000 001 580 248 050 504 267 430 912 × 2 = 0 + 0.000 000 000 000 003 160 496 101 008 534 861 824;
  • 9) 0.000 000 000 000 003 160 496 101 008 534 861 824 × 2 = 0 + 0.000 000 000 000 006 320 992 202 017 069 723 648;
  • 10) 0.000 000 000 000 006 320 992 202 017 069 723 648 × 2 = 0 + 0.000 000 000 000 012 641 984 404 034 139 447 296;
  • 11) 0.000 000 000 000 012 641 984 404 034 139 447 296 × 2 = 0 + 0.000 000 000 000 025 283 968 808 068 278 894 592;
  • 12) 0.000 000 000 000 025 283 968 808 068 278 894 592 × 2 = 0 + 0.000 000 000 000 050 567 937 616 136 557 789 184;
  • 13) 0.000 000 000 000 050 567 937 616 136 557 789 184 × 2 = 0 + 0.000 000 000 000 101 135 875 232 273 115 578 368;
  • 14) 0.000 000 000 000 101 135 875 232 273 115 578 368 × 2 = 0 + 0.000 000 000 000 202 271 750 464 546 231 156 736;
  • 15) 0.000 000 000 000 202 271 750 464 546 231 156 736 × 2 = 0 + 0.000 000 000 000 404 543 500 929 092 462 313 472;
  • 16) 0.000 000 000 000 404 543 500 929 092 462 313 472 × 2 = 0 + 0.000 000 000 000 809 087 001 858 184 924 626 944;
  • 17) 0.000 000 000 000 809 087 001 858 184 924 626 944 × 2 = 0 + 0.000 000 000 001 618 174 003 716 369 849 253 888;
  • 18) 0.000 000 000 001 618 174 003 716 369 849 253 888 × 2 = 0 + 0.000 000 000 003 236 348 007 432 739 698 507 776;
  • 19) 0.000 000 000 003 236 348 007 432 739 698 507 776 × 2 = 0 + 0.000 000 000 006 472 696 014 865 479 397 015 552;
  • 20) 0.000 000 000 006 472 696 014 865 479 397 015 552 × 2 = 0 + 0.000 000 000 012 945 392 029 730 958 794 031 104;
  • 21) 0.000 000 000 012 945 392 029 730 958 794 031 104 × 2 = 0 + 0.000 000 000 025 890 784 059 461 917 588 062 208;
  • 22) 0.000 000 000 025 890 784 059 461 917 588 062 208 × 2 = 0 + 0.000 000 000 051 781 568 118 923 835 176 124 416;
  • 23) 0.000 000 000 051 781 568 118 923 835 176 124 416 × 2 = 0 + 0.000 000 000 103 563 136 237 847 670 352 248 832;
  • 24) 0.000 000 000 103 563 136 237 847 670 352 248 832 × 2 = 0 + 0.000 000 000 207 126 272 475 695 340 704 497 664;
  • 25) 0.000 000 000 207 126 272 475 695 340 704 497 664 × 2 = 0 + 0.000 000 000 414 252 544 951 390 681 408 995 328;
  • 26) 0.000 000 000 414 252 544 951 390 681 408 995 328 × 2 = 0 + 0.000 000 000 828 505 089 902 781 362 817 990 656;
  • 27) 0.000 000 000 828 505 089 902 781 362 817 990 656 × 2 = 0 + 0.000 000 001 657 010 179 805 562 725 635 981 312;
  • 28) 0.000 000 001 657 010 179 805 562 725 635 981 312 × 2 = 0 + 0.000 000 003 314 020 359 611 125 451 271 962 624;
  • 29) 0.000 000 003 314 020 359 611 125 451 271 962 624 × 2 = 0 + 0.000 000 006 628 040 719 222 250 902 543 925 248;
  • 30) 0.000 000 006 628 040 719 222 250 902 543 925 248 × 2 = 0 + 0.000 000 013 256 081 438 444 501 805 087 850 496;
  • 31) 0.000 000 013 256 081 438 444 501 805 087 850 496 × 2 = 0 + 0.000 000 026 512 162 876 889 003 610 175 700 992;
  • 32) 0.000 000 026 512 162 876 889 003 610 175 700 992 × 2 = 0 + 0.000 000 053 024 325 753 778 007 220 351 401 984;
  • 33) 0.000 000 053 024 325 753 778 007 220 351 401 984 × 2 = 0 + 0.000 000 106 048 651 507 556 014 440 702 803 968;
  • 34) 0.000 000 106 048 651 507 556 014 440 702 803 968 × 2 = 0 + 0.000 000 212 097 303 015 112 028 881 405 607 936;
  • 35) 0.000 000 212 097 303 015 112 028 881 405 607 936 × 2 = 0 + 0.000 000 424 194 606 030 224 057 762 811 215 872;
  • 36) 0.000 000 424 194 606 030 224 057 762 811 215 872 × 2 = 0 + 0.000 000 848 389 212 060 448 115 525 622 431 744;
  • 37) 0.000 000 848 389 212 060 448 115 525 622 431 744 × 2 = 0 + 0.000 001 696 778 424 120 896 231 051 244 863 488;
  • 38) 0.000 001 696 778 424 120 896 231 051 244 863 488 × 2 = 0 + 0.000 003 393 556 848 241 792 462 102 489 726 976;
  • 39) 0.000 003 393 556 848 241 792 462 102 489 726 976 × 2 = 0 + 0.000 006 787 113 696 483 584 924 204 979 453 952;
  • 40) 0.000 006 787 113 696 483 584 924 204 979 453 952 × 2 = 0 + 0.000 013 574 227 392 967 169 848 409 958 907 904;
  • 41) 0.000 013 574 227 392 967 169 848 409 958 907 904 × 2 = 0 + 0.000 027 148 454 785 934 339 696 819 917 815 808;
  • 42) 0.000 027 148 454 785 934 339 696 819 917 815 808 × 2 = 0 + 0.000 054 296 909 571 868 679 393 639 835 631 616;
  • 43) 0.000 054 296 909 571 868 679 393 639 835 631 616 × 2 = 0 + 0.000 108 593 819 143 737 358 787 279 671 263 232;
  • 44) 0.000 108 593 819 143 737 358 787 279 671 263 232 × 2 = 0 + 0.000 217 187 638 287 474 717 574 559 342 526 464;
  • 45) 0.000 217 187 638 287 474 717 574 559 342 526 464 × 2 = 0 + 0.000 434 375 276 574 949 435 149 118 685 052 928;
  • 46) 0.000 434 375 276 574 949 435 149 118 685 052 928 × 2 = 0 + 0.000 868 750 553 149 898 870 298 237 370 105 856;
  • 47) 0.000 868 750 553 149 898 870 298 237 370 105 856 × 2 = 0 + 0.001 737 501 106 299 797 740 596 474 740 211 712;
  • 48) 0.001 737 501 106 299 797 740 596 474 740 211 712 × 2 = 0 + 0.003 475 002 212 599 595 481 192 949 480 423 424;
  • 49) 0.003 475 002 212 599 595 481 192 949 480 423 424 × 2 = 0 + 0.006 950 004 425 199 190 962 385 898 960 846 848;
  • 50) 0.006 950 004 425 199 190 962 385 898 960 846 848 × 2 = 0 + 0.013 900 008 850 398 381 924 771 797 921 693 696;
  • 51) 0.013 900 008 850 398 381 924 771 797 921 693 696 × 2 = 0 + 0.027 800 017 700 796 763 849 543 595 843 387 392;
  • 52) 0.027 800 017 700 796 763 849 543 595 843 387 392 × 2 = 0 + 0.055 600 035 401 593 527 699 087 191 686 774 784;
  • 53) 0.055 600 035 401 593 527 699 087 191 686 774 784 × 2 = 0 + 0.111 200 070 803 187 055 398 174 383 373 549 568;
  • 54) 0.111 200 070 803 187 055 398 174 383 373 549 568 × 2 = 0 + 0.222 400 141 606 374 110 796 348 766 747 099 136;
  • 55) 0.222 400 141 606 374 110 796 348 766 747 099 136 × 2 = 0 + 0.444 800 283 212 748 221 592 697 533 494 198 272;
  • 56) 0.444 800 283 212 748 221 592 697 533 494 198 272 × 2 = 0 + 0.889 600 566 425 496 443 185 395 066 988 396 544;
  • 57) 0.889 600 566 425 496 443 185 395 066 988 396 544 × 2 = 1 + 0.779 201 132 850 992 886 370 790 133 976 793 088;
  • 58) 0.779 201 132 850 992 886 370 790 133 976 793 088 × 2 = 1 + 0.558 402 265 701 985 772 741 580 267 953 586 176;
  • 59) 0.558 402 265 701 985 772 741 580 267 953 586 176 × 2 = 1 + 0.116 804 531 403 971 545 483 160 535 907 172 352;
  • 60) 0.116 804 531 403 971 545 483 160 535 907 172 352 × 2 = 0 + 0.233 609 062 807 943 090 966 321 071 814 344 704;
  • 61) 0.233 609 062 807 943 090 966 321 071 814 344 704 × 2 = 0 + 0.467 218 125 615 886 181 932 642 143 628 689 408;
  • 62) 0.467 218 125 615 886 181 932 642 143 628 689 408 × 2 = 0 + 0.934 436 251 231 772 363 865 284 287 257 378 816;
  • 63) 0.934 436 251 231 772 363 865 284 287 257 378 816 × 2 = 1 + 0.868 872 502 463 544 727 730 568 574 514 757 632;
  • 64) 0.868 872 502 463 544 727 730 568 574 514 757 632 × 2 = 1 + 0.737 745 004 927 089 455 461 137 149 029 515 264;
  • 65) 0.737 745 004 927 089 455 461 137 149 029 515 264 × 2 = 1 + 0.475 490 009 854 178 910 922 274 298 059 030 528;
  • 66) 0.475 490 009 854 178 910 922 274 298 059 030 528 × 2 = 0 + 0.950 980 019 708 357 821 844 548 596 118 061 056;
  • 67) 0.950 980 019 708 357 821 844 548 596 118 061 056 × 2 = 1 + 0.901 960 039 416 715 643 689 097 192 236 122 112;
  • 68) 0.901 960 039 416 715 643 689 097 192 236 122 112 × 2 = 1 + 0.803 920 078 833 431 287 378 194 384 472 244 224;
  • 69) 0.803 920 078 833 431 287 378 194 384 472 244 224 × 2 = 1 + 0.607 840 157 666 862 574 756 388 768 944 488 448;
  • 70) 0.607 840 157 666 862 574 756 388 768 944 488 448 × 2 = 1 + 0.215 680 315 333 725 149 512 777 537 888 976 896;
  • 71) 0.215 680 315 333 725 149 512 777 537 888 976 896 × 2 = 0 + 0.431 360 630 667 450 299 025 555 075 777 953 792;
  • 72) 0.431 360 630 667 450 299 025 555 075 777 953 792 × 2 = 0 + 0.862 721 261 334 900 598 051 110 151 555 907 584;
  • 73) 0.862 721 261 334 900 598 051 110 151 555 907 584 × 2 = 1 + 0.725 442 522 669 801 196 102 220 303 111 815 168;
  • 74) 0.725 442 522 669 801 196 102 220 303 111 815 168 × 2 = 1 + 0.450 885 045 339 602 392 204 440 606 223 630 336;
  • 75) 0.450 885 045 339 602 392 204 440 606 223 630 336 × 2 = 0 + 0.901 770 090 679 204 784 408 881 212 447 260 672;
  • 76) 0.901 770 090 679 204 784 408 881 212 447 260 672 × 2 = 1 + 0.803 540 181 358 409 568 817 762 424 894 521 344;
  • 77) 0.803 540 181 358 409 568 817 762 424 894 521 344 × 2 = 1 + 0.607 080 362 716 819 137 635 524 849 789 042 688;
  • 78) 0.607 080 362 716 819 137 635 524 849 789 042 688 × 2 = 1 + 0.214 160 725 433 638 275 271 049 699 578 085 376;
  • 79) 0.214 160 725 433 638 275 271 049 699 578 085 376 × 2 = 0 + 0.428 321 450 867 276 550 542 099 399 156 170 752;
  • 80) 0.428 321 450 867 276 550 542 099 399 156 170 752 × 2 = 0 + 0.856 642 901 734 553 101 084 198 798 312 341 504;
  • 81) 0.856 642 901 734 553 101 084 198 798 312 341 504 × 2 = 1 + 0.713 285 803 469 106 202 168 397 596 624 683 008;
  • 82) 0.713 285 803 469 106 202 168 397 596 624 683 008 × 2 = 1 + 0.426 571 606 938 212 404 336 795 193 249 366 016;
  • 83) 0.426 571 606 938 212 404 336 795 193 249 366 016 × 2 = 0 + 0.853 143 213 876 424 808 673 590 386 498 732 032;
  • 84) 0.853 143 213 876 424 808 673 590 386 498 732 032 × 2 = 1 + 0.706 286 427 752 849 617 347 180 772 997 464 064;
  • 85) 0.706 286 427 752 849 617 347 180 772 997 464 064 × 2 = 1 + 0.412 572 855 505 699 234 694 361 545 994 928 128;
  • 86) 0.412 572 855 505 699 234 694 361 545 994 928 128 × 2 = 0 + 0.825 145 711 011 398 469 388 723 091 989 856 256;
  • 87) 0.825 145 711 011 398 469 388 723 091 989 856 256 × 2 = 1 + 0.650 291 422 022 796 938 777 446 183 979 712 512;
  • 88) 0.650 291 422 022 796 938 777 446 183 979 712 512 × 2 = 1 + 0.300 582 844 045 593 877 554 892 367 959 425 024;
  • 89) 0.300 582 844 045 593 877 554 892 367 959 425 024 × 2 = 0 + 0.601 165 688 091 187 755 109 784 735 918 850 048;
  • 90) 0.601 165 688 091 187 755 109 784 735 918 850 048 × 2 = 1 + 0.202 331 376 182 375 510 219 569 471 837 700 096;
  • 91) 0.202 331 376 182 375 510 219 569 471 837 700 096 × 2 = 0 + 0.404 662 752 364 751 020 439 138 943 675 400 192;
  • 92) 0.404 662 752 364 751 020 439 138 943 675 400 192 × 2 = 0 + 0.809 325 504 729 502 040 878 277 887 350 800 384;
  • 93) 0.809 325 504 729 502 040 878 277 887 350 800 384 × 2 = 1 + 0.618 651 009 459 004 081 756 555 774 701 600 768;
  • 94) 0.618 651 009 459 004 081 756 555 774 701 600 768 × 2 = 1 + 0.237 302 018 918 008 163 513 111 549 403 201 536;
  • 95) 0.237 302 018 918 008 163 513 111 549 403 201 536 × 2 = 0 + 0.474 604 037 836 016 327 026 223 098 806 403 072;
  • 96) 0.474 604 037 836 016 327 026 223 098 806 403 072 × 2 = 0 + 0.949 208 075 672 032 654 052 446 197 612 806 144;
  • 97) 0.949 208 075 672 032 654 052 446 197 612 806 144 × 2 = 1 + 0.898 416 151 344 065 308 104 892 395 225 612 288;
  • 98) 0.898 416 151 344 065 308 104 892 395 225 612 288 × 2 = 1 + 0.796 832 302 688 130 616 209 784 790 451 224 576;
  • 99) 0.796 832 302 688 130 616 209 784 790 451 224 576 × 2 = 1 + 0.593 664 605 376 261 232 419 569 580 902 449 152;
  • 100) 0.593 664 605 376 261 232 419 569 580 902 449 152 × 2 = 1 + 0.187 329 210 752 522 464 839 139 161 804 898 304;
  • 101) 0.187 329 210 752 522 464 839 139 161 804 898 304 × 2 = 0 + 0.374 658 421 505 044 929 678 278 323 609 796 608;
  • 102) 0.374 658 421 505 044 929 678 278 323 609 796 608 × 2 = 0 + 0.749 316 843 010 089 859 356 556 647 219 593 216;
  • 103) 0.749 316 843 010 089 859 356 556 647 219 593 216 × 2 = 1 + 0.498 633 686 020 179 718 713 113 294 439 186 432;
  • 104) 0.498 633 686 020 179 718 713 113 294 439 186 432 × 2 = 0 + 0.997 267 372 040 359 437 426 226 588 878 372 864;
  • 105) 0.997 267 372 040 359 437 426 226 588 878 372 864 × 2 = 1 + 0.994 534 744 080 718 874 852 453 177 756 745 728;
  • 106) 0.994 534 744 080 718 874 852 453 177 756 745 728 × 2 = 1 + 0.989 069 488 161 437 749 704 906 355 513 491 456;
  • 107) 0.989 069 488 161 437 749 704 906 355 513 491 456 × 2 = 1 + 0.978 138 976 322 875 499 409 812 711 026 982 912;
  • 108) 0.978 138 976 322 875 499 409 812 711 026 982 912 × 2 = 1 + 0.956 277 952 645 750 998 819 625 422 053 965 824;
  • 109) 0.956 277 952 645 750 998 819 625 422 053 965 824 × 2 = 1 + 0.912 555 905 291 501 997 639 250 844 107 931 648;

We didn't get any fractional part that was equal to zero. However, the first 1 appeared at step 57, and the 52 steps after it (up to step 109) provide enough bits to fill the 52-bit mantissa => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).
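A minimal Python sketch of step 3, assuming the stopping rule described above (stop when the fractional part becomes zero, or once 52 more bits have been produced after the first 1). It uses `fractions.Fraction` so that each doubling stays exact, just like the long decimal results above; a float would start rounding immediately. The function name `fraction_to_bits` is hypothetical.

```python
from fractions import Fraction


def fraction_to_bits(frac: Fraction, bits_after_first_one: int = 52) -> tuple[str, Fraction]:
    """Binary digits of 0 <= frac < 1 by repeated multiplication by 2.

    Returns the bits and the fractional part that is left over.
    """
    bits = []
    first_one = None  # 1-based position of the first 1 bit
    while frac != 0:
        frac *= 2
        bit = int(frac)  # integer part of the product: 0 or 1
        frac -= bit      # keep only the fractional part
        bits.append(str(bit))
        if bit == 1 and first_one is None:
            first_one = len(bits)
        # Enough bits to fill a 52-bit mantissa after the leading 1.
        if first_one is not None and len(bits) - first_one >= bits_after_first_one:
            break
    return "".join(bits), frac


x = Fraction("0.000000000000000012345687894564589304")
bits, rest = fraction_to_bits(x)
print(len(bits), bits.index("1") + 1)  # 109 57
```

The leftover fractional part `rest` is what a rounding step would inspect; truncation simply discards it.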


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 012 345 687 894 564 589 304(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2)

5. Positive number before normalization:

0.000 000 000 000 000 012 345 687 894 564 589 304(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 57 positions to the right, so that only one non-zero digit remains to the left of it:


0.000 000 000 000 000 012 345 687 894 564 589 304(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2) × 2^0 =


1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111(2) × 2^(-57)
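Instead of counting leading zeros, the normalized exponent can also be found directly with exact rational arithmetic. A sketch under that assumption (the helper `normalize_exponent` is hypothetical): it returns the e for which 1 <= x / 2^e < 2, which is the power of 2 written after the normalized number.

```python
from fractions import Fraction


def normalize_exponent(x: Fraction) -> int:
    """Return the exponent e with 1 <= x / 2**e < 2, for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    # The bit-length difference is either the answer or one too large.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    return e


x = Fraction("0.000000000000000012345687894564589304")
print(normalize_exponent(x))  # -57
```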


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign 0 (a positive number)


Exponent (unadjusted): -57


Mantissa (not normalized):
1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-57 + 2^(11-1) - 1 =


(-57 + 1 023)(10) =


966(10)
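Step 8 as a small Python sketch. The helper name `adjust_exponent` is hypothetical; the range check reflects the fact that IEEE 754 reserves the stored exponents 0 (zero and subnormal numbers) and 2047 (infinity and NaN), so normal doubles use 1 to 2046.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023 for double precision


def adjust_exponent(unadjusted: int) -> int:
    """Apply the excess-1023 bias; valid for normal (non-subnormal) doubles."""
    adjusted = unadjusted + BIAS
    # Stored values 0 and 2047 are reserved (zero/subnormals, infinity/NaN).
    if not 1 <= adjusted <= 2 ** EXPONENT_BITS - 2:
        raise ValueError("exponent out of range for a normal double")
    return adjusted


print(adjust_exponent(-57))  # 966
print(adjust_exponent(4))    # 1027
```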


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 966 ÷ 2 = 483 + 0;
  • 483 ÷ 2 = 241 + 1;
  • 241 ÷ 2 = 120 + 1;
  • 120 ÷ 2 = 60 + 0;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


966(10) =


011 1100 0110(2)
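In Python, `format(n, "011b")` gives the same 11-bit, zero-padded result as the repeated division above. The grouping helper below is a hypothetical convenience that mimics this page's layout, grouping 4 bits at a time from the right.

```python
def group_bits(bits: str, size: int = 4) -> str:
    """Group a bit string in blocks of `size`, counting from the right."""
    groups = []
    while bits:
        groups.append(bits[-size:])
        bits = bits[:-size]
    return " ".join(reversed(groups))


exponent_field = format(966, "011b")  # 11 bits, zero-padded on the left
print(exponent_field)              # 01111000110
print(group_bits(exponent_field))  # 011 1100 0110
```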


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it is always 1, along with the decimal point.


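b) Adjust its length to 52 bits, only if necessary (not the case here: exactly 52 bits follow the leading 1).

Note that stopping at step 109 simply cuts off the remaining bits, which amounts to truncation. IEEE 754's default rounding mode (round to nearest, ties to even) looks at what was cut off instead: the fractional part left after step 109, 0.912 555 905 291 501 997 639 250 844 107 931 648, is greater than 0.5, so a correctly rounded conversion adds 1 to the last mantissa bit, which then ends in ...1110 0110 0000 rather than ...1110 0101 1111.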


Mantissa (normalized) =


1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111 =


1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111
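Step 11 can be sketched with exact fractions as well. The hypothetical helper `mantissa_field` below shifts the normalized value left by 52 bits and keeps the integer part, which gives the same 52 bits as reading them off the multiplication list. It also offers round half to even (Python's `round` on a `Fraction`), IEEE 754's default rounding mode, so the two results can be compared.

```python
import math
from fractions import Fraction


def mantissa_field(x: Fraction, e: int, round_to_nearest: bool) -> tuple[int, int]:
    """Return (exponent, 52-bit mantissa field) for x > 0 normalized as 1.f × 2**e.

    round_to_nearest=False drops the extra bits (truncation), as the steps
    above do; True rounds half to even, IEEE 754's default rounding mode.
    """
    scaled = x / Fraction(2) ** e * 2 ** 52  # 1.f shifted left by 52 bits
    m = round(scaled) if round_to_nearest else math.floor(scaled)
    if m == 2 ** 53:  # rounding up carried into a new leading bit
        m //= 2
        e += 1
    return e, m - 2 ** 52  # remove the leading (hidden) 1


x = Fraction("0.000000000000000012345687894564589304")
for mode in (False, True):
    e, m = mantissa_field(x, -57, mode)
    print(mode, e, format(m, "052b"))
```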


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1100 0110


Mantissa (52 bits) =
1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111


Decimal number 0.000 000 000 000 000 012 345 687 894 564 589 304 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1100 0110 - 1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111
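To cross-check the result, Python's `struct` module can expose the 64 bits of a float, which CPython stores as an IEEE 754 double on mainstream platforms. Since `float()` rounds a decimal string to the nearest double, we expect the sign and exponent fields to match while the mantissa ends in ...0110 0000 rather than the truncated ...0101 1111 above. The helper name `double_fields` is hypothetical.

```python
import struct


def double_fields(value: float) -> tuple[int, int, int]:
    """Split a Python float (an IEEE 754 double) into sign, exponent, mantissa fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))  # the 64 raw bits
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF    # 11 bits
    mantissa = raw & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, mantissa


s, e, m = double_fields(float("0.000000000000000012345687894564589304"))
print(s, format(e, "011b"), format(m, "052b"))
```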


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide the positive integer part repeatedly by 2, keeping track of each remainder, until the quotient is zero.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
  • 4. Then convert the fractional part. Multiply it repeatedly by 2, keeping track of each integer part of the results, until the fractional part becomes zero or there are enough bits to fill the 52-bit mantissa after the first 1 (in which case precision is lost).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision; note that this truncates, while IEEE 754's default rounding mode rounds to the nearest value, ties to even) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
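The nine steps can be combined into one sketch. It is a hypothetical helper, limited to nonzero inputs whose result is a normal double (no zero, subnormal, infinity or NaN handling). Rather than looping bit by bit, it finds the exponent and the mantissa with exact `Fraction` arithmetic, which yields the same bits; truncation is the default to match the worked examples, and `round_to_nearest=True` switches to IEEE 754's round half to even.

```python
import math
from fractions import Fraction


def group4(bits: str) -> str:
    """Group bits in blocks of 4 counted from the right, e.g. '011 1100 0110'."""
    head = len(bits) % 4
    parts = [bits[:head]] if head else []
    parts += [bits[i:i + 4] for i in range(head, len(bits), 4)]
    return " ".join(parts)


def decimal_to_double(text: str, round_to_nearest: bool = False) -> str:
    """Steps 1-9 for a nonzero decimal string whose result is a normal double.

    round_to_nearest=False truncates the mantissa, as the worked examples do;
    True rounds half to even, IEEE 754's default rounding mode.
    """
    x = Fraction(text)                     # exact, no float rounding yet
    if x == 0:
        raise ValueError("zero has a special encoding")
    sign = 1 if x < 0 else 0               # step 9
    x = abs(x)                             # step 1
    # Steps 2-6: the normalized exponent e, with 1 <= x / 2**e < 2.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    # Step 8: the 53 significant bits as an integer, leading 1 included.
    scaled = x / Fraction(2) ** e * 2 ** 52
    m = round(scaled) if round_to_nearest else math.floor(scaled)
    if m == 2 ** 53:                       # rounding carried into a new bit
        m, e = m // 2, e + 1
    biased = e + 2 ** (11 - 1) - 1         # step 7
    if not 1 <= biased <= 2046:
        raise ValueError("result is not a normal double")
    exponent_bits = group4(format(biased, "011b"))
    mantissa_bits = group4(format(m - 2 ** 52, "052b"))  # drop the hidden 1
    return f"{sign} - {exponent_bits} - {mantissa_bits}"


print(decimal_to_double("-31.640215"))
print(decimal_to_double("-31.640215", round_to_nearest=True))
```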

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

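  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it is always '1', along with the decimal mark, and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

    Note: removing the excess bits is truncation. IEEE 754's default rounding mode (round to nearest, ties to even) would round up here, because the fractional part left after step 48 is 0.631 04, greater than 0.5; the correctly rounded mantissa therefore ends in ...1001 1101.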

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
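Python's `float.hex()` shows the same three parts in compact form: the sign, the hidden leading 1, the 52 mantissa bits as 13 hex digits, and the unadjusted exponent after `p`. Because `float()` rounds to nearest, we expect the last hex digit to be d (1101) rather than the truncated c (1100) above; the sketch below also reads the truncated bits back with `float.fromhex` to show that the two values are one unit in the last place apart.

```python
value = float("-31.640215")
print(value.hex())  # -0x1.fa3e52157689dp+4

# The truncated bits from the worked example, read back as a double.
truncated = float.fromhex("-0x1.fa3e52157689cp+4")
print(truncated == value)                         # False
print(abs(truncated - value) == 2.0 ** (4 - 52))  # True: one unit in the last place
```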