0.000 000 000 000 000 012 345 687 894 564 589 239 Converted to 64 Bit Double Precision IEEE 754 Binary Floating Point Representation Standard

Convert decimal 0.000 000 000 000 000 012 345 687 894 564 589 239(10) to 64 bit double precision IEEE 754 binary floating point representation standard (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)

What are the steps to convert the decimal number 0.000 000 000 000 000 012 345 687 894 564 589 239(10) to 64 bit double precision IEEE 754 binary floating point representation (1 bit for sign, 11 bits for exponent, 52 bits for mantissa)?

1. First, convert to binary (in base 2) the integer part: 0.
Divide the number repeatedly by 2.

Keep track of each remainder.

We stop when we get a quotient that is equal to zero.


  • division = quotient + remainder;
  • 0 ÷ 2 = 0 + 0;

2. Construct the base 2 representation of the integer part of the number.

Take all the remainders starting from the bottom of the list constructed above.

0(10) =


0(2)
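The repeated division used in steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original calculator; the function name `int_to_binary` is an assumption made for this example.

```python
def int_to_binary(n: int) -> str:
    """Base 2 digits of a non-negative integer, by repeated division by 2."""
    if n == 0:
        return "0"  # 0 ÷ 2 = 0 + 0: a single remainder, 0
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # division = quotient + remainder
        remainders.append(str(remainder))
    # Read the remainders from the bottom of the list to the top.
    return "".join(reversed(remainders))


print(int_to_binary(0))  # 0
```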


3. Convert to binary (base 2) the fractional part: 0.000 000 000 000 000 012 345 687 894 564 589 239.

Multiply it repeatedly by 2.


Keep track of each integer part of the results.


Stop when we get a fractional part that is equal to zero.


  • #) multiplying = integer + fractional part;
  • 1) 0.000 000 000 000 000 012 345 687 894 564 589 239 × 2 = 0 + 0.000 000 000 000 000 024 691 375 789 129 178 478;
  • 2) 0.000 000 000 000 000 024 691 375 789 129 178 478 × 2 = 0 + 0.000 000 000 000 000 049 382 751 578 258 356 956;
  • 3) 0.000 000 000 000 000 049 382 751 578 258 356 956 × 2 = 0 + 0.000 000 000 000 000 098 765 503 156 516 713 912;
  • 4) 0.000 000 000 000 000 098 765 503 156 516 713 912 × 2 = 0 + 0.000 000 000 000 000 197 531 006 313 033 427 824;
  • 5) 0.000 000 000 000 000 197 531 006 313 033 427 824 × 2 = 0 + 0.000 000 000 000 000 395 062 012 626 066 855 648;
  • 6) 0.000 000 000 000 000 395 062 012 626 066 855 648 × 2 = 0 + 0.000 000 000 000 000 790 124 025 252 133 711 296;
  • 7) 0.000 000 000 000 000 790 124 025 252 133 711 296 × 2 = 0 + 0.000 000 000 000 001 580 248 050 504 267 422 592;
  • 8) 0.000 000 000 000 001 580 248 050 504 267 422 592 × 2 = 0 + 0.000 000 000 000 003 160 496 101 008 534 845 184;
  • 9) 0.000 000 000 000 003 160 496 101 008 534 845 184 × 2 = 0 + 0.000 000 000 000 006 320 992 202 017 069 690 368;
  • 10) 0.000 000 000 000 006 320 992 202 017 069 690 368 × 2 = 0 + 0.000 000 000 000 012 641 984 404 034 139 380 736;
  • 11) 0.000 000 000 000 012 641 984 404 034 139 380 736 × 2 = 0 + 0.000 000 000 000 025 283 968 808 068 278 761 472;
  • 12) 0.000 000 000 000 025 283 968 808 068 278 761 472 × 2 = 0 + 0.000 000 000 000 050 567 937 616 136 557 522 944;
  • 13) 0.000 000 000 000 050 567 937 616 136 557 522 944 × 2 = 0 + 0.000 000 000 000 101 135 875 232 273 115 045 888;
  • 14) 0.000 000 000 000 101 135 875 232 273 115 045 888 × 2 = 0 + 0.000 000 000 000 202 271 750 464 546 230 091 776;
  • 15) 0.000 000 000 000 202 271 750 464 546 230 091 776 × 2 = 0 + 0.000 000 000 000 404 543 500 929 092 460 183 552;
  • 16) 0.000 000 000 000 404 543 500 929 092 460 183 552 × 2 = 0 + 0.000 000 000 000 809 087 001 858 184 920 367 104;
  • 17) 0.000 000 000 000 809 087 001 858 184 920 367 104 × 2 = 0 + 0.000 000 000 001 618 174 003 716 369 840 734 208;
  • 18) 0.000 000 000 001 618 174 003 716 369 840 734 208 × 2 = 0 + 0.000 000 000 003 236 348 007 432 739 681 468 416;
  • 19) 0.000 000 000 003 236 348 007 432 739 681 468 416 × 2 = 0 + 0.000 000 000 006 472 696 014 865 479 362 936 832;
  • 20) 0.000 000 000 006 472 696 014 865 479 362 936 832 × 2 = 0 + 0.000 000 000 012 945 392 029 730 958 725 873 664;
  • 21) 0.000 000 000 012 945 392 029 730 958 725 873 664 × 2 = 0 + 0.000 000 000 025 890 784 059 461 917 451 747 328;
  • 22) 0.000 000 000 025 890 784 059 461 917 451 747 328 × 2 = 0 + 0.000 000 000 051 781 568 118 923 834 903 494 656;
  • 23) 0.000 000 000 051 781 568 118 923 834 903 494 656 × 2 = 0 + 0.000 000 000 103 563 136 237 847 669 806 989 312;
  • 24) 0.000 000 000 103 563 136 237 847 669 806 989 312 × 2 = 0 + 0.000 000 000 207 126 272 475 695 339 613 978 624;
  • 25) 0.000 000 000 207 126 272 475 695 339 613 978 624 × 2 = 0 + 0.000 000 000 414 252 544 951 390 679 227 957 248;
  • 26) 0.000 000 000 414 252 544 951 390 679 227 957 248 × 2 = 0 + 0.000 000 000 828 505 089 902 781 358 455 914 496;
  • 27) 0.000 000 000 828 505 089 902 781 358 455 914 496 × 2 = 0 + 0.000 000 001 657 010 179 805 562 716 911 828 992;
  • 28) 0.000 000 001 657 010 179 805 562 716 911 828 992 × 2 = 0 + 0.000 000 003 314 020 359 611 125 433 823 657 984;
  • 29) 0.000 000 003 314 020 359 611 125 433 823 657 984 × 2 = 0 + 0.000 000 006 628 040 719 222 250 867 647 315 968;
  • 30) 0.000 000 006 628 040 719 222 250 867 647 315 968 × 2 = 0 + 0.000 000 013 256 081 438 444 501 735 294 631 936;
  • 31) 0.000 000 013 256 081 438 444 501 735 294 631 936 × 2 = 0 + 0.000 000 026 512 162 876 889 003 470 589 263 872;
  • 32) 0.000 000 026 512 162 876 889 003 470 589 263 872 × 2 = 0 + 0.000 000 053 024 325 753 778 006 941 178 527 744;
  • 33) 0.000 000 053 024 325 753 778 006 941 178 527 744 × 2 = 0 + 0.000 000 106 048 651 507 556 013 882 357 055 488;
  • 34) 0.000 000 106 048 651 507 556 013 882 357 055 488 × 2 = 0 + 0.000 000 212 097 303 015 112 027 764 714 110 976;
  • 35) 0.000 000 212 097 303 015 112 027 764 714 110 976 × 2 = 0 + 0.000 000 424 194 606 030 224 055 529 428 221 952;
  • 36) 0.000 000 424 194 606 030 224 055 529 428 221 952 × 2 = 0 + 0.000 000 848 389 212 060 448 111 058 856 443 904;
  • 37) 0.000 000 848 389 212 060 448 111 058 856 443 904 × 2 = 0 + 0.000 001 696 778 424 120 896 222 117 712 887 808;
  • 38) 0.000 001 696 778 424 120 896 222 117 712 887 808 × 2 = 0 + 0.000 003 393 556 848 241 792 444 235 425 775 616;
  • 39) 0.000 003 393 556 848 241 792 444 235 425 775 616 × 2 = 0 + 0.000 006 787 113 696 483 584 888 470 851 551 232;
  • 40) 0.000 006 787 113 696 483 584 888 470 851 551 232 × 2 = 0 + 0.000 013 574 227 392 967 169 776 941 703 102 464;
  • 41) 0.000 013 574 227 392 967 169 776 941 703 102 464 × 2 = 0 + 0.000 027 148 454 785 934 339 553 883 406 204 928;
  • 42) 0.000 027 148 454 785 934 339 553 883 406 204 928 × 2 = 0 + 0.000 054 296 909 571 868 679 107 766 812 409 856;
  • 43) 0.000 054 296 909 571 868 679 107 766 812 409 856 × 2 = 0 + 0.000 108 593 819 143 737 358 215 533 624 819 712;
  • 44) 0.000 108 593 819 143 737 358 215 533 624 819 712 × 2 = 0 + 0.000 217 187 638 287 474 716 431 067 249 639 424;
  • 45) 0.000 217 187 638 287 474 716 431 067 249 639 424 × 2 = 0 + 0.000 434 375 276 574 949 432 862 134 499 278 848;
  • 46) 0.000 434 375 276 574 949 432 862 134 499 278 848 × 2 = 0 + 0.000 868 750 553 149 898 865 724 268 998 557 696;
  • 47) 0.000 868 750 553 149 898 865 724 268 998 557 696 × 2 = 0 + 0.001 737 501 106 299 797 731 448 537 997 115 392;
  • 48) 0.001 737 501 106 299 797 731 448 537 997 115 392 × 2 = 0 + 0.003 475 002 212 599 595 462 897 075 994 230 784;
  • 49) 0.003 475 002 212 599 595 462 897 075 994 230 784 × 2 = 0 + 0.006 950 004 425 199 190 925 794 151 988 461 568;
  • 50) 0.006 950 004 425 199 190 925 794 151 988 461 568 × 2 = 0 + 0.013 900 008 850 398 381 851 588 303 976 923 136;
  • 51) 0.013 900 008 850 398 381 851 588 303 976 923 136 × 2 = 0 + 0.027 800 017 700 796 763 703 176 607 953 846 272;
  • 52) 0.027 800 017 700 796 763 703 176 607 953 846 272 × 2 = 0 + 0.055 600 035 401 593 527 406 353 215 907 692 544;
  • 53) 0.055 600 035 401 593 527 406 353 215 907 692 544 × 2 = 0 + 0.111 200 070 803 187 054 812 706 431 815 385 088;
  • 54) 0.111 200 070 803 187 054 812 706 431 815 385 088 × 2 = 0 + 0.222 400 141 606 374 109 625 412 863 630 770 176;
  • 55) 0.222 400 141 606 374 109 625 412 863 630 770 176 × 2 = 0 + 0.444 800 283 212 748 219 250 825 727 261 540 352;
  • 56) 0.444 800 283 212 748 219 250 825 727 261 540 352 × 2 = 0 + 0.889 600 566 425 496 438 501 651 454 523 080 704;
  • 57) 0.889 600 566 425 496 438 501 651 454 523 080 704 × 2 = 1 + 0.779 201 132 850 992 877 003 302 909 046 161 408;
  • 58) 0.779 201 132 850 992 877 003 302 909 046 161 408 × 2 = 1 + 0.558 402 265 701 985 754 006 605 818 092 322 816;
  • 59) 0.558 402 265 701 985 754 006 605 818 092 322 816 × 2 = 1 + 0.116 804 531 403 971 508 013 211 636 184 645 632;
  • 60) 0.116 804 531 403 971 508 013 211 636 184 645 632 × 2 = 0 + 0.233 609 062 807 943 016 026 423 272 369 291 264;
  • 61) 0.233 609 062 807 943 016 026 423 272 369 291 264 × 2 = 0 + 0.467 218 125 615 886 032 052 846 544 738 582 528;
  • 62) 0.467 218 125 615 886 032 052 846 544 738 582 528 × 2 = 0 + 0.934 436 251 231 772 064 105 693 089 477 165 056;
  • 63) 0.934 436 251 231 772 064 105 693 089 477 165 056 × 2 = 1 + 0.868 872 502 463 544 128 211 386 178 954 330 112;
  • 64) 0.868 872 502 463 544 128 211 386 178 954 330 112 × 2 = 1 + 0.737 745 004 927 088 256 422 772 357 908 660 224;
  • 65) 0.737 745 004 927 088 256 422 772 357 908 660 224 × 2 = 1 + 0.475 490 009 854 176 512 845 544 715 817 320 448;
  • 66) 0.475 490 009 854 176 512 845 544 715 817 320 448 × 2 = 0 + 0.950 980 019 708 353 025 691 089 431 634 640 896;
  • 67) 0.950 980 019 708 353 025 691 089 431 634 640 896 × 2 = 1 + 0.901 960 039 416 706 051 382 178 863 269 281 792;
  • 68) 0.901 960 039 416 706 051 382 178 863 269 281 792 × 2 = 1 + 0.803 920 078 833 412 102 764 357 726 538 563 584;
  • 69) 0.803 920 078 833 412 102 764 357 726 538 563 584 × 2 = 1 + 0.607 840 157 666 824 205 528 715 453 077 127 168;
  • 70) 0.607 840 157 666 824 205 528 715 453 077 127 168 × 2 = 1 + 0.215 680 315 333 648 411 057 430 906 154 254 336;
  • 71) 0.215 680 315 333 648 411 057 430 906 154 254 336 × 2 = 0 + 0.431 360 630 667 296 822 114 861 812 308 508 672;
  • 72) 0.431 360 630 667 296 822 114 861 812 308 508 672 × 2 = 0 + 0.862 721 261 334 593 644 229 723 624 617 017 344;
  • 73) 0.862 721 261 334 593 644 229 723 624 617 017 344 × 2 = 1 + 0.725 442 522 669 187 288 459 447 249 234 034 688;
  • 74) 0.725 442 522 669 187 288 459 447 249 234 034 688 × 2 = 1 + 0.450 885 045 338 374 576 918 894 498 468 069 376;
  • 75) 0.450 885 045 338 374 576 918 894 498 468 069 376 × 2 = 0 + 0.901 770 090 676 749 153 837 788 996 936 138 752;
  • 76) 0.901 770 090 676 749 153 837 788 996 936 138 752 × 2 = 1 + 0.803 540 181 353 498 307 675 577 993 872 277 504;
  • 77) 0.803 540 181 353 498 307 675 577 993 872 277 504 × 2 = 1 + 0.607 080 362 706 996 615 351 155 987 744 555 008;
  • 78) 0.607 080 362 706 996 615 351 155 987 744 555 008 × 2 = 1 + 0.214 160 725 413 993 230 702 311 975 489 110 016;
  • 79) 0.214 160 725 413 993 230 702 311 975 489 110 016 × 2 = 0 + 0.428 321 450 827 986 461 404 623 950 978 220 032;
  • 80) 0.428 321 450 827 986 461 404 623 950 978 220 032 × 2 = 0 + 0.856 642 901 655 972 922 809 247 901 956 440 064;
  • 81) 0.856 642 901 655 972 922 809 247 901 956 440 064 × 2 = 1 + 0.713 285 803 311 945 845 618 495 803 912 880 128;
  • 82) 0.713 285 803 311 945 845 618 495 803 912 880 128 × 2 = 1 + 0.426 571 606 623 891 691 236 991 607 825 760 256;
  • 83) 0.426 571 606 623 891 691 236 991 607 825 760 256 × 2 = 0 + 0.853 143 213 247 783 382 473 983 215 651 520 512;
  • 84) 0.853 143 213 247 783 382 473 983 215 651 520 512 × 2 = 1 + 0.706 286 426 495 566 764 947 966 431 303 041 024;
  • 85) 0.706 286 426 495 566 764 947 966 431 303 041 024 × 2 = 1 + 0.412 572 852 991 133 529 895 932 862 606 082 048;
  • 86) 0.412 572 852 991 133 529 895 932 862 606 082 048 × 2 = 0 + 0.825 145 705 982 267 059 791 865 725 212 164 096;
  • 87) 0.825 145 705 982 267 059 791 865 725 212 164 096 × 2 = 1 + 0.650 291 411 964 534 119 583 731 450 424 328 192;
  • 88) 0.650 291 411 964 534 119 583 731 450 424 328 192 × 2 = 1 + 0.300 582 823 929 068 239 167 462 900 848 656 384;
  • 89) 0.300 582 823 929 068 239 167 462 900 848 656 384 × 2 = 0 + 0.601 165 647 858 136 478 334 925 801 697 312 768;
  • 90) 0.601 165 647 858 136 478 334 925 801 697 312 768 × 2 = 1 + 0.202 331 295 716 272 956 669 851 603 394 625 536;
  • 91) 0.202 331 295 716 272 956 669 851 603 394 625 536 × 2 = 0 + 0.404 662 591 432 545 913 339 703 206 789 251 072;
  • 92) 0.404 662 591 432 545 913 339 703 206 789 251 072 × 2 = 0 + 0.809 325 182 865 091 826 679 406 413 578 502 144;
  • 93) 0.809 325 182 865 091 826 679 406 413 578 502 144 × 2 = 1 + 0.618 650 365 730 183 653 358 812 827 157 004 288;
  • 94) 0.618 650 365 730 183 653 358 812 827 157 004 288 × 2 = 1 + 0.237 300 731 460 367 306 717 625 654 314 008 576;
  • 95) 0.237 300 731 460 367 306 717 625 654 314 008 576 × 2 = 0 + 0.474 601 462 920 734 613 435 251 308 628 017 152;
  • 96) 0.474 601 462 920 734 613 435 251 308 628 017 152 × 2 = 0 + 0.949 202 925 841 469 226 870 502 617 256 034 304;
  • 97) 0.949 202 925 841 469 226 870 502 617 256 034 304 × 2 = 1 + 0.898 405 851 682 938 453 741 005 234 512 068 608;
  • 98) 0.898 405 851 682 938 453 741 005 234 512 068 608 × 2 = 1 + 0.796 811 703 365 876 907 482 010 469 024 137 216;
  • 99) 0.796 811 703 365 876 907 482 010 469 024 137 216 × 2 = 1 + 0.593 623 406 731 753 814 964 020 938 048 274 432;
  • 100) 0.593 623 406 731 753 814 964 020 938 048 274 432 × 2 = 1 + 0.187 246 813 463 507 629 928 041 876 096 548 864;
  • 101) 0.187 246 813 463 507 629 928 041 876 096 548 864 × 2 = 0 + 0.374 493 626 927 015 259 856 083 752 193 097 728;
  • 102) 0.374 493 626 927 015 259 856 083 752 193 097 728 × 2 = 0 + 0.748 987 253 854 030 519 712 167 504 386 195 456;
  • 103) 0.748 987 253 854 030 519 712 167 504 386 195 456 × 2 = 1 + 0.497 974 507 708 061 039 424 335 008 772 390 912;
  • 104) 0.497 974 507 708 061 039 424 335 008 772 390 912 × 2 = 0 + 0.995 949 015 416 122 078 848 670 017 544 781 824;
  • 105) 0.995 949 015 416 122 078 848 670 017 544 781 824 × 2 = 1 + 0.991 898 030 832 244 157 697 340 035 089 563 648;
  • 106) 0.991 898 030 832 244 157 697 340 035 089 563 648 × 2 = 1 + 0.983 796 061 664 488 315 394 680 070 179 127 296;
  • 107) 0.983 796 061 664 488 315 394 680 070 179 127 296 × 2 = 1 + 0.967 592 123 328 976 630 789 360 140 358 254 592;
  • 108) 0.967 592 123 328 976 630 789 360 140 358 254 592 × 2 = 1 + 0.935 184 246 657 953 261 578 720 280 716 509 184;
  • 109) 0.935 184 246 657 953 261 578 720 280 716 509 184 × 2 = 1 + 0.870 368 493 315 906 523 157 440 561 433 018 368;

We didn't get any fractional part that was equal to zero. But we had enough iterations (53 significant bits, counting from the first 1, which is more than the 52 bit mantissa limit) and at least one integer part that was different from zero => FULL STOP (losing precision: the converted number we get in the end will be just a very good approximation of the initial one).


4. Construct the base 2 representation of the fractional part of the number.

Take all the integer parts of the multiplying operations, starting from the top of the constructed list above:


0.000 000 000 000 000 012 345 687 894 564 589 239(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2)
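The multiplication list above can be reproduced with exact rational arithmetic. The sketch below assumes Python's `fractions.Fraction` so that no rounding creeps into the intermediate results; the name `fraction_bits` and the stopping rule of 53 significant bits (the leading 1 plus 52 mantissa bits) are choices made for this example to mirror the 109 iterations shown.

```python
from fractions import Fraction


def fraction_bits(decimal_text: str, significant_bits: int = 53) -> str:
    """Bits after the point of a number in [0, 1), by repeated multiplication by 2."""
    frac = Fraction(decimal_text.replace(" ", ""))  # exact value, digit groups joined
    bits = []
    counted = 0  # bits produced since (and including) the first 1
    while frac != 0 and counted < significant_bits:
        frac *= 2
        bit = 1 if frac >= 1 else 0  # integer part of the result
        frac -= bit                  # keep only the fractional part
        bits.append(str(bit))
        if bit or counted:
            counted += 1
    return "".join(bits)


bits = fraction_bits("0.000 000 000 000 000 012 345 687 894 564 589 239")
print(len(bits))  # 109
```

Using floating point numbers for the intermediate products would defeat the purpose, since they are already rounded to 53 bits; exact fractions keep every step identical to the hand calculation.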

5. Positive number before normalization:

0.000 000 000 000 000 012 345 687 894 564 589 239(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2)

6. Normalize the binary representation of the number.

Shift the decimal mark 57 positions to the right, so that only one non zero digit remains to the left of it:


0.000 000 000 000 000 012 345 687 894 564 589 239(10) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2) =


0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1(2) × 2^0 =


1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111(2) × 2^-57
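As a rough sketch of the normalization step, the shift count is simply the position of the first 1 bit after the point. The helper name `normalize_fraction` is hypothetical, and the code only covers the case shown here, where the integer part is 0.

```python
def normalize_fraction(fraction_bits):
    """Normalize 0.<fraction_bits> (integer part 0) to 1.xxx × 2^exponent."""
    first_one = fraction_bits.index("1")  # 0-based position of the first 1
    exponent = -(first_one + 1)           # the point moves right, past that 1
    significand = "1." + fraction_bits[first_one + 1:]
    return significand, exponent


# 56 zeros followed by the 53 bits obtained from the multiplications.
bits = "0" * 56 + (
    "1110 0011 1011 1100 1101 1100 1101 1011 0100 1100 1111 0010 1111 1"
).replace(" ", "")
significand, exponent = normalize_fraction(bits)
print(exponent)  # -57
```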


7. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

Sign: 0 (a positive number)


Exponent (unadjusted): -57


Mantissa (not normalized):
1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111


8. Adjust the exponent.

Use the 11 bit excess/bias notation:


Exponent (adjusted) =


Exponent (unadjusted) + 2^(11-1) - 1 =


-57 + 2^(11-1) - 1 =


(-57 + 1 023)(10) =


966(10)


9. Convert the adjusted exponent from the decimal (base 10) to 11 bit binary.

Use the same technique of repeatedly dividing by 2:


  • division = quotient + remainder;
  • 966 ÷ 2 = 483 + 0;
  • 483 ÷ 2 = 241 + 1;
  • 241 ÷ 2 = 120 + 1;
  • 120 ÷ 2 = 60 + 0;
  • 60 ÷ 2 = 30 + 0;
  • 30 ÷ 2 = 15 + 0;
  • 15 ÷ 2 = 7 + 1;
  • 7 ÷ 2 = 3 + 1;
  • 3 ÷ 2 = 1 + 1;
  • 1 ÷ 2 = 0 + 1;

10. Construct the base 2 representation of the adjusted exponent.

Take all the remainders starting from the bottom of the list constructed above.


Exponent (adjusted) =


966(10) =


011 1100 0110(2)
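Steps 8 to 10 (add the bias, then write the result on 11 bits) can be checked with a short Python sketch. The names `EXPONENT_BITS`, `BIAS` and `exponent_field` are assumptions for this example, and the range check only admits normal numbers, since the all-zeros and all-ones exponent fields are reserved by the standard.

```python
EXPONENT_BITS = 11
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 1023


def exponent_field(unadjusted: int) -> str:
    """Biased exponent of a normal number, written as an 11 bit string."""
    adjusted = unadjusted + BIAS
    # 0 and 2047 are reserved (zero/subnormals and infinity/NaN).
    if not 0 < adjusted < 2 ** EXPONENT_BITS - 1:
        raise ValueError("exponent outside the normal range")
    return format(adjusted, f"0{EXPONENT_BITS}b")  # left-pad with zeros


print(exponent_field(-57))  # 01111000110
```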


11. Normalize the mantissa.

a) Remove the leading (leftmost) bit, since it's always 1, and the decimal mark, if present.


b) Adjust its length to 52 bits, only if necessary (not the case here).


Mantissa (normalized) =


1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111 =


1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111
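A minimal sketch of this step, assuming the significand is given as a string of the form 1.xxx: drop the leading 1 and the point, then cut or pad the rest to 52 bits. The name `mantissa_field` is hypothetical, and the excess bits are truncated as on this page rather than rounded.

```python
def mantissa_field(significand: str, width: int = 52) -> str:
    """Stored mantissa bits: drop the leading '1.' and fit to `width` bits."""
    leading, _, fraction = significand.replace(" ", "").partition(".")
    if leading != "1":
        raise ValueError("significand must be normalized (1.xxx)")
    # Truncate excess bits on the right, or pad with zeros on the right.
    return fraction[:width].ljust(width, "0")


field = mantissa_field(
    "1.1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111"
)
print(len(field))  # 52
```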


12. The three elements that make up the number's 64 bit double precision IEEE 754 binary floating point representation:

Sign (1 bit) =
0 (a positive number)


Exponent (11 bits) =
011 1100 0110


Mantissa (52 bits) =
1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111


Decimal number 0.000 000 000 000 000 012 345 687 894 564 589 239 converted to 64 bit double precision IEEE 754 binary floating point representation:

0 - 011 1100 0110 - 1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111
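To sanity check the result, the three fields can be joined into one 64 bit word and read back as a double. The sketch below uses Python's standard `struct` module; the variable names are chosen for this example only.

```python
import struct

sign = "0"
exponent = "011 1100 0110"
mantissa = "1100 0111 0111 1001 1011 1001 1011 0110 1001 1001 1110 0101 1111"

# Join the fields, drop the grouping spaces, and read the 64 bits as an integer.
bits = (sign + exponent + mantissa).replace(" ", "")
word = int(bits, 2)
print(f"{word:016X}")  # 3C6C779B9B699E5F

# Reinterpret the same 8 bytes (big-endian) as an IEEE 754 double.
value = struct.unpack(">d", word.to_bytes(8, "big"))[0]
print(value)
```

The decoded value should agree with the original decimal number to roughly 16 significant digits, which is all a 52 bit mantissa can hold.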


How to convert numbers from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point standard

Follow the steps below to convert a base 10 decimal number to 64 bit double precision IEEE 754 binary floating point:

  • 1. If the number to be converted is negative, start with its positive version.
  • 2. First convert the integer part. Divide repeatedly by 2 the positive representation of the integer number that is to be converted to binary, until we get a quotient that is equal to zero, keeping track of each remainder.
  • 3. Construct the base 2 representation of the positive integer part of the number, by taking all the remainders from the previous operations, starting from the bottom of the list constructed above. Thus, the last remainder of the divisions becomes the first symbol (the leftmost) of the base two number, while the first remainder becomes the last symbol (the rightmost).
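  • 4. Then convert the fractional part. Multiply the number repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero or until we have collected enough bits to fill the 52 bit mantissa (in which case precision is lost).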
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the multiplying operations, starting from the top of the list constructed above (they should appear in the binary representation, from left to right, in the order they have been calculated).
  • 6. Normalize the binary representation of the number, shifting the decimal mark (the decimal point) "n" positions either to the left, or to the right, so that only one non zero digit remains to the left of the decimal mark.
  • 7. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary, by using the same technique of repeatedly dividing by 2, as shown above:
    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1
  • 8. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark, if present), and adjust its length to 52 bits, either by removing the excess bits from the right (losing precision...) or by adding extra bits set to '0' on the right.
  • 9. Sign (it takes 1 bit) is either 1 for a negative or 0 for a positive number.
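The nine steps above can be combined into one small function. This is a sketch under stated assumptions, not the calculator's actual implementation: the name `to_ieee754_double` is hypothetical, exact fractions stand in for the hand arithmetic, the number is normalized before its bits are extracted (which gives the same bits), excess bits are truncated as in step 8, and only non-zero numbers in the normal range are handled.

```python
from fractions import Fraction


def to_ieee754_double(decimal_text: str):
    """Return (sign, exponent, mantissa) bit strings, truncating excess bits."""
    value = Fraction(decimal_text.replace(" ", ""))
    if value == 0:
        raise ValueError("zero is not covered by this sketch")
    sign = "1" if value < 0 else "0"
    value = abs(value)  # step 1: work with the positive version

    # Step 6: normalize so that 1 <= value < 2, counting the shifts.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Steps 4, 5 and 8: drop the leading 1, then take 52 bits by
    # repeated multiplication by 2 (anything beyond is truncated).
    value -= 1
    mantissa = []
    for _ in range(52):
        value *= 2
        bit = 1 if value >= 1 else 0
        value -= bit
        mantissa.append(str(bit))

    # Step 7: add the bias 2^(11-1) - 1 = 1023 and write on 11 bits.
    biased = exponent + 1023
    if not 0 < biased < 2047:
        raise ValueError("outside the normal range of a double")
    return sign, format(biased, "011b"), "".join(mantissa)


sign, exponent, mantissa = to_ieee754_double("-31.640 215")
print(sign, exponent)  # 1 10000000011
```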

Example: convert the negative number -31.640 215 from the decimal system (base ten) to 64 bit double precision IEEE 754 binary floating point:

  • 1. Start with the positive version of the number:

    |-31.640 215| = 31.640 215

  • 2. First convert the integer part, 31. Divide it repeatedly by 2, keeping track of each remainder, until we get a quotient that is equal to zero:
    • division = quotient + remainder;
    • 31 ÷ 2 = 15 + 1;
    • 15 ÷ 2 = 7 + 1;
    • 7 ÷ 2 = 3 + 1;
    • 3 ÷ 2 = 1 + 1;
    • 1 ÷ 2 = 0 + 1;
    • We have encountered a quotient that is ZERO => FULL STOP
  • 3. Construct the base 2 representation of the integer part of the number by taking all the remainders of the previous dividing operations, starting from the bottom of the list constructed above:

    31(10) = 1 1111(2)

  • 4. Then, convert the fractional part, 0.640 215. Multiply repeatedly by 2, keeping track of each integer part of the results, until we get a fractional part that is equal to zero:
    • #) multiplying = integer + fractional part;
    • 1) 0.640 215 × 2 = 1 + 0.280 43;
    • 2) 0.280 43 × 2 = 0 + 0.560 86;
    • 3) 0.560 86 × 2 = 1 + 0.121 72;
    • 4) 0.121 72 × 2 = 0 + 0.243 44;
    • 5) 0.243 44 × 2 = 0 + 0.486 88;
    • 6) 0.486 88 × 2 = 0 + 0.973 76;
    • 7) 0.973 76 × 2 = 1 + 0.947 52;
    • 8) 0.947 52 × 2 = 1 + 0.895 04;
    • 9) 0.895 04 × 2 = 1 + 0.790 08;
    • 10) 0.790 08 × 2 = 1 + 0.580 16;
    • 11) 0.580 16 × 2 = 1 + 0.160 32;
    • 12) 0.160 32 × 2 = 0 + 0.320 64;
    • 13) 0.320 64 × 2 = 0 + 0.641 28;
    • 14) 0.641 28 × 2 = 1 + 0.282 56;
    • 15) 0.282 56 × 2 = 0 + 0.565 12;
    • 16) 0.565 12 × 2 = 1 + 0.130 24;
    • 17) 0.130 24 × 2 = 0 + 0.260 48;
    • 18) 0.260 48 × 2 = 0 + 0.520 96;
    • 19) 0.520 96 × 2 = 1 + 0.041 92;
    • 20) 0.041 92 × 2 = 0 + 0.083 84;
    • 21) 0.083 84 × 2 = 0 + 0.167 68;
    • 22) 0.167 68 × 2 = 0 + 0.335 36;
    • 23) 0.335 36 × 2 = 0 + 0.670 72;
    • 24) 0.670 72 × 2 = 1 + 0.341 44;
    • 25) 0.341 44 × 2 = 0 + 0.682 88;
    • 26) 0.682 88 × 2 = 1 + 0.365 76;
    • 27) 0.365 76 × 2 = 0 + 0.731 52;
    • 28) 0.731 52 × 2 = 1 + 0.463 04;
    • 29) 0.463 04 × 2 = 0 + 0.926 08;
    • 30) 0.926 08 × 2 = 1 + 0.852 16;
    • 31) 0.852 16 × 2 = 1 + 0.704 32;
    • 32) 0.704 32 × 2 = 1 + 0.408 64;
    • 33) 0.408 64 × 2 = 0 + 0.817 28;
    • 34) 0.817 28 × 2 = 1 + 0.634 56;
    • 35) 0.634 56 × 2 = 1 + 0.269 12;
    • 36) 0.269 12 × 2 = 0 + 0.538 24;
    • 37) 0.538 24 × 2 = 1 + 0.076 48;
    • 38) 0.076 48 × 2 = 0 + 0.152 96;
    • 39) 0.152 96 × 2 = 0 + 0.305 92;
    • 40) 0.305 92 × 2 = 0 + 0.611 84;
    • 41) 0.611 84 × 2 = 1 + 0.223 68;
    • 42) 0.223 68 × 2 = 0 + 0.447 36;
    • 43) 0.447 36 × 2 = 0 + 0.894 72;
    • 44) 0.894 72 × 2 = 1 + 0.789 44;
    • 45) 0.789 44 × 2 = 1 + 0.578 88;
    • 46) 0.578 88 × 2 = 1 + 0.157 76;
    • 47) 0.157 76 × 2 = 0 + 0.315 52;
    • 48) 0.315 52 × 2 = 0 + 0.631 04;
    • 49) 0.631 04 × 2 = 1 + 0.262 08;
    • 50) 0.262 08 × 2 = 0 + 0.524 16;
    • 51) 0.524 16 × 2 = 1 + 0.048 32;
    • 52) 0.048 32 × 2 = 0 + 0.096 64;
    • 53) 0.096 64 × 2 = 0 + 0.193 28;
    • We didn't get any fractional part that was equal to zero. But we had enough iterations (over Mantissa limit = 52) and at least one integer part that was different from zero => FULL STOP (losing precision...).
  • 5. Construct the base 2 representation of the fractional part of the number, by taking all the integer parts of the previous multiplying operations, starting from the top of the constructed list above:

    0.640 215(10) = 0.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 6. Summarizing - the positive number before normalization:

    31.640 215(10) = 1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2)

  • 7. Normalize the binary representation of the number, shifting the decimal mark 4 positions to the left so that only one non-zero digit stays to the left of the decimal mark:

    31.640 215(10) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) =
    1 1111.1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^0 =
    1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0(2) × 2^4

  • 8. Up to this moment, there are the following elements that would feed into the 64 bit double precision IEEE 754 binary floating point representation:

    Sign: 1 (a negative number)

    Exponent (unadjusted): 4

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

  • 9. Adjust the exponent in 11 bit excess/bias notation and then convert it from decimal (base 10) to 11 bit binary (base 2), by using the same technique of repeatedly dividing it by 2, as shown above:

    Exponent (adjusted) = Exponent (unadjusted) + 2^(11-1) - 1 = (4 + 1023)(10) = 1027(10) =
    100 0000 0011(2)

  • 10. Normalize the mantissa: remove the leading (leftmost) bit, since it's always '1' (and the decimal mark), and adjust its length to 52 bits by removing the excess bits from the right (losing precision...):

    Mantissa (not-normalized): 1.1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100 1010 0

    Mantissa (normalized): 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Conclusion:

    Sign (1 bit) = 1 (a negative number)

    Exponent (11 bits) = 100 0000 0011

    Mantissa (52 bits) = 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100

  • Number -31.640 215, converted from decimal system (base 10) to 64 bit double precision IEEE 754 binary floating point =
    1 - 100 0000 0011 - 1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100
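The result can be compared with the bit pattern a language runtime stores for the same number. The sketch below assumes Python's standard `struct` module. Note that IEEE 754's default rounding mode is round to nearest, whereas the steps above simply cut off the excess bits, so the two patterns may differ by one unit in the last mantissa bit.

```python
import struct

# Bit pattern obtained by hand above (excess mantissa bits truncated).
by_hand = (
    "1 100 0000 0011 "
    "1111 1010 0011 1110 0101 0010 0001 0101 0111 0110 1000 1001 1100"
).replace(" ", "")
truncated = int(by_hand, 2)

# Bit pattern produced by the runtime for the same decimal literal.
hardware = int.from_bytes(struct.pack(">d", -31.640215), "big")

print(f"{truncated:016X}")  # C03FA3E52157689C
print(f"{hardware:016X}")
print(hardware - truncated)  # difference in units of the last mantissa bit
```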