Catch and compute overflow during multiplication of two large integers Catch and compute overflow during multiplication of two large integers c c

Catch and compute overflow during multiplication of two large integers


1. Detecting the overflow:

x = a * b;if (a != 0 && x / a != b) {    // overflow handling}

Edit: Fixed division by 0 (thanks Mark!)

2. Computing the carry is quite involved. One approach is to split both operands into half-words, then apply long multiplication to the half-words:

uint64_t hi(uint64_t x) {    return x >> 32;}uint64_t lo(uint64_t x) {    return ((1ULL << 32) - 1) & x;}void multiply(uint64_t a, uint64_t b) {    // actually uint32_t would do, but the casting is annoying    uint64_t s0, s1, s2, s3;         uint64_t x = lo(a) * lo(b);    s0 = lo(x);        x = hi(a) * lo(b) + hi(x);    s1 = lo(x);    s2 = hi(x);        x = s1 + lo(a) * hi(b);    s1 = lo(x);        x = s2 + hi(a) * hi(b) + hi(x);    s2 = lo(x);    s3 = hi(x);        uint64_t result = s1 << 32 | s0;    uint64_t carry = s3 << 32 | s2;}

To see that none of the partial sums themselves can overflow, we consider the worst case:

        x = s2 + hi(a) * hi(b) + hi(x)

Let B = 1 << 32. We then have

            x <= (B - 1) + (B - 1)(B - 1) + (B - 1)              <= B*B - 1               < B*B

I believe this will work - at least it handles Sjlver's test case. Aside from that, it is untested (and might not even compile, as I don't have a C++ compiler at hand anymore).


The idea is to use following fact which is true for integral operation:

a*b > c if and only if a > c/b

/ is integral division here.

The pseudocode to check against overflow for positive numbers follows:

if (a > max_int64 / b) then "overflow" else "ok".

To handle zeroes and negative numbers you should add more checks.

C code for non-negative a and b follows:

if (b > 0 && a > 18446744073709551615 / b) {     // overflow handling}; else {    c = a * b;}

Note:

18446744073709551615 == (1<<64)-1

To calculate carry we can use approach to split number into two 32-digits and multiply them as we do this on the paper. We need to split numbers to avoid overflow.

Code follows:

// split input numbers into 32-bit digitsuint64_t a0 = a & ((1LL<<32)-1);uint64_t a1 = a >> 32;uint64_t b0 = b & ((1LL<<32)-1);uint64_t b1 = b >> 32;// The following 3 lines of code is to calculate the carry of d1// (d1 - 32-bit second digit of result, and it can be calculated as d1=d11+d12),// but to avoid overflow.// Actually rewriting the following 2 lines:// uint64_t d1 = (a0 * b0 >> 32) + a1 * b0 + a0 * b1;// uint64_t c1 = d1 >> 32;uint64_t d11 = a1 * b0 + (a0 * b0 >> 32); uint64_t d12 = a0 * b1;uint64_t c1 = (d11 > 18446744073709551615 - d12) ? 1 : 0;uint64_t d2 = a1 * b1 + c1;uint64_t carry = d2; // needed carry stored here


Although there have been several other answers to this question, I several of them have code that is completely untested, and thus far no one has adequately compared the different possible options.

For that reason, I wrote and tested several possible implementations (the last one is based on this code from OpenBSD, discussed on Reddit here). Here's the code:

/* Multiply with overflow checking, emulating clang's builtin function * *     __builtin_umull_overflow * * This code benchmarks five possible schemes for doing so. */#include <stddef.h>#include <stdio.h>#include <stdlib.h>#include <stdint.h>#include <limits.h>#ifndef BOOL    #define BOOL int#endif// Option 1, check for overflow a wider type//    - Often fastest and the least code, especially on modern compilers//    - When long is a 64-bit int, requires compiler support for 128-bits//      ints (requires GCC >= 3.0 or Clang)#if LONG_BIT > 32    typedef __uint128_t long_overflow_t ;#else    typedef uint64_t long_overflow_t;#endifBOOL umull_overflow1(unsigned long lhs, unsigned long rhs, unsigned long* result){        long_overflow_t prod = (long_overflow_t)lhs * (long_overflow_t)rhs;        *result = (unsigned long) prod;        return (prod >> LONG_BIT) != 0;}// Option 2, perform long multiplication using a smaller type//    - Sometimes the fastest (e.g., when mulitply on longs is a library//      call).//    - Performs at most three multiplies, and sometimes only performs one.//    - Highly portable code; works no matter how many bits unsigned long isBOOL umull_overflow2(unsigned long lhs, unsigned long rhs, unsigned long* result){        const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;        unsigned long lhs_high = lhs >> LONG_BIT/2;        unsigned long lhs_low  = lhs & HALFSIZE_MAX;        unsigned long rhs_high = rhs >> LONG_BIT/2;        unsigned long rhs_low  = rhs & HALFSIZE_MAX;        unsigned long bot_bits = lhs_low * rhs_low;        if (!(lhs_high || rhs_high)) {            *result = bot_bits;            return 0;         }        BOOL overflowed = lhs_high && rhs_high;        unsigned long mid_bits1 = lhs_low * rhs_high;        unsigned long mid_bits2 = lhs_high * rhs_low;        *result = bot_bits + ((mid_bits1+mid_bits2) << LONG_BIT/2);        return overflowed || *result < bot_bits            || (mid_bits1 >> LONG_BIT/2) != 0            || (mid_bits2 >> LONG_BIT/2) != 0;}// Option 3, perform long multiplication using a smaller type (this code is// very similar to option 2, but calculates overflow using a different but// equivalent method).//    - Sometimes the fastest (e.g., when mulitply on longs is a library//      call; clang likes this code).//    - Performs at most three multiplies, and sometimes only performs one.//    - Highly portable code; works no matter how many bits unsigned long isBOOL umull_overflow3(unsigned long lhs, unsigned long rhs, unsigned long* result){        const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;        unsigned long lhs_high = lhs >> LONG_BIT/2;        unsigned long lhs_low  = lhs & HALFSIZE_MAX;        unsigned long rhs_high = rhs >> LONG_BIT/2;        unsigned long rhs_low  = rhs & HALFSIZE_MAX;        unsigned long lowbits = lhs_low * rhs_low;        if (!(lhs_high || rhs_high)) {            *result = lowbits;            return 0;         }        BOOL overflowed = lhs_high && rhs_high;        unsigned long midbits1 = lhs_low * rhs_high;        unsigned long midbits2 = lhs_high * rhs_low;        unsigned long midbits  = midbits1 + midbits2;        overflowed = overflowed || midbits < midbits1 || midbits > HALFSIZE_MAX;        unsigned long product = lowbits + (midbits << LONG_BIT/2);        overflowed = overflowed || product < lowbits;        *result = product;        return overflowed;}// Option 4, checks for overflow using division//    - Checks for overflow using division//    - Division is slow, especially if it is a library callBOOLumull_overflow4(unsigned long lhs, unsigned long rhs, unsigned long* result){        *result = lhs * rhs;        return rhs > 0 && (SIZE_MAX / rhs) < lhs;}// Option 5, checks for overflow using division//    - Checks for overflow using division//    - Avoids division when the numbers are "small enough" to trivially//      rule out overflow//    - Division is slow, especially if it is a library callBOOLumull_overflow5(unsigned long lhs, unsigned long rhs, unsigned long* result){        const unsigned long MUL_NO_OVERFLOW = (1ul << LONG_BIT/2) - 1ul;        *result = lhs * rhs;        return (lhs >= MUL_NO_OVERFLOW || rhs >= MUL_NO_OVERFLOW) &&            rhs > 0 && SIZE_MAX / rhs < lhs;}#ifndef umull_overflow    #define umull_overflow2#endif/* * This benchmark code performs a multiply at all bit sizes,  * essentially assuming that sizes are logarithmically distributed. */int main(){        unsigned long i, j, k;        int count = 0;        unsigned long mult;        unsigned long total = 0;        for (k = 0; k < 0x40000000 / LONG_BIT / LONG_BIT; ++k)                for (i = 0; i != LONG_MAX; i = i*2+1)                        for (j = 0; j != LONG_MAX; j = j*2+1) {                                count += umull_overflow(i+k, j+k, &mult);                                total += mult;                        }        printf("%d overflows (total %lu)\n", count, total);}

Here are the results, testing with various compilers and systems I have (in this case, all testing was done on OS X, but results should be similar on BSD or Linux systems):

+------------------+----------+----------+----------+----------+----------+|                  | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 ||                  |  BigInt  | LngMult1 | LngMult2 |   Div    |  OptDiv  |+------------------+----------+----------+----------+----------+----------+| Clang 3.5 i386   |    1.610 |    3.217 |    3.129 |    4.405 |    4.398 || GCC 4.9.0 i386   |    1.488 |    3.469 |    5.853 |    4.704 |    4.712 || GCC 4.2.1 i386   |    2.842 |    4.022 |    3.629 |    4.160 |    4.696 || GCC 4.2.1 PPC32  |    8.227 |    7.756 |    7.242 |   20.632 |   20.481 || GCC 3.3   PPC32  |    5.684 |    9.804 |   11.525 |   21.734 |   22.517 |+------------------+----------+----------+----------+----------+----------+| Clang 3.5 x86_64 |    1.584 |    2.472 |    2.449 |    9.246 |    7.280 || GCC 4.9 x86_64   |    1.414 |    2.623 |    4.327 |    9.047 |    7.538 || GCC 4.2.1 x86_64 |    2.143 |    2.618 |    2.750 |    9.510 |    7.389 || GCC 4.2.1 PPC64  |   13.178 |    8.994 |    8.567 |   37.504 |   29.851 |+------------------+----------+----------+----------+----------+----------+

Based on these results, we can draw a few conclusions:

  • Clearly, the division-based approach, although simple and portable, is slow.
  • No technique is a clear winner in all cases.
  • On modern compilers, the use-a-larger-int approach is best, if you can use it
  • On older compilers, the long-multiplication approach is best
  • Surprisingly, GCC 4.9.0 has performance regressions over GCC 4.2.1, and GCC 4.2.1 has performance regressions over GCC 3.3