Catch and compute overflow during multiplication of two large integers
1. Detecting the overflow:
x = a * b;if (a != 0 && x / a != b) { // overflow handling}
Edit: Fixed division by 0
(thanks Mark!)
2. Computing the carry is quite involved. One approach is to split both operands into half-words, then apply long multiplication to the half-words:
uint64_t hi(uint64_t x) { return x >> 32;}uint64_t lo(uint64_t x) { return ((1ULL << 32) - 1) & x;}void multiply(uint64_t a, uint64_t b) { // actually uint32_t would do, but the casting is annoying uint64_t s0, s1, s2, s3; uint64_t x = lo(a) * lo(b); s0 = lo(x); x = hi(a) * lo(b) + hi(x); s1 = lo(x); s2 = hi(x); x = s1 + lo(a) * hi(b); s1 = lo(x); x = s2 + hi(a) * hi(b) + hi(x); s2 = lo(x); s3 = hi(x); uint64_t result = s1 << 32 | s0; uint64_t carry = s3 << 32 | s2;}
To see that none of the partial sums themselves can overflow, we consider the worst case:
x = s2 + hi(a) * hi(b) + hi(x)
Let B = 1 << 32
. We then have
x <= (B - 1) + (B - 1)(B - 1) + (B - 1) <= B*B - 1 < B*B
I believe this will work - at least it handles Sjlver's test case. Aside from that, it is untested (and might not even compile, as I don't have a C++ compiler at hand anymore).
The idea is to use following fact which is true for integral operation:
a*b > c
if and only if a > c/b
/
is integral division here.
The pseudocode to check against overflow for positive numbers follows:
if (a > max_int64 / b) then "overflow" else "ok".
To handle zeroes and negative numbers you should add more checks.
C code for non-negative a
and b
follows:
if (b > 0 && a > 18446744073709551615 / b) { // overflow handling}; else { c = a * b;}
Note:
18446744073709551615 == (1<<64)-1
To calculate carry we can use approach to split number into two 32-digits and multiply them as we do this on the paper. We need to split numbers to avoid overflow.
Code follows:
// split input numbers into 32-bit digitsuint64_t a0 = a & ((1LL<<32)-1);uint64_t a1 = a >> 32;uint64_t b0 = b & ((1LL<<32)-1);uint64_t b1 = b >> 32;// The following 3 lines of code is to calculate the carry of d1// (d1 - 32-bit second digit of result, and it can be calculated as d1=d11+d12),// but to avoid overflow.// Actually rewriting the following 2 lines:// uint64_t d1 = (a0 * b0 >> 32) + a1 * b0 + a0 * b1;// uint64_t c1 = d1 >> 32;uint64_t d11 = a1 * b0 + (a0 * b0 >> 32); uint64_t d12 = a0 * b1;uint64_t c1 = (d11 > 18446744073709551615 - d12) ? 1 : 0;uint64_t d2 = a1 * b1 + c1;uint64_t carry = d2; // needed carry stored here
Although there have been several other answers to this question, I several of them have code that is completely untested, and thus far no one has adequately compared the different possible options.
For that reason, I wrote and tested several possible implementations (the last one is based on this code from OpenBSD, discussed on Reddit here). Here's the code:
/* Multiply with overflow checking, emulating clang's builtin function * * __builtin_umull_overflow * * This code benchmarks five possible schemes for doing so. */#include <stddef.h>#include <stdio.h>#include <stdlib.h>#include <stdint.h>#include <limits.h>#ifndef BOOL #define BOOL int#endif// Option 1, check for overflow a wider type// - Often fastest and the least code, especially on modern compilers// - When long is a 64-bit int, requires compiler support for 128-bits// ints (requires GCC >= 3.0 or Clang)#if LONG_BIT > 32 typedef __uint128_t long_overflow_t ;#else typedef uint64_t long_overflow_t;#endifBOOL umull_overflow1(unsigned long lhs, unsigned long rhs, unsigned long* result){ long_overflow_t prod = (long_overflow_t)lhs * (long_overflow_t)rhs; *result = (unsigned long) prod; return (prod >> LONG_BIT) != 0;}// Option 2, perform long multiplication using a smaller type// - Sometimes the fastest (e.g., when mulitply on longs is a library// call).// - Performs at most three multiplies, and sometimes only performs one.// - Highly portable code; works no matter how many bits unsigned long isBOOL umull_overflow2(unsigned long lhs, unsigned long rhs, unsigned long* result){ const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul; unsigned long lhs_high = lhs >> LONG_BIT/2; unsigned long lhs_low = lhs & HALFSIZE_MAX; unsigned long rhs_high = rhs >> LONG_BIT/2; unsigned long rhs_low = rhs & HALFSIZE_MAX; unsigned long bot_bits = lhs_low * rhs_low; if (!(lhs_high || rhs_high)) { *result = bot_bits; return 0; } BOOL overflowed = lhs_high && rhs_high; unsigned long mid_bits1 = lhs_low * rhs_high; unsigned long mid_bits2 = lhs_high * rhs_low; *result = bot_bits + ((mid_bits1+mid_bits2) << LONG_BIT/2); return overflowed || *result < bot_bits || (mid_bits1 >> LONG_BIT/2) != 0 || (mid_bits2 >> LONG_BIT/2) != 0;}// Option 3, perform long multiplication using a smaller type (this code is// very similar to option 2, but calculates overflow using a different but// equivalent method).// - Sometimes the fastest (e.g., when mulitply on longs is a library// call; clang likes this code).// - Performs at most three multiplies, and sometimes only performs one.// - Highly portable code; works no matter how many bits unsigned long isBOOL umull_overflow3(unsigned long lhs, unsigned long rhs, unsigned long* result){ const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul; unsigned long lhs_high = lhs >> LONG_BIT/2; unsigned long lhs_low = lhs & HALFSIZE_MAX; unsigned long rhs_high = rhs >> LONG_BIT/2; unsigned long rhs_low = rhs & HALFSIZE_MAX; unsigned long lowbits = lhs_low * rhs_low; if (!(lhs_high || rhs_high)) { *result = lowbits; return 0; } BOOL overflowed = lhs_high && rhs_high; unsigned long midbits1 = lhs_low * rhs_high; unsigned long midbits2 = lhs_high * rhs_low; unsigned long midbits = midbits1 + midbits2; overflowed = overflowed || midbits < midbits1 || midbits > HALFSIZE_MAX; unsigned long product = lowbits + (midbits << LONG_BIT/2); overflowed = overflowed || product < lowbits; *result = product; return overflowed;}// Option 4, checks for overflow using division// - Checks for overflow using division// - Division is slow, especially if it is a library callBOOLumull_overflow4(unsigned long lhs, unsigned long rhs, unsigned long* result){ *result = lhs * rhs; return rhs > 0 && (SIZE_MAX / rhs) < lhs;}// Option 5, checks for overflow using division// - Checks for overflow using division// - Avoids division when the numbers are "small enough" to trivially// rule out overflow// - Division is slow, especially if it is a library callBOOLumull_overflow5(unsigned long lhs, unsigned long rhs, unsigned long* result){ const unsigned long MUL_NO_OVERFLOW = (1ul << LONG_BIT/2) - 1ul; *result = lhs * rhs; return (lhs >= MUL_NO_OVERFLOW || rhs >= MUL_NO_OVERFLOW) && rhs > 0 && SIZE_MAX / rhs < lhs;}#ifndef umull_overflow #define umull_overflow2#endif/* * This benchmark code performs a multiply at all bit sizes, * essentially assuming that sizes are logarithmically distributed. */int main(){ unsigned long i, j, k; int count = 0; unsigned long mult; unsigned long total = 0; for (k = 0; k < 0x40000000 / LONG_BIT / LONG_BIT; ++k) for (i = 0; i != LONG_MAX; i = i*2+1) for (j = 0; j != LONG_MAX; j = j*2+1) { count += umull_overflow(i+k, j+k, &mult); total += mult; } printf("%d overflows (total %lu)\n", count, total);}
Here are the results, testing with various compilers and systems I have (in this case, all testing was done on OS X, but results should be similar on BSD or Linux systems):
+------------------+----------+----------+----------+----------+----------+| | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 || | BigInt | LngMult1 | LngMult2 | Div | OptDiv |+------------------+----------+----------+----------+----------+----------+| Clang 3.5 i386 | 1.610 | 3.217 | 3.129 | 4.405 | 4.398 || GCC 4.9.0 i386 | 1.488 | 3.469 | 5.853 | 4.704 | 4.712 || GCC 4.2.1 i386 | 2.842 | 4.022 | 3.629 | 4.160 | 4.696 || GCC 4.2.1 PPC32 | 8.227 | 7.756 | 7.242 | 20.632 | 20.481 || GCC 3.3 PPC32 | 5.684 | 9.804 | 11.525 | 21.734 | 22.517 |+------------------+----------+----------+----------+----------+----------+| Clang 3.5 x86_64 | 1.584 | 2.472 | 2.449 | 9.246 | 7.280 || GCC 4.9 x86_64 | 1.414 | 2.623 | 4.327 | 9.047 | 7.538 || GCC 4.2.1 x86_64 | 2.143 | 2.618 | 2.750 | 9.510 | 7.389 || GCC 4.2.1 PPC64 | 13.178 | 8.994 | 8.567 | 37.504 | 29.851 |+------------------+----------+----------+----------+----------+----------+
Based on these results, we can draw a few conclusions:
- Clearly, the division-based approach, although simple and portable, is slow.
- No technique is a clear winner in all cases.
- On modern compilers, the use-a-larger-int approach is best, if you can use it
- On older compilers, the long-multiplication approach is best
- Surprisingly, GCC 4.9.0 has performance regressions over GCC 4.2.1, and GCC 4.2.1 has performance regressions over GCC 3.3