What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C? What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C? c c

What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?


GCC has:

 -- Built-in Function: int __builtin_clz (unsigned int x)     Returns the number of leading 0-bits in X, starting at the most     significant bit position.  If X is 0, the result is undefined. -- Built-in Function: int __builtin_clzl (unsigned long)     Similar to `__builtin_clz', except the argument type is `unsigned     long'. -- Built-in Function: int __builtin_clzll (unsigned long long)     Similar to `__builtin_clz', except the argument type is `unsigned     long long'.

I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction.


A useful trick if your input can be zero is __builtin_clz(x | 1): unconditionally setting the low bit without modifying any others makes the output 31 for x=0, without changing the output for any other input.

To avoid needing to do that, your other option is platform-specific intrinsics like ARM GCC's __clz (no header needed), or x86's _lzcnt_u32 on CPUs that support the lzcnt instruction. (Beware that lzcnt decodes as bsr on older CPUs instead of faulting, which gives 31-lzcnt for non-zero inputs.)

There's unfortunately no way to portably take advantage of the various CLZ instructions on non-x86 platforms that do define the result for input=0 as 32 or 64 (according to the operand width). x86's lzcnt does that, too, while bsr produces a bit-index that the compiler has to flip unless you use 31-__builtin_clz(x).

(The "undefined result" is not C Undefined Behavior, just a value that isn't defined. It's actually whatever was in the destination register when the instruction ran. AMD documents this, Intel doesn't, but Intel's CPUs do implement that behaviour. But it's not whatever was previously in the C variable you're assigning to, that's not usually how things work when gcc turns C into asm. See also Why does breaking the "output dependency" of LZCNT matter?)


Assuming you're on x86 and game for a bit of inline assembler, Intel provides a BSR instruction ("bit scan reverse"). It's fast on some x86s (microcoded on others). From the manual:

Searches the source operand for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand. The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.

(If you're on PowerPC there's a similar cntlz ("count leading zeros") instruction.)

Example code for gcc:

#include <iostream>int main (int,char**){  int n=1;  for (;;++n) {    int msb;    asm("bsrl %1,%0" : "=r"(msb) : "r"(n));    std::cout << n << " : " << msb << std::endl;  }  return 0;}

See also this inline assembler tutorial, which shows (section 9.4) it being considerably faster than looping code.


Since 2^N is an integer with only the Nth bit set (1 << N), finding the position (N) of the highest set bit is the integer log base 2 of that integer.

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

unsigned int v;unsigned r = 0;while (v >>= 1) {    r++;}

This "obvious" algorithm may not be transparent to everyone, but when you realize that the code shifts right by one bit repeatedly until the leftmost bit has been shifted off (note that C treats any non-zero value as true) and returns the number of shifts, it makes perfect sense. It also means that it works even when more than one bit is set — the result is always for the most significant bit.

If you scroll down on that page, there are faster, more complex variations. However, if you know you're dealing with numbers with a lot of leading zeroes, the naive approach may provide acceptable speed, since bit shifting is rather fast in C, and the simple algorithm doesn't require indexing an array.

NOTE: When using 64-bit values, be extremely cautious about using extra-clever algorithms; many of them only work correctly for 32-bit values.