Position of least significant bit that is set Position of least significant bit that is set c c

Position of least significant bit that is set


Bit Twiddling Hacks offers an excellent collection of, er, bit twiddling hacks, with performance/optimisation discussion attached. My favourite solution for your problem (from that site) is «multiply and lookup»:

unsigned int v;  // find the number of trailing zeros in 32-bit v int r;           // result goes herestatic const int MultiplyDeBruijnBitPosition[32] = {  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,   31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9};r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];

Helpful references:


Why not use the built-in ffs? (I grabbed a man page from Linux, but it's more widely available than that.)

ffs(3) - Linux man page

Name

ffs - find first bit set in a word

Synopsis

#include <strings.h>int ffs(int i);#define _GNU_SOURCE#include <string.h>int ffsl(long int i);int ffsll(long long int i);

Description

The ffs() function returns the position of the first (least significant) bit set in the word i. The least significant bit is position 1 and the most significant position e.g. 32 or 64. The functions ffsll() and ffsl() do the same but take arguments of possibly different size.

Return Value

These functions return the position of the first bit set, or 0 if no bits are set in i.

Conforming to

4.3BSD, POSIX.1-2001.

Notes

BSD systems have a prototype in <string.h>.


There is an x86 assembly instruction (bsf) that will do it. :)

More optimized?!

Side Note:

Optimization at this level is inherently architecture dependent. Today's processors are too complex (in terms of branch prediction, cache misses, pipelining) that it's so hard to predict which code is executed faster on which architecture. Decreasing operations from 32 to 9 or things like that might even decrease the performance on some architectures. Optimized code on a single architecture might result in worse code in the other. I think you'd either optimize this for a specific CPU or leave it as it is and let the compiler to choose what it thinks it's better.