Is there a more efficient way to get the length of a 32bit integer in bytes? Is there a more efficient way to get the length of a 32bit integer in bytes? c c

Is there a more efficient way to get the length of a 32bit integer in bytes?


How about this one?

inline int len(uint32 val){    return 4        - ((val & 0xff000000) == 0)        - ((val & 0xffff0000) == 0)        - ((val & 0xffffff00) == 0)    ;}

Removing the inline keyword, g++ -O2 compiles this to the following branchless code:

movl    8(%ebp), %edxmovl    %edx, %eaxandl    $-16777216, %eaxcmpl    $1, %eaxsbbl    %eax, %eaxaddl    $4, %eaxxorl    %ecx, %ecxtestl   $-65536, %edxsete    %clsubl    %ecx, %eaxandl    $-256, %edxsete    %dlmovzbl  %dl, %edxsubl    %edx, %eax

If you don't mind machine-specific solutions, you can use the bsr instruction which searches for the first 1 bit. Then you simply divide by 8 to convert bits to bytes and add 1 to shift the range 0..3 to 1..4:

int len(uint32 val){    asm("mov 8(%ebp), %eax");    asm("or  $255, %eax");    asm("bsr %eax, %eax");    asm("shr $3, %eax");    asm("inc %eax");    asm("mov %eax, 8(%ebp)");    return val;}

Note that I am not an inline assembly god, so maybe there's a better to solution to access val instead of addressing the stack explicitly. But you should get the basic idea.

The GNU compiler also has an interesting built-in function called __builtin_clz:

inline int len(uint32 val){    return ((__builtin_clz(val | 255) ^ 31) >> 3) + 1;}

This looks much better than the inline assembly version to me :)


I did a mini unscientific benchmark just measuring the difference in GetTickCount() calls when calling the function in a loop from 0 to MAX_LONG times under the VS 2010 compiler.

Here's what I saw:

This took 11497 ticks

inline int len(uint32 val){    if(val <= 0x000000ff) return 1;    if(val <= 0x0000ffff) return 2;    if(val <= 0x00ffffff) return 3;    return 4;} 

While this took 14399 ticks

inline int len(uint32 val){    return 4        - ((val & 0xff000000) == 0)        - ((val & 0xffff0000) == 0)        - ((val & 0xffffff00) == 0)    ;}

edit: my idea about why one was faster is wrong because:

inline int len(uint32 val){    return 1        + (val > 0x000000ff)        + (val > 0x0000ffff)        + (val > 0x00ffffff)        ;}

This version used only 11107 ticks. Since + is faster than - perhaps? I'm not sure.

Even faster though was the binary search at 7161 ticks

inline int len(uint32 val){    if (val & 0xffff0000) return (val & 0xff000000)? 4: 3;    return (val & 0x0000ff00)? 2: 1;}

And fastest so far is using the MS intrinsic function, at 4399 ticks

#pragma intrinsic(_BitScanReverse)inline int len2(uint32 val){    DWORD index;    _BitScanReverse(&index, val);    return (index>>3)+1;}

For reference - here's the code i used to profile:

int _tmain(int argc, _TCHAR* argv[]){    int j = 0;    DWORD t1,t2;    t1 = GetTickCount();    for(ULONG i=0; i<-1; i++)        j=len(i);    t2 = GetTickCount();    _tprintf(_T("%ld ticks %ld\n"), t2-t1, j);    t1 = GetTickCount();    for(ULONG i=0; i<-1; i++)        j=len2(i);    t2 = GetTickCount();    _tprintf(_T("%ld ticks %ld\n"), t2-t1, j);}

Had to print j to prevent the loops from being optimized out.


Do you really have profile evidence that this is a significant bottleneck in your application? Just do it the most obvious way and only if profiling shows it to be a problem (which I doubt), then try to improve things. Most likely you'll get the best improvement by reducing the number of calls to this function than by changing something within it.