What is the advantage of GCC's __builtin_expect in if else statements? What is the advantage of GCC's __builtin_expect in if else statements? c c

What is the advantage of GCC's __builtin_expect in if else statements?


Imagine the assembly code that would be generated from:

if (__builtin_expect(x, 0)) {    foo();    ...} else {    bar();    ...}

I guess it should be something like:

  cmp   $x, 0  jne   _foo_bar:  call  bar  ...  jmp   after_if_foo:  call  foo  ...after_if:

You can see that the instructions are arranged in such an order that the bar case precedes the foo case (as opposed to the C code). This can utilise the CPU pipeline better, since a jump thrashes the already fetched instructions.

Before the jump is executed, the instructions below it (the bar case) are pushed to the pipeline. Since the foo case is unlikely, jumping too is unlikely, hence thrashing the pipeline is unlikely.


Let's decompile to see what GCC 4.8 does with it

Blagovest mentioned branch inversion to improve the pipeline, but do current compilers really do it? Let's find out!

Without __builtin_expect

#include "stdio.h"#include "time.h"int main() {    /* Use time to prevent it from being optimized away. */    int i = !time(NULL);    if (i)        puts("a");    return 0;}

Compile and decompile with GCC 4.8.2 x86_64 Linux:

gcc -c -O3 -std=gnu11 main.cobjdump -dr main.o

Output:

0000000000000000 <main>:   0:       48 83 ec 08             sub    $0x8,%rsp   4:       31 ff                   xor    %edi,%edi   6:       e8 00 00 00 00          callq  b <main+0xb>                    7: R_X86_64_PC32        time-0x4   b:       48 85 c0                test   %rax,%rax   e:       75 0a                   jne    1a <main+0x1a>  10:       bf 00 00 00 00          mov    $0x0,%edi                    11: R_X86_64_32 .rodata.str1.1  15:       e8 00 00 00 00          callq  1a <main+0x1a>                    16: R_X86_64_PC32       puts-0x4  1a:       31 c0                   xor    %eax,%eax  1c:       48 83 c4 08             add    $0x8,%rsp  20:       c3                      retq

The instruction order in memory was unchanged: first the puts and then retq return.

With __builtin_expect

Now replace if (i) with:

if (__builtin_expect(i, 0))

and we get:

0000000000000000 <main>:   0:       48 83 ec 08             sub    $0x8,%rsp   4:       31 ff                   xor    %edi,%edi   6:       e8 00 00 00 00          callq  b <main+0xb>                    7: R_X86_64_PC32        time-0x4   b:       48 85 c0                test   %rax,%rax   e:       74 07                   je     17 <main+0x17>  10:       31 c0                   xor    %eax,%eax  12:       48 83 c4 08             add    $0x8,%rsp  16:       c3                      retq  17:       bf 00 00 00 00          mov    $0x0,%edi                    18: R_X86_64_32 .rodata.str1.1  1c:       e8 00 00 00 00          callq  21 <main+0x21>                    1d: R_X86_64_PC32       puts-0x4  21:       eb ed                   jmp    10 <main+0x10>

The puts was moved to the very end of the function, the retq return!

The new code is basically the same as:

int i = !time(NULL);if (i)    goto puts;ret:return 0;puts:puts("a");goto ret;

This optimization was not done with -O0.

But good luck on writing an example that runs faster with __builtin_expect than without, CPUs are really smart those days. My naive attempts are here.

C++20 [[likely]] and [[unlikely]]

C++20 has standardized those C++ built-ins: How to use C++20's likely/unlikely attribute in if-else statement They will likely (a pun!) do the same thing.


The idea of __builtin_expect is to tell the compiler that you'll usually find that the expression evaluates to c, so that the compiler can optimize for that case.

I'd guess that someone thought they were being clever and that they were speeding things up by doing this.

Unfortunately, unless the situation is very well understood (it's likely that they have done no such thing), it may well have made things worse. The documentation even says:

In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

In general, you shouldn't be using __builtin_expect unless:

  • You have a very real performance issue
  • You've already optimized the algorithms in the system appropriately
  • You've got performance data to back up your assertion that a particular case is the most likely