What is gcc doing here to run this code once per thread?


The fs segment base is the address of thread-local storage (on x86-64 Linux at least).

.zero 8 reserves 8 bytes of zeros (presumably in the BSS, or rather its thread-local counterpart .tbss, since this is thread-local data). See the GAS manual: https://sourceware.org/binutils/docs/as/Zero.html; more links are in https://stackoverflow.com/tags/x86/info.

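@tpoff means the symbol is addressed relative to the thread pointer (the fs base here): it stands for "thread pointer offset", and the linker resolves it to the variable's offset from the thread pointer within the thread's TLS block.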
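To make the "relative to the thread pointer" idea concrete, here is a minimal sketch showing that the same thread_local variable lives at a different address in each thread. The names tls_counter, my_copy_address and other_thread_copy_address are made up for illustration; they are not from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical variable: every thread gets its own copy in its own TLS block.
thread_local int tls_counter = 0;

// Address of the calling thread's copy of tls_counter.
std::uintptr_t my_copy_address() {
    return reinterpret_cast<std::uintptr_t>(&tls_counter);
}

// Address of the copy that belongs to a freshly started thread.
std::uintptr_t other_thread_copy_address() {
    std::uintptr_t addr = 0;
    std::thread t([&addr] { addr = my_copy_address(); });
    t.join();
    return addr;
}
```

Calling my_copy_address() and other_thread_copy_address() from the same thread should give two different values: the offset of the variable is the same in both threads, but each thread has a different thread pointer to add it to.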


The rest of it looks similar to what gcc normally does for static local variables that need a runtime initializer: a guard variable that it checks every time it enters the function, falling through in the already-initialized case.

The 1-byte guard variable is in thread-local storage. The actual _ itself is optimized away because it's never read. Notice there's no store of eax after foo returns.
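As a rough illustration, the guard logic can be written out by hand in C++. This is only a sketch of the pattern, not gcc's actual output: foo() here is a stand-in that counts its calls, and the names guard and value are invented. Unlike the compiled code in the question, this version keeps and returns the value so the effect is observable.

```cpp
#include <cassert>

// Stand-in for the question's foo(); counts how often it runs.
static int foo_calls = 0;
int foo() { return ++foo_calls; }

// Hand-written approximation of a thread_local static with a runtime
// initializer. The compiler generates the guard itself; here it is explicit.
int bar_by_hand() {
    thread_local static bool guard = false;  // constant-initialized, needs no guard of its own
    thread_local static int value;           // zero-initialized per thread

    if (!guard) {        // checked on every call; falls through once initialized
        value = foo();   // runs only on this thread's first call
        guard = true;    // set after foo() returns, so a throwing foo() is retried next time
    }
    return value;
}
```

Calling bar_by_hand() repeatedly on one thread runs foo() only the first time; every later call takes the fast path past the guard check.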

BTW, _ is a weird (bad) choice for a variable name. It's easy to miss, and it skates close to reserved territory: names beginning with an underscore are reserved for the implementation at global scope.


gcc gets a nice optimization here: normally (for non-thread-local static int var = foo();) if the guard shows the variable isn't initialized yet, it needs a thread-safe way to make sure only one thread actually does the initialization (essentially taking a lock).

But here each thread has its own guard variable (and should run foo() the first time regardless of what other threads are doing), so it doesn't need to call a run-once helper (__cxa_guard_acquire / __cxa_guard_release in the Itanium C++ ABI that gcc follows) to get mutual exclusion.
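The difference in behaviour is easy to observe. The following is a small sketch, with invented names (init_per_thread, init_shared, touch_both, run_threads), that counts how often each kind of initializer runs when several threads enter the same function.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> per_thread_inits{0};
std::atomic<int> shared_inits{0};

int init_per_thread() { return ++per_thread_inits; }
int init_shared() { return ++shared_inits; }

void touch_both() {
    // One guard per thread: each thread initializes its own copy, no locking needed.
    thread_local static int a = init_per_thread();
    // One guard for the whole process: threads must agree on a single initializer.
    static int b = init_shared();
    (void)a;
    (void)b;
}

// Calls touch_both() twice on each of n freshly started threads.
void run_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { touch_both(); touch_both(); });
    for (auto& t : threads) t.join();
}
```

After run_threads(4), per_thread_inits should be 4 (once per thread) while shared_inits should be 1 (once for the process). On Linux with gcc this typically needs -pthread when compiling and linking.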

(sorry for the short answer, I may expand this later with an example on https://godbolt.org/ of a non-thread-local static local variable. Or find an SO Q&A about it.)