What is gcc doing here to run this code once per thread?
The `fs` segment base points at the current thread's thread-local storage (on x86-64 Linux at least), so `fs`-relative addresses reach per-thread data.
`.zero 8` reserves 8 bytes of zeros (presumably in the BSS, or rather its thread-local counterpart `.tbss`). See the GAS manual: https://sourceware.org/binutils/docs/as/Zero.html, and the other links in https://stackoverflow.com/tags/x86/info.
`@tpoff` stands for thread-pointer offset: it asks for the symbol's offset relative to the thread pointer (the `fs` base on x86-64), which is how a variable in thread-local storage gets addressed.
The rest of it looks similar to what gcc normally does for `static` local variables that need a runtime initializer: a guard variable that it checks every time it enters the function, falling through in the already-initialized case.
The 1-byte guard variable is in thread-local storage. The actual `_` itself is optimized away because it's never read. Notice there's no store of `eax` after `foo` returns.
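To make the guard-variable idea concrete, here is a rough source-level sketch of what the compiler effectively does when the variable itself is never read. The names `foo_calls`, `unused`, `guard` and `demo` are made up for illustration, and the hand-written version only approximates the generated code; it is not a transcript of gcc's output.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's foo(); it counts its own calls.
int foo_calls = 0;
int foo() { return ++foo_calls; }

// What the source code says: a thread-local static with a runtime initializer.
void with_thread_local() {
    thread_local static int unused = foo();  // runs on this thread's first entry
    (void)unused;                            // never read, so no storage is needed
}

// Roughly what that turns into: a per-thread guard checked on every call.
void hand_written_equivalent() {
    thread_local static bool guard = false;  // constant-initialized, needs no guard itself
    if (!guard) {       // fast path: already initialized, fall through
        foo();          // return value discarded, nothing is stored
        guard = true;   // mark this thread as initialized
    }
}

// Single-threaded check: each variant calls foo() exactly once.
int demo() {
    with_thread_local();
    with_thread_local();
    assert(foo_calls == 1);
    hand_written_equivalent();
    hand_written_equivalent();
    assert(foo_calls == 2);
    return foo_calls;
}
```

Calling `demo()` from `main` should leave `foo_calls` at 2: one initialization for each of the two functions, no matter how often they are entered on the same thread.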
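BTW, `_` is a weird (bad) choice for a variable name. It's easy to miss, and names that begin with an underscore are reserved for the implementation at global scope, so it's best avoided even where it's technically allowed.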
There's a nice optimization here: normally (for a non-thread-local `static int var = foo();`), if it finds the guard variable isn't already initialized, it needs a thread-safe way to make sure only one thread actually does the initialization (essentially taking a lock).
But here each thread has its own guard variable (and should run `foo()` the first time regardless of what other threads are doing), so it doesn't need to call a run-once helper (`__cxa_guard_acquire` in the Itanium C++ ABI that gcc follows) to get mutual exclusion.
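The difference in behaviour is easy to observe from C++. The following is a minimal sketch, with made-up names (`init_shared`, `init_per_thread`, `run_in_threads`), that counts how often each kind of initializer runs when several threads enter the same functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counters: how many times each initializer actually ran.
std::atomic<int> shared_inits{0};
std::atomic<int> per_thread_inits{0};

int init_shared()     { return ++shared_inits; }
int init_per_thread() { return ++per_thread_inits; }

void use_shared() {
    // One initialization for the whole process: needs mutual exclusion.
    static int v = init_shared();
    (void)v;
}

void use_per_thread() {
    // One initialization per thread: a per-thread guard is enough.
    thread_local static int v = init_per_thread();
    (void)v;
}

// Start n threads that each enter both functions several times.
void run_in_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([] {
            for (int k = 0; k < 3; ++k) {
                use_shared();
                use_per_thread();
            }
        });
    }
    for (auto& t : threads) t.join();
    assert(shared_inits.load() == 1);  // only one thread ever ran init_shared()
}
```

After `run_in_threads(4)`, `shared_inits` should be 1 and `per_thread_inits` should be 4. Depending on the toolchain, building this may need `-pthread`.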
(Sorry for the short answer; I may expand this later with an example on https://godbolt.org/ of a non-thread-local `static` local variable, or find an SO Q&A about it.)