
WRITE_ONCE in linux kernel lists


The first definition you refer to is part of the kernel lock validator, aka "lockdep". WRITE_ONCE (and others) don't need special treatment, but the reason why is the subject of another question.

The relevant definition would be here, and a very terse comment states their purpose to be:

Prevent the compiler from merging or refetching reads or writes.

...

Ensuring that the compiler does not fold, spindle, or otherwise mutilate accesses that either do not require ordering or that interact with an explicit memory barrier or atomic instruction that provides the required ordering.

But what do those words mean?


The problem

The problem is actually plural:

  1. Read/write "tearing": replacing a single memory access with several smaller ones. In certain situations GCC may (and does!) replace something like p = 0x01020304; with two 16-bit store-immediate instructions, instead of placing the constant in a register and issuing a single 32-bit store. WRITE_ONCE lets us tell GCC "don't do that", like so: WRITE_ONCE(p, 0x01020304); (a sketch follows this list).

  2. C compilers have stopped guaranteeing that a word access is atomic. Any program that is not race-free can be miscompiled with spectacular results. Not only that, but a compiler may decide not to keep certain values in registers inside a loop, leading to multiple references that can mess up code like this:

    for (;;) {
        owner = lock->owner;
        if (owner && !mutex_spin_on_owner(lock, owner))
            break;
        /* ... */
    }
  3. In the absence of "tagging" accesses to shared memory, we cannot automatically detect unintended accesses of that sort: automated tools that try to find such bugs cannot distinguish them from intentionally racy accesses.
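
To make (1) and (2) concrete, here is a minimal userspace sketch; struct foo, publish() and spin() are made up for illustration, and the real kernel macros do more than a bare volatile cast:

    #include <stdint.h>

    struct foo {
        uint32_t owner;
    };

    /* (1) Store tearing: the plain assignment below may legally be split
     * into smaller stores.  Casting the access through a volatile pointer
     * forces a single 32-bit store, which is essentially what WRITE_ONCE
     * boils down to. */
    static void publish(uint32_t *p)
    {
        *(volatile uint32_t *)p = 0x01020304;  /* roughly WRITE_ONCE(*p, 0x01020304) */
    }

    /* (2) Refetching/hoisting: with a plain load the compiler may re-read
     * lock->owner on every use, or hoist the load out of the loop entirely.
     * A volatile load pins exactly one read per iteration, which is what
     * READ_ONCE boils down to. */
    static void spin(struct foo *lock)
    {
        for (;;) {
            uint32_t owner = *(volatile uint32_t *)&lock->owner;

            if (owner)  /* && !mutex_spin_on_owner(lock, owner) in the original */
                break;
        }
    }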

The solution

We begin by noting that the Linux kernel demands to be built with GCC. Thus there is only one compiler we need to take care of, and we can use its documentation as our only guide.

For a generic solution, we need to handle memory accesses of all sizes: the fixed-width types of 1, 2, 4, and 8 bytes, and everything else. We also note that we don't need to specifically tag memory accesses which are already inside critical sections (why not? see the sketch below): the lock excludes concurrent writers, and its acquire/release already act as compiler barriers.
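
As a hedged illustration of that last point (counter and counter_lock are hypothetical names, not anything from the kernel source): an access that only ever happens under a lock needs no tagging, while a lockless reader of the same variable does.

    #include <linux/spinlock.h>
    #include <linux/compiler.h>

    static DEFINE_SPINLOCK(counter_lock);
    static unsigned long counter;

    void counter_inc(void)
    {
        spin_lock(&counter_lock);
        counter++;          /* plain access is fine inside the critical section */
        spin_unlock(&counter_lock);
    }

    unsigned long counter_peek(void)
    {
        return READ_ONCE(counter);   /* lockless read: this one needs tagging */
    }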

For sizes of 1, 2, 4, and 8 bytes there are appropriately sized types, and volatile specifically forbids GCC from applying the optimisation referred to in (1), as well as taking care of other cases (last bullet point under "COMPILER BARRIERS"). It also prevents GCC from miscompiling the loop in (2), because doing so would mean moving the volatile access across a sequence point, which the C standard disallows. Linux uses what we call a "volatile access" (see below) instead of tagging an object as volatile. We could solve our problem by marking the specific object as volatile, but that is (almost?) never a good choice; there are many reasons it could be harmful.

This is how a volatile (write) access is implemented in the kernel for an 8-bit wide type:

    *(volatile __u8_alias_t *) p = *(__u8_alias_t *) res;
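
Extending that to the other fixed widths, a simplified, self-contained sketch of the dispatch could look like the following; the kernel's real helper is __write_once_size() plus a wrapper macro, and the names and details here are deliberately simplified:

    #include <stdint.h>

    /* One volatile store of the matching width for 1/2/4/8-byte objects. */
    static inline void write_once_size(volatile void *p, const void *res, int size)
    {
        switch (size) {
        case 1: *(volatile uint8_t  *)p = *(const uint8_t  *)res; break;
        case 2: *(volatile uint16_t *)p = *(const uint16_t *)res; break;
        case 4: *(volatile uint32_t *)p = *(const uint32_t *)res; break;
        case 8: *(volatile uint64_t *)p = *(const uint64_t *)res; break;
        default:
            break;  /* other sizes: see the barrier-and-memcpy fallback below */
        }
    }

    /* WRITE_ONCE(x, val) then amounts to copying val into a local buffer
     * and handing its address plus sizeof(x) to the helper above. */
    #define WRITE_ONCE(x, val)                                              \
    ({                                                                      \
        union { __typeof__(x) v; char c[sizeof(x)]; } u = { .v = (val) };  \
        write_once_size(&(x), u.c, sizeof(x));                              \
        u.v;                                                                \
    })

Usage is then simply WRITE_ONCE(shared_flag, 1); for any object of a supported width.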

Suppose we didn't know exactly what volatile does - and finding out isn't easy! (check out #5) - another way to accomplish this would be to place memory barriers around the access. This is exactly what Linux does when the size is anything other than 1, 2, 4, or 8: it resorts to memcpy and places memory barriers before and after the call. Memory barriers easily solve problem (2) as well, but they incur large performance penalties.
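
A minimal sketch of that fallback path follows; barrier() here is the usual GCC asm memory clobber, while the kernel uses its own barrier() macro and __builtin_memcpy, so take this as an approximation rather than the kernel's exact code:

    #include <string.h>

    #define barrier() __asm__ __volatile__("" ::: "memory")

    /* Sizes other than 1, 2, 4, or 8: copy the bytes with memcpy and fence
     * the copy with compiler barriers, so GCC can neither merge it with
     * surrounding accesses nor cache values across it. */
    static inline void write_once_fallback(volatile void *p, const void *res, size_t size)
    {
        barrier();
        memcpy((void *)p, res, size);
        barrier();
    }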

I hope I've given an overview without delving into interpretations of the C standard, but if you'd like, I could take the time to do that.