Can atomics suffer spurious stores?



Your code makes use of fetch_add() on the atomic, which gives the following guarantee:

Atomically replaces the current value with the result of arithmetic addition of the value and arg. The operation is read-modify-write operation. Memory is affected according to the value of order.

The semantics are crystal clear: before the operation the value is m, after the operation it is m+2, and no thread can observe any state in between, because the operation is atomic.


Edit: additional elements regarding your alternate question

Whatever Boehm and Adve may say, C++ compilers must obey the following standard clause:

1.9/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.

If a C++ compiler generated code that allowed speculative updates to interfere with the observable behavior of the program (i.e., reading something other than 5 or 7), it would not be standard-compliant, because it would fail to provide the guarantee mentioned in my initial answer.


The existing answers provide a lot of good explanation, but they fail to give a direct answer to your question. Here we go:

can atomics suffer spurious stores?

Yes, but you cannot observe them from a C++ program which is free from data races.

Only volatile is actually prohibited from performing extra memory accesses.

does the C++ memory model forbid thread 1 from behaving as though it did this?

++m;
++m;

Yes, but this one is allowed:

lock (shared_std_atomic_secret_lock)
{
    ++m;
    ++m;
}

It's allowed but stupid. A more realistic possibility is turning this:

std::atomic<int64_t> m;
++m;

into

memory_bus_lock
{
    ++m.low;
    if (last_operation_did_carry)
        ++m.high;
}

where memory_bus_lock and last_operation_did_carry are features of the hardware platform that can't be expressed in portable C++.

Note that peripherals sitting on the memory bus do see the intermediate value, but can interpret this situation correctly by looking at the memory bus lock. Software debuggers won't be able to see the intermediate value.

In other cases, atomic operations can be implemented by software locks, in which case:

  1. Software debuggers can see intermediate values, and have to be aware of the software lock to avoid misinterpretation
  2. Hardware peripherals will see changes to the software lock, and intermediate values of the atomic object. Some magic may be required for the peripheral to recognize the relationship between the two.
  3. If the atomic object is in shared memory, other processes can see the intermediate values and may not have any way to inspect the software lock / may have a separate copy of said software lock
  4. If other threads in the same C++ program break type safety in a way that causes a data race (For example, using memcpy to read the atomic object) they can observe intermediate values. Formally, that's undefined behavior.

One last important point. The "speculative write" is a very complex scenario. It's easier to see this if we rename the condition:

Thread #1

if (my_mutex.is_held) o += 2; // o is an ordinary variable, not atomic or volatile
return o;

Thread #2

{
    scoped_lock l(my_mutex);
    return o;
}

There's no data race here. If Thread #1 has the mutex locked, the write and read can't occur unordered. If it doesn't have the mutex locked, the threads run unordered but both are performing only reads.

Therefore the compiler cannot allow intermediate values to be seen. This C++ code is not a correct rewrite:

o += 2;
if (!my_mutex.is_held) o -= 2;

because the compiler invented a data race. However, if the hardware platform provides a mechanism for race-free speculative writes (Itanium perhaps?), the compiler can use it. So hardware might see intermediate values, even though C++ code cannot.

If intermediate values shouldn't be seen by hardware, you need to use volatile (possibly in addition to atomics, because volatile read-modify-write is not guaranteed atomic). With volatile, asking for an operation which can't be performed as-written will result in compilation failure, not spurious memory access.


Your revised question differs quite a bit from the first in that we've moved from sequential consistency to relaxed memory order.

Both reasoning about and specifying weak memory orderings can be fairly tricky. E.g. note the difference between C++11 and C++14 specification pointed out here: http://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering . However, the definition of atomicity does prevent the fetch_add call from allowing any other thread to see values other than ones otherwise written to the variable or one of those plus 2. (A thread can do pretty much anything so long as it guarantees the intermediate values are not observable by other threads.)

(To get dreadfully specific, you likely want to search for "read-modify-write" in the C++ spec, e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf .)

Perhaps giving a specific reference to the place in the linked paper that you have questions about would help. That paper slightly predates the first C++ concurrent memory model specification (in C++11), and we are now another revision beyond that, so it may be a bit out of date with respect to what the standard actually says. I expect this is more an issue of it describing things that could happen to non-atomic variables.

EDIT: I'll add a bit more about "the semantics" to perhaps help think about how to analyze this kind of thing.

The goal of memory ordering is to establish a set of possible orders between reads and writes to variables across threads. In weaker orderings, it is not guaranteed that there is any single global ordering that applies to all threads. This alone is already tricky enough that one should make sure it is fully understood before moving on.

Two things involved in specifying an ordering are addresses and synchronization operations. In effect a synchronization operation has two sides and those two sides are connected via sharing an address. (A fence can be thought of as applying to all addresses.) A lot of the confusion in the space comes from figuring out when a synchronization operation on one address guarantees something for other addresses. E.g. mutex lock and unlock operations only establish ordering via the acquire and release operations on the addresses inside the mutex, but that synchronization applies to all reads and writes by the threads locking and unlocking the mutex. An atomic variable accessed using relaxed ordering places few constraints on what happens, but those accesses may have ordering constraints imposed by more strongly ordered operations on other atomic variables or mutexes.

The main synchronization operations are acquire and release. See: http://en.cppreference.com/w/cpp/atomic/memory_order . These are named for what happens with a mutex. The acquire operation applies to loads and prevents any memory operations on the current thread from being reordered past the point where the acquire happens. It also establishes an ordering with any prior release operations on the same variable. The last bit is governed by the value loaded. I.e. if the load returns a value from a given write with release synchronization, the load is now ordered against that write and all other memory operations by those threads fall into place according to the ordering rules.

Atomic, or read-modify-write, operations are their own little sequence in the larger ordering. It is guaranteed that the read, the operation, and the write happen atomically. Any other ordering is given by the memory order parameter to the operation. E.g. specifying relaxed ordering says no constraints otherwise apply to any other variables. I.e. there is no acquire or release implied by the operation. Specifying memory_order_acq_rel says that not only is the operation atomic, but that the read is an acquire and the write is a release -- if the thread reads a value from another write with release semantics, all other atomics now have the appropriate ordering constraint in this thread.

A fetch_add with relaxed memory order might be used for a statistics counter in profiling. At the end of the run, all threads will have done something else to assure all those counter increments are now visible to the final reader, but in the intermediate state we don't care so long as the final total adds up. However this does not imply that intermediate reads can sample values that were never part of the count. E.g. if we're always adding even values to a counter starting at 0, no thread should ever read an odd value regardless of ordering.

I am a bit put off by not being able to point to a specific piece of text in the standard which says there can be no side effects to atomic variables other than those explicitly encoded in the program somehow. Lots of things mention side effects, but it seems to be taken for granted that the side effects are those specified by the source and not anything made up by the compiler. Don't have time to track this down right now, but there is a lot of stuff that would not work if this were not guaranteed, and part of the point of std::atomic is to get this constraint, since it is not guaranteed for ordinary variables. (It is somewhat provided by volatile, or at least is intended to be. Part of the reason we have this degree of specification for memory ordering around std::atomic is because volatile never became well enough specified to reason about in detail, and no one set of constraints met all needs.)