Concurrency: Atomic and volatile in C++11 memory model


Firstly, volatile does not imply atomic access. It is designed for things like memory mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic, and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.

If you have a global variable which is shared between threads, such as:

std::atomic<int> ai;

then the visibility and ordering constraints depend on the memory ordering parameter you use for operations, and the synchronization effects of locks, threads and accesses to other atomic variables.

In the absence of any additional synchronization, if one thread writes a value to ai then there is nothing that guarantees that another thread will see the value in any given time period. The standard specifies that it should be visible "in a reasonable period of time", but any given access may return a stale value.

The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

If you have 2 shared variables x and y, initially zero, and have one thread write 1 to x and another write 2 to y, then a third thread that reads both may see either (0,0), (1,0), (0,2) or (1,2) since there is no ordering constraint between the operations, and thus the operations may appear in any order in the global order.

If both writes are from the same thread, which does x=1 before y=2, and the reading thread reads y before x, then (0,2) is no longer a valid option, since the read of y==2 implies that the earlier write to x is visible. The other 3 pairings (0,0), (1,0) and (1,2) are still possible, depending on how the 2 reads interleave with the 2 writes.

If you use other memory orderings such as std::memory_order_relaxed or std::memory_order_acquire then the constraints are relaxed even further, and the single global ordering no longer applies. Threads don't even necessarily have to agree on the ordering of two stores to separate variables if there is no additional synchronization.

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though. In particular, it is important to note that the use of an RMW operation does not force changes from other threads to become visible any quicker, it just means that if the changes are not seen by the RMW then all threads must agree that they are later in the modification order of that atomic variable than the RMW operation. Stores from different threads can still be delayed by arbitrary amounts of time, depending on when the CPU actually issues the store to memory (rather than just its own store buffer), physically how far apart the CPUs executing the threads are (in the case of a multi-processor system), and the details of the cache coherency protocol.

Working with atomic operations is a complex topic. I suggest you read a lot of background material, and examine published code before writing production code with atomics. In most cases it is easier to write code that uses locks, and not noticeably less efficient.


volatile and the atomic operations have a different background, and were introduced with a different intent.

volatile dates from way back, and is principally designed to prevent compiler optimizations when accessing memory mapped IO. Modern compilers tend to do no more than suppress optimizations for volatile, although on some machines, this isn't sufficient for even memory mapped IO. Except for the special case of signal handlers, and setjmp and longjmp sequences (where the C standard, and in the case of signals, the POSIX standard, gives additional guarantees), it must be considered useless on a modern machine, where without special additional instructions (fences or memory barriers), the hardware may reorder or even suppress certain accesses. Since you shouldn't be using setjmp et al. in C++, this more or less leaves signal handlers, and in a multithreaded environment, at least under Unix, there are better solutions for those as well. And possibly memory mapped IO, if you're working on kernel code and can ensure that the compiler generates whatever is needed for the platform in question. (According to the standard, volatile access is observable behavior, which the compiler must respect. But the compiler gets to define what is meant by “access”, and most seem to define it as “a load or store machine instruction was executed”. Which, on a modern processor, doesn't even mean that there is necessarily a read or write cycle on the bus, much less that it's in the order you expect.)

Given this situation, the C++ standard added atomic access, which does provide a certain number of guarantees across threads; in particular, with the default memory ordering, the code generated around an atomic access will contain the necessary additional instructions to prevent the hardware from reordering the accesses, and to ensure that the accesses propagate down to the global memory shared between cores on a multicore machine. (At one point in the standardization effort, Microsoft proposed adding these semantics to volatile, and I think some of their C++ compilers do. After discussion of the issues in the committee, however, the general consensus, including the Microsoft representative, was that it was better to leave volatile with its original meaning, and to define the atomic types.) Or just use the system level primitives, like mutexes, which execute whatever instructions are needed in their code. (They have to. You can't implement a mutex without some guarantees concerning the order of memory accesses.)


Here's a basic synopsis of what the 2 things are:

1) Volatile keyword:
Tells the compiler that this value could change at any moment, so it must not keep a copy cached in a register across accesses. Look up the old "register" keyword in C: "volatile" is basically the "-" to "register"'s "+". Modern compilers now do the optimization that "register" used to request explicitly, so these days you only see volatile. The volatile qualifier guarantees that every access in the source turns into an actual load or store rather than a reused register copy, but nothing more: it does not make the access atomic, and it does not by itself guarantee that you see the latest value written by another thread.

2) Atomic:
Atomic operations are indivisible, so that it is impossible for ANY other thread to observe the data in the middle of such an update. (This has nothing to do with taking a single clock tick; it only means no half-finished state is ever visible.) They're usually limited to whatever the hardware can do without a lock; things like ++, --, and exchanging a value or a pointer. Note that this says nothing about the ORDER in which the different threads will RUN the atomic instructions, only that operations on the same object never overlap. That's why you have all those additional options (the memory orderings) for forcing an ordering.