Can compiler sometimes cache variable declared as volatile Can compiler sometimes cache variable declared as volatile multithreading multithreading

Can compiler sometimes cache variable declared as volatile


But compiler in principal should not cache a volatile variable, right?

No, the compiler in principle must read/write the address of the variable each time you read/write the variable.

[Edit: At least, it must do so up to the point at which the the implementation believes that the value at that address is "observable". As Dietmar points out in his answer, an implementation might declare that normal memory "cannot be observed". This would come as a surprise to people using debuggers, mprotect, or other stuff outside the scope of the standard, but it could conform in principle.]

In C++03, which does not consider threads at all, it is up to the implementation to define what "accessing the address" means when running in a thread. Details like this are called the "memory model". Pthreads, for example, allows per-thread caching of the whole of memory, including volatile variables. IIRC, MSVC provides a guarantee that volatile variables of suitable size are atomic, and it will avoid caching (rather, it will flush as far as a single coherent cache for all cores). The reason it provides that guarantee is because it's reasonably cheap to do so on Intel -- Windows only really cares about Intel-based architectures, whereas Posix concerns itself with more exotic stuff.

C++11 defines a memory model for threading, and it says that this is a data race (i.e. that volatile does not ensure that a read in one thread is sequenced relative to a write in another thread). Two accesses can be sequenced in a particular order, sequenced in unspecified order (the standard might say "indeterminate order", I can't remember), or not sequenced at all. Not sequenced at all is bad -- if either of two unsequenced accesses is a write then behavior is undefined.

The key here is the implied "and then" in "I modify an element from a thread AND THEN the thread reading it does not notice the change". You're assuming that the operations are sequenced, but they're not. As far as the reading thread is concerned, unless you use some kind of synchronization the write in the other thread hasn't necessarily happened yet. And actually it's worse than that -- you might think from what I just wrote that it's only the order of operations that is unspecified, but actually the behavior of a program with a data race is undefined.


C

What volatile does:

  • Guarantees an up-to-date value in the variable, if the variable is modified from an external source (a hardware register, an interrupt, a different thread, a callback function etc).
  • Blocks all optimizations of read/write access to the variable.
  • Prevent dangerous optimization bugs that can happen to variables shared between several threads/interrupts/callback functions, when the compiler does not realize that the thread/interrupt/callback is called by the program. (This is particularly common among various questionable embedded system compilers, and when you get this bug it is very hard to track down.)

What volatile does not:

  • It does not guarantee atomic access or any form of thread-safety.
  • It cannot be used instead of a mutex/semaphore/guard/critical section. It cannot be used for thread synchronization.

What volatile may or may not do:

  • It may or may not be implemented by the compiler to provide a memory barrier, to protect against instruction cache/instruction pipe/instruction re-ordering issues in a multi-core environment. You should never assume that volatile does this for you, unless the compiler documentation explicitly states that it does.


With volatile you can only impose that a variable is re-read whenever you use its value. It doesn't guarantee that the different values/representations that are present on different levels of your architecture are consistent.

To have such gurantees you'd need the new utilities from C11 and C++1 concerning atomic access and memory barriers. Many compilers implement these already in terms of extension. E.g the gcc family (clang, icc, etc) have builtins starting with prefix __sync to implement these.