How to guarantee 64-bit writes are atomic? How to guarantee 64-bit writes are atomic? multithreading multithreading

How to guarantee 64-bit writes are atomic?


Your best bet is to avoid trying to build your own system out of primitives, and instead use locking unless it really shows up as a hot spot when profiling. (If you think you can be clever and avoid locks, don't. You aren't. That's the general "you" which includes me and everybody else.) You should at minimum use a spin lock, see spinlock(3). And whatever you do, don't try to implement "your own" locks. You will get it wrong.

Ultimately, you need to use whatever locking or atomic operations your operating system provides. Getting these sorts of things exactly right in all cases is extremely difficult. Often it can involve knowledge of things like the errata for specific versions of specific processor. ("Oh, version 2.0 of that processor didn't do the cache-coherency snooping at the right time, it's fixed in version 2.0.1 but on 2.0 you need to insert a NOP.") Just slapping a volatile keyword on a variable in C is almost always insufficient.

On Mac OS X, that means you need to use the functions listed in atomic(3) to perform truly atomic-across-all-CPUs operations on 32-bit, 64-bit, and pointer-sized quantities. (Use the latter for any atomic operations on pointers so you're 32/64-bit compatible automatically.) That goes whether you want to do things like atomic compare-and-swap, increment/decrement, spin locking, or stack/queue management. Fortunately the spinlock(3), atomic(3), and barrier(3) functions should all work correctly on all CPUs that are supported by Mac OS X.


On x86_64, both the Intel compiler and gcc support some intrinsic atomic-operation functions. Here's gcc's documentation of them: http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

The Intel compiler docs also talk about them here: http://softwarecommunity.intel.com/isn/downloads/softwareproducts/pdfs/347603.pdf (page 164 or thereabouts).


According to Chapter 7 of Part 3A - System Programming Guide of Intel's processor manuals, quadword accesses will be carried out atomically if aligned on a 64-bit boundary, on a Pentium or newer, and unaligned (if still within a cache line) on a P6 or newer. You should use volatile to ensure that the compiler doesn't try to cache the write in a variable, and you may need to use a memory fence routine to ensure that the write happens in the proper order.

If you need to base the value written on an existing value, you should use your operating system's Interlocked features (e.g. Windows has InterlockedIncrement64).