C++ shared_mutex implementation C++ shared_mutex implementation multithreading multithreading

C++ shared_mutex implementation


(CAS = Compare And Swap = C++ compare_exchange_weak function, which on x86 will typically compile to an lock cmpxchg instruction which can only run when it owns the cache line in Exclusive or Modified MESI state).


lock_shared looks good: spinning read-only attempting a CAS only when it looks possible is better for performance than spinning on CAS or atomic increment. You already needed to do a read-only check to avoid changing -1 to 0 and unlocking a write-lock.

On x86, put a _mm_pause() in the retry path of the spin loop to avoid memory-order mis-speculation pipeline nukes when exiting the read-only spin loop, and steal fewer resources from the other hyperthread while spinning. (Use a while() loop, not do{}while(), so the pause only runs after failing once. pause on Skylake and later waits about 100 cycles so avoid it in the fast-path.)


I think unlock_shared should be using mo_release, not mo_relaxed, since it needs to order the loads from the shared data structure to make sure a writer doesn't start writing before the loads from the reader critical section happen. (LoadStore reordering is a thing on weakly-ordered architectures, even though x86 only does StoreLoad reordering.) A Release operation will order preceding loads and keep them inside the critical section.


(in write lock): // can memory_order_relaxed be used if only a single thread takes write locks ?

No, you still need to keep the writes inside the critical section, so the CAS still needs to Synchronize With (in C++ terminology) release-stores from unlock_shared.

https://preshing.com/20120913/acquire-and-release-semantics/ has a nice image that shows the 1-way barrier effect of a release-store or acquire-load.