Is my spin lock implementation correct and optimal? Is my spin lock implementation correct and optimal? multithreading multithreading

Is my spin lock implementation correct and optimal?


Looks fine to me. Btw, here is the textbook implementation that is more efficient even in the contended case.

void lock(volatile int *exclusion){    while (__sync_lock_test_and_set(exclusion, 1))        while (*exclusion)            ;}


So I'm wondering:

* Is it correct?

In the context mentioned, I would say yes.

* Is it optimal?

That's a loaded question. By reinventing the wheel you are also reinventing a lot of problems that have been solved by other implementations

  • I'd expect a waste loop on failure where you aren't trying to access the lock word.

  • Use of a full barrier in the unlock only needs to have release semantics (that's why you'd use __sync_lock_release, so that you'd get st1.rel on itanium instead of mf, or a lwsync on powerpc, ...). If you really only care about x86 or x86_64 the types of barriers used here or not don't matter as much (but if you where to make the jump to intel's itanium for an HP-IPF port then you wouldn't want this).

  • you don't have the pause() instruction that you'd normally put before your waste loop.

  • when there is contention you want something, semop, or even a dumb sleep in desperation. If you really need the performance that this buys you then the futex suggestion is probably a good one. If you need the performance this buys you bad enough to maintain this code you have a lot of research to do.

Note that there was a comment saying that the release barrier wasn't required. That isn't true even on x86 because the release barrier also serves as an instruction to the compiler to not shuffle other memory accesses around the "barrier". Very much like what you'd get if you used asm ("" ::: "memory" ).

* on compare and swap

On x86 the sync_lock_test_and_set will map to a xchg instruction which has an implied lock prefix. Definitely the most compact generated code (esp. if you use a byte for the "lock word" instead of an int), but no less correct than if you used LOCK CMPXCHG. Use of compare and swap can be used for fancier algorthims (like putting a non-zero pointer to metadata for the first "waiter" into the lockword on failure).


In response to your questions:

  1. Looks ok to me
  2. Assuming the OS supports GCC (and GCC has the functions implemented); this should work on all x86 Operating Systems. The GCC documentation suggests that a warning will be produced if they are not supported on a given platform.
  3. There's nothing x86-64 specific here, so I don't see why not. This can be expanded to cover any architecture that GCC supports, however there maybe more optimal ways of achieving this on non x86 architectures.
  4. You might be slightly better off with using __sync_lock_release() in the unlock() case; as this will decrement the lock and add a memory barrier in a single operation. However, assuming that your assertion that there will rarely be contention; it looks good to me.