Lock-free stack - Is this a correct usage of c++11 relaxed atomics? Can it be proven? Lock-free stack - Is this a correct usage of c++11 relaxed atomics? Can it be proven? multithreading multithreading

Lock-free stack - Is this a correct usage of c++11 relaxed atomics? Can it be proven?


push is broken, since you do not update node->_next after a compareAndSwap failure. It's possible that the node you originally stored with node->setNext has been popped from the top of stack by another thread when the next compareAndSwap attempt succeeds. As a result, some thread thinks it has popped a node from the stack but this thread has put it back in the stack. It should be:

void push(Node* node) noexcept{    Node* n = _head.next();    do {        node->setNext(n);    } while (!_head.compareAndSwap(n, node));}

Also, since next and setNext use memory_order_relaxed, there's no guarantee that _head_.next() here is returning the node most recently pushed. It's possible to leak nodes from the top of the stack. The same problem obviously exists in pop as well: _head.next() may return a node that was previously but is no longer at the top of the stack. If the returned value is nullptr, you may fail to pop when the stack is not actually empty.

pop can also have undefined behavior if two threads try to pop the last node from the stack at the same time. They both see the same value for _head.next(), one thread successfully completes pop. The other thread enters the while loop - since the observed node pointer is not nullptr - but the compareAndSwap loop soon updates it to nullptr since the stack is now empty. On the next iteration of the loop, that nullptr is dererenced to get its _next pointer and much hilarity ensues.

pop is also clearly suffering from ABA. Two threads can see the same node at the top of the stack. Say one thread gets to the point of evaluating the _next pointer and then blocks. The other thread successfully pops the node, pushes 5 new nodes, and then pushes that original node again all before the other thread wakes. That other thread's compareAndSwap will succeed - the top-of-stack node is the same - but store the old _next value into _head instead of the new one. The five nodes pushed by the other thread are all leaked. This would be the case with memory_order_seq_cst as well.


Leaving to one side the difficulty of implementing the pop operation, I think memory_order_relaxed is inadequate. Before pushing the node, one assumes that some value(s) will be written into to it, which will be read when the node is popped. You need some synchronization mechanism to ensure that the values have actually been written before they are read. memory_order_relaxed is not providing that synchronization... memory_order_acquire/memory_order_release would.


This code is completely broken.

The only reason this appears to work is that current compilers aren't very aggressive with reordering across atomic operations and x86 processors have pretty strong guarantees.

The first problem is that without synchronization, there is no guarantee that the client of this data structure will even see the fields of the node object to be initialized. The next issue is that without synchronization, the push operation can read arbitrarily old values for the head's tag.

We have developed a tool, CDSChecker, that simulates most behaviors that the memory model allows. It is open source and free. Run it on your data structure to see some interesting executions.

Proving anything about code that utilizes relaxed atomics is a big challenge at this point. Most proof methods break down because they are typically inductive in nature, and you don't have an order to induct on. So you get out of thin air read issues...