Happens-before for direct ByteBuffer multithreading
Certainly, if you read and write the ByteBuffer in Java code, using Java methods such as put and get, then the happens-before relationship between your modifications on the first thread, the publishing/consumption, and the subsequent access on the second thread will apply⁰ in the expected way. After all, the fact that the ByteBuffer is backed by "off-heap" memory is just an implementation detail: it doesn't allow the Java methods on ByteBuffer to break the memory model contract.
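
To make this concrete, here is a minimal sketch of the pure-Java case (the class and field names are mine, purely for illustration). Thread A writes through put, publishes the buffer through a volatile field, and thread B reads through get; the (A)/(B)/(C) labels refer to the happens-before chain spelled out in footnote ⁰ below.

    import java.nio.ByteBuffer;

    class DirectBufferPublication {
        private static volatile ByteBuffer shared;   // publication point

        public static void main(String[] args) throws InterruptedException {
            Thread writer = new Thread(() -> {
                ByteBuffer buf = ByteBuffer.allocateDirect(8);
                buf.putLong(0, 42L);   // (A) buffer writes, before the publishing store
                shared = buf;          // (B) publishing volatile store
            });
            Thread reader = new Thread(() -> {
                ByteBuffer buf;
                while ((buf = shared) == null) { }  // (B) consuming volatile load
                System.out.println(buf.getLong(0)); // (C) guaranteed to print 42
            });
            writer.start();
            reader.start();
            writer.join();
            reader.join();
        }
    }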

Things get a bit hazy if you are talking about writes to this byte buffer from native code you call through JNI or another mechanism. I think that as long as you are using normal stores (i.e., not non-temporal stores or anything else with weaker semantics than normal stores) in your native code, you will be fine in practice. After all, the JVM internally implements stores to heap memory via the same mechanism, and in particular the get- and put-type methods will be implemented with normal loads and stores. The publishing action, which generally involves some type of release-store, will apply to all prior Java actions and also to the stores inside your native code.
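
As a sketch of that native case (nativeFill is a hypothetical JNI method, not a real API, and the C side is elided, so this is illustrative rather than runnable as-is): because the volatile publishing store is issued after the native call returns, it covers the native stores just as it covers any prior Java stores.

    import java.nio.ByteBuffer;

    class NativeFillPublication {
        // Hypothetical JNI entry point; assumed to fill the buffer
        // with ordinary (non-weak, non-temporal) stores in C.
        private static native void nativeFill(ByteBuffer buf);

        private static volatile ByteBuffer shared;

        static void producer() {
            ByteBuffer buf = ByteBuffer.allocateDirect(1024);
            nativeFill(buf);  // plain native stores into the direct buffer
            shared = buf;     // publishing store: orders the native stores too
        }

        static void consumer() {
            ByteBuffer buf;
            while ((buf = shared) == null) { }  // consuming volatile load
            byte first = buf.get(0);            // safe: observes the native stores
            System.out.println(first);
        }
    }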

You can find some expert discussion of more or less this topic on the concurrency mailing lists. The precise question there is "Can I use Java locks to protect a buffer accessed only by native code?", but the underlying concerns are pretty much the same. The conclusion seems consistent with the above: you are safe if you do normal loads and stores to a normal¹ memory area. If you want to use weaker instructions, you'll need a fence.


⁰ So that was a bit of a lengthy, tortured sentence, but I wanted to make it clear that there is a whole chain of happens-before pairs that have to be correctly synchronized for this to work: (A) between the writes to the buffer and the publishing store on the first thread, (B) between the publishing store and the consuming load, and (C) between the consuming load and the subsequent reads or writes by the second thread. Pair (B) is purely in Java-land, so it follows the regular rules. The question is then mostly about whether (A) and (C), which each have one "native" element, are also fine.

¹ Normal in this context more or less means the same type of memory area that Java uses, or at least one with equally strong consistency guarantees with respect to the type of memory Java uses. You have to go out of your way to violate this, and because you are using a ByteBuffer you already know the area is allocated by Java and has to play by the normal rules (since the Java-level methods on the ByteBuffer need to work in a way consistent with the memory model, at least).


The Java object monitor's happens-before semantics are described in JLS §17.4.5 as:

The wait methods of class Object (§17.2.1) have lock and unlock actions associated with them; their happens-before relationships are defined by these associated actions.

It is unspecified whether that applies only to Java-managed objects or to any data. After all, Java doesn't care about what happens outside the Java "world". But it also means we can extrapolate the spec to any data reachable inside the Java world, and then the relation to the heap becomes less important. After all, if I synchronize the threads, why shouldn't it work for a direct ByteBuffer?
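
As a minimal sketch of that claim (the class and method names are mine): a writer fills a direct buffer and signals under a shared monitor, and a reader waits on the same monitor. The writer's unlock happens-before the reader's reacquisition of the lock inside wait, so the buffer writes, direct or not, are visible after wait returns.

    import java.nio.ByteBuffer;

    class MonitorHandoff {
        private final Object lock = new Object();
        private final ByteBuffer buf = ByteBuffer.allocateDirect(8);
        private boolean ready = false;

        void produce() {
            buf.putLong(0, 42L);      // write to the direct buffer
            synchronized (lock) {
                ready = true;         // state change guarded by the monitor
                lock.notifyAll();
            }
        }

        long consume() throws InterruptedException {
            synchronized (lock) {
                while (!ready) {      // loop guards against spurious wakeups
                    lock.wait();
                }
            }
            return buf.getLong(0);    // guaranteed to observe the 42 written above
        }
    }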

To confirm this we can take a look at how it is actually implemented in the OpenJDK.

If we look closely, we see that ObjectMonitor::wait, among other things, does:

    OrderAccess::fence();

And ObjectMonitor::exit (the business end of notify/notifyAll) does:

    OrderAccess::release_store_ptr (&_owner, NULL) ;
    OrderAccess::storeload() ;

Both fence() and storeload() result in a global StoreLoad memory fence:

    inline void OrderAccess::storeload()  { fence(); }

On SPARC it generates the membar instruction:

    __asm__ volatile ("membar  #StoreLoad" : : :);

And on x86 it goes to membar(Assembler::StoreLoad) and subsequently to:

    // Serializes memory and blows flags
    void membar(Membar_mask_bits order_constraint) {
      if (os::is_MP()) {
        // We only have to handle StoreLoad
        if (order_constraint & StoreLoad) {
          // All usable chips support "locked" instructions which suffice
          // as barriers, and are much faster than the alternative of
          // using cpuid instruction. We use here a locked add [esp],0.
          // This is conveniently otherwise a no-op except for blowing
          // flags.
          // Any change to this code may need to revisit other places in
          // the code where this idiom is used, in particular the
          // orderAccess code.
          lock();
          addl(Address(rsp, 0), 0);// Assert the lock# signal here
        }
      }
    }

So there you have it: it's just a memory barrier at the CPU level. Reference counting and garbage collection come into play at a much higher level.

Which means that, at least in OpenJDK, any memory write issued before Object.notify will be ordered before, and thus visible to, any read issued after Object.wait returns.