Synchronizing on String objects in Java Synchronizing on String objects in Java multithreading multithreading

Synchronizing on String objects in Java


Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:

final String firstkey = "Data-" + email;final String key = firstkey.intern();

Two Strings with the same value are otherwise not necessarily the same object.

Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.

I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.

Response to question update:

I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.

Edit

Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.

Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.

Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:

  • If the CPU is busy then this approach may not be sufficient and you need another approach.
  • If the CPU is not busy, and access to server is not a bottleneck, then this approach is overkill, and you might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.

Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.

This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.

IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.

// do not attempt double-check locking here. I mean it.synchronized(StaticObject) {    data = StaticCache.get(key);    while (data == IN_PROGRESS) {        // another thread is getting the data        StaticObject.wait();        data = StaticCache.get(key);    }    if (data == null) {        // we must get the data        StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);    }}if (data == null) {    // we must get the data    try {        data = server.DoSlowThing(key);    } finally {        synchronized(StaticObject) {            // WARNING: failure here is fatal, and must be allowed to terminate            // the app or else waiters will be left forever. Choose a suitable            // collection type in which replacing the value for a key is guaranteed.            StaticCache.put(key, data, CURRENT_TIME);            StaticObject.notifyAll();        }    }}

Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.

This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.


Synchronizing on an intern'd String might not be a good idea at all - by interning it, the String turns into a global object, and if you synchronize on the same interned strings in different parts of your application, you might get really weird and basically undebuggable synchronization issues such as deadlocks. It might seem unlikely, but when it happens you are really screwed. As a general rule, only ever synchronize on a local object where you're absolutely sure that no code outside of your module might lock it.

In your case, you can use a synchronized hashtable to store locking objects for your keys.

E.g.:

Object data = StaticCache.get(key, ...);if (data == null) {  Object lock = lockTable.get(key);  if (lock == null) {    // we're the only one looking for this    lock = new Object();    synchronized(lock) {      lockTable.put(key, lock);      // get stuff      lockTable.remove(key);    }  } else {    synchronized(lock) {      // just to wait for the updater    }    data = StaticCache.get(key);  }} else {  // use from cache}

This code has a race condition, where two threads might put an object into the lock table after each other. This should however not be a problem, because then you only have one more thread calling the webservice and updating the cache, which shouldn't be a problem.

If you're invalidating the cache after some time, you should check whether data is null again after retrieving it from the cache, in the lock != null case.

Alternatively, and much easier, you can make the whole cache lookup method ("getSomeDataByEmail") synchronized. This will mean that all threads have to synchronize when they access the cache, which might be a performance problem. But as always, try this simple solution first and see if it's really a problem! In many cases it should not be, as you probably spend much more time processing the result than synchronizing.


Strings are not good candidates for synchronization. If you must synchronize on a String ID, it can be done by using the string to create a mutex (see "synchronizing on an ID"). Whether the cost of that algorithm is worth it depends on whether invoking your service involves any significant I/O.

Also:

  • I hope the StaticCache.get() and set() methods are threadsafe.
  • String.intern() comes at a cost (one that varies between VM implementations) and should be used with care.