java code execution yields to different results in debug without breakpoints and normal run. Is ExecutorService broken? java code execution yields to different results in debug without breakpoints and normal run. Is ExecutorService broken? multithreading multithreading

java code execution yields to different results in debug without breakpoints and normal run. Is ExecutorService broken?


TL;DR HotSpot JIT optimization to eliminate safepoint checks in counted loops played a sick joke.

This is a very interesting question: a simple Java test reveals a non-trivial problem deep inside JVM.

The fact that a thread dump hasn't appeared immediately indicates that the problem is not in Java code, but is somehow related to JVM. Thread dumps are printed at safepoints. The delay means that VM wasn't able to reach a safepoint in short time.

Background

Certain VM operations (GC, Deoptimization, Thread dump and some other) are executed at stop-the-world pauses when no Java threads are running. But Java threads could not be stopped at any arbitrary point, they can pause only at certain places called safepoints. In JIT-compiled code the safepoints are usually placed at method exits and at backward branches, i.e. inside loops.

The safepoint checks are relatively cheap in terms of perfomance, but not free. That's why JIT compiler tries to reduce number of safepoints where possible. One such optimization is to eliminate safepoint checks in counted loops, i.e. loops with an integer counter that are known to have finite number of iterations.

Verify the theory

Let's get back to our test and check if safepoints are reached in time.

Add -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000 JVM options. This should print a debug message whenever VM fails to reach a safepoint in 1000 ms.

# SafepointSynchronize::begin: Timeout detected:# SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.# SafepointSynchronize::begin: Threads which did not reach the safepoint:# "pool-1-thread-2" #12 prio=5 os_prio=0 tid=0x0000000019004800 nid=0x1480 runnable [0x0000000000000000]   java.lang.Thread.State: RUNNABLE# SafepointSynchronize::begin: (End of list)

Yes, it printed that the thread pool-1-thread-2 could not stop in 1000 ms. This is the second thread of the Executor's pool, which should run insertionSort algorithm.

insertionSort has two very long nested counted loops, and it looks like JVM has eliminated safepoint checks inside them. So, if this method runs in compiled mode, JVM cannot stop it until the method completes. If a stop-the-world pause is requested while the method is running, all other threads will also wait.

What to do?

This issue has been known for a long time. Here is the related JVM bug: JDK-5014723. It's not a high-priority one, since the problem rarely appears in real-life applications.

A new JVM flag appeared in JDK 8u92 to work around the issue.
-XX:+UseCountedLoopSafepoints will always put safepoint checks inside loops.

Another solution would be to transform a long counted loop into generic one by modifying a counter variable inside the loop.

E.g. if you replace if (breaker) break; with if (breaker) j = 0; the problem will also disappear.

Why does it work in debug mode then?

When JVM is started with the debugger on, some JIT optimizations are disabled to make the debug info available. In this case the compiled code has all the safepoint checks.