Java fork/join framework logic


With an ExecutorService, you decide how many threads the pool has, and the pool makes no distinction between the tasks you submit and the subtasks those tasks create.
A ForkJoinPool instead manages its threads based on (1) the number of available processors and (2) task demand.
Subtasks created by running tasks are scheduled differently from tasks submitted from outside the pool.
An application typically has a single fork-join pool (unlike ExecutorService, where any non-trivial application usually has more than one), and it does not need to be shut down.
I haven't reviewed the internals closely enough to give a lower-level explanation, but if you see here there is a presentation with a benchmark whose measurements show the promised parallelism.
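To make the contrast concrete, here is a minimal sketch, assuming Java 8 or later (where `ForkJoinPool.commonPool()` is available). The class name `PoolSetupSketch`, its method names, and the trivial `6 * 7` task are made up for illustration only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

// Hypothetical class name; only the java.util.concurrent calls are real API.
class PoolSetupSketch {

    // ExecutorService: the caller picks the thread count and must shut the pool down.
    static int viaExecutorService(int threads) {
        Callable<Integer> work = () -> 6 * 7;
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            return executor.submit(work).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // ForkJoinPool: the shared common pool derives its size from the
    // available processors, and this code never calls shutdown on it.
    static int viaCommonPool() {
        Callable<Integer> work = () -> 6 * 7;
        return ForkJoinPool.commonPool().submit(work).join();
    }

    public static void main(String[] args) {
        System.out.println(viaExecutorService(4));
        System.out.println(viaCommonPool());
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}
```

The parallelism printed on the last line depends on the machine, which is the point: nobody chose it by hand.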

Update:
This framework addresses a specific kind of problem (ExecutorService works better for tasks that mix CPU and I/O activity).
The basic idea is to use a recursive, divide-and-conquer approach to keep the CPUs constantly busy: a task creates new subtasks (fork) and suspends itself until they complete (join), without creating new threads and without a shared work queue.
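As a rough illustration of the fork/join pattern described above, here is a small divide-and-conquer sum built on `RecursiveTask`. The class name `RangeSumTask` and the `THRESHOLD` value are my own assumptions, not something from the framework; a sensible cutoff has to be found by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums values[from, to) by splitting the range in half.
class RangeSumTask extends RecursiveTask<Long> {
    // Assumed cutoff below which the range is summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSumTask left = new RangeSumTask(values, from, mid);
        RangeSumTask right = new RangeSumTask(values, mid, to);
        left.fork();                     // fork: queue the left half as a new task
        long rightSum = right.compute(); // keep working on the right half in this thread
        long leftSum = left.join();      // join: wait for the left half's result
        return leftSum + rightSum;
    }

    static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new RangeSumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(parallelSum(values)); // prints 5000050000
    }
}
```

Forking one half and computing the other directly keeps the current thread busy instead of having it fork both halves and then just wait.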
The fork-join framework is therefore implemented with work stealing, using a limited number of worker threads (as many as there are cores). Each worker thread maintains its own private double-ended work queue (deque).
When forking, a worker pushes the new task onto the head of its deque. When waiting or idle, a worker pops a task off the head of its own deque and executes it instead of sleeping.
If a worker's deque is empty, it steals a task from the tail of the deque of another, randomly chosen worker.
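The deque discipline can be sketched as a toy model. This is a single-threaded illustration using `ArrayDeque`, not how `ForkJoinPool` is actually implemented; the names `WorkStealingSketch`, `Worker`, `fork`, `popOwn`, `stealFrom` and `nextTask` are all hypothetical, and the victim is passed in explicitly rather than chosen at random.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the head/tail rules only; no threads, no synchronization.
class WorkStealingSketch {

    static final class Worker {
        private final Deque<String> deque = new ArrayDeque<>();

        // Forking pushes the new task onto the head of the worker's own deque.
        void fork(String task) {
            deque.addFirst(task);
        }

        // The owner pops from the head, so it runs its newest task first.
        String popOwn() {
            return deque.pollFirst();
        }

        // A thief takes from the tail of the victim's deque, i.e. the oldest task.
        String stealFrom(Worker victim) {
            return victim.deque.pollLast();
        }

        // Prefer own work; steal only when the own deque is empty.
        String nextTask(Worker victim) {
            String own = popOwn();
            return own != null ? own : stealFrom(victim);
        }
    }

    public static void main(String[] args) {
        Worker busy = new Worker();
        Worker idle = new Worker();
        busy.fork("A");
        busy.fork("B");
        busy.fork("C");

        System.out.println(idle.nextTask(busy)); // prints A (stolen from the tail)
        System.out.println(busy.nextTask(idle)); // prints C (popped from the head)
    }
}
```

Because the owner and the thief work at opposite ends, they rarely compete for the same task, and in a divide-and-conquer workload the oldest task at the tail tends to be the largest chunk of remaining work.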
I would recommend reading Data Parallelism in Java and doing some benchmarks yourself to be convinced. Theory is good only up to a point; after that, take your own measurements to see whether there is a significant performance edge.


Let me start with an article (yes, I wrote it) that critiques the framework: A Java Fork-Join Calamity.

Now to your questions:

  1. It's not. The framework wants to process a DAG. That's the design structure.

  2. It's not. As the article mentions, Java applications know nothing about caches, memory, etc., so the assumptions are erroneous.

  3. Yes. That is exactly what happens. Stalls are so common that the framework needs to create "continuation threads" to keep moving. The article references a question here where over 700 continuation threads were needed.

  4. I certainly agree that the code is complex. Scatter-gather works much better than work-stealing for applications. As for documentation, what documentation? There are no details from Oracle. It's all a push to use the framework.

There are alternatives.