
reuse JVM in Hadoop mapreduce jobs


If you have many very small tasks that run one after another, it is useful to set this property (mapred.job.reuse.jvm.num.tasks in MR1) to -1, meaning a spawned JVM will be reused an unlimited number of times. You then spawn only (number of task slots available to your job in the cluster) JVMs instead of (number of tasks) JVMs.
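For illustration, here is a minimal MR1 driver sketch that turns on unlimited reuse; the class name is a placeholder, and it assumes the old org.apache.hadoop.mapred.JobConf API, which exposes the property via setNumTasksToExecutePerJvm:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseDriver {                          // placeholder driver class
        public static void main(String[] args) {
            JobConf conf = new JobConf(JvmReuseDriver.class);
            // -1 = reuse each spawned task JVM an unlimited number of times (MR1 only)
            conf.setNumTasksToExecutePerJvm(-1);
            // Equivalent to: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
            System.out.println(conf.get("mapred.job.reuse.jvm.num.tasks"));  // prints -1
        }
    }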

For such workloads this is a huge performance improvement. In long-running jobs, on the other hand, the time spent setting up a new JVM is only a tiny fraction of the total runtime, so reuse doesn't give you much of a boost there.

Also, for long-running tasks it is actually good to recreate the task process, because issues like heap fragmentation would otherwise degrade your performance over time.

In addition, if your jobs fall somewhere in between, reusing each JVM for just 2-3 tasks can be a good trade-off.
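If you want to experiment with that 2-3 range per job rather than hard-coding it, a driver that implements Tool lets you pass the value with -D at submit time; a sketch with a placeholder class name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TunableReuseDriver extends Configured implements Tool {  // placeholder name
        @Override
        public int run(String[] args) throws Exception {
            // ToolRunner has already applied any -D overrides at this point, e.g.
            //   hadoop jar myjob.jar TunableReuseDriver -D mapred.job.reuse.jvm.num.tasks=3
            Configuration conf = getConf();
            System.out.println(conf.get("mapred.job.reuse.jvm.num.tasks", "1"));  // MR1 default is 1
            // ... build and submit the job from conf here ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new TunableReuseDriver(), args));
        }
    }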


JVM reuse (only possible in MR1) should help with performance because it removes the startup lag of the JVM, but the gain is only marginal and comes with a number of drawbacks (read: side effects). Most tasks run for a long time (tens of seconds or even minutes), and startup time is not the problem when compared with those run times.

You would also like each task to start with a clean slate. When you reuse a JVM there is a chance that the heap is not completely clean (it is fragmented from the previous runs). The fragmentation can lead to more GCs and nullify all the startup-time gains. If there is a memory leak, it could also carry over and affect memory usage. So it is better to start a new JVM for the tasks (unless the tasks are reasonably small).

In MR2 (YARN), a new JVM is always started for each task. The exception is uber tasks, which run in the ApplicationMaster's local JVM only.
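For the MR2/YARN side mentioned above, uber mode is opt-in through configuration; a minimal sketch (the class name is a placeholder and the thresholds shown are just the usual defaults):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class UberModeExample {                         // placeholder class name
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Let sufficiently small jobs run "uber": all tasks execute sequentially
            // inside the ApplicationMaster's own JVM instead of separate containers.
            conf.setBoolean("mapreduce.job.ubertask.enable", true);
            // A job only qualifies if it stays under these thresholds:
            conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
            conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
            Job job = Job.getInstance(conf, "uber-mode-example");
            System.out.println(job.getConfiguration().getBoolean("mapreduce.job.ubertask.enable", false));
        }
    }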