Different ways of configuring the memory to the TaskTracker child process (Mapper and Reduce Tasks)



-Xmx specifies the maximum heap size of the JVM spawned for the task. This is the space reserved for object allocation and managed by the garbage collector. mapred.job.map.memory.mb, on the other hand, specifies the maximum virtual memory allowed for a Hadoop task subprocess. If a task exceeds the max heap size, the JVM throws an OutOfMemoryError.

The JVM may use more memory than the max heap size, because it also needs space for class metadata (PermGen space) and thread stacks. If the process uses more virtual memory than mapred.job.map.memory.mb allows, it is killed by Hadoop.

So neither takes precedence over the other (they measure different aspects of memory usage): -Xmx is a parameter passed to the JVM, while mapred.job.map.memory.mb is a hard upper bound on the virtual memory a task attempt can use, enforced by Hadoop.
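As a sketch, the two limits could be set together in mapred-site.xml; the property names are the classic MR1 ones discussed above, and the values here are purely illustrative:

```xml
<!-- Illustrative mapred-site.xml snippet; values are examples only. -->
<property>
  <!-- JVM options for each map/reduce child task: 512 MB max heap (-Xmx) -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <!-- Hard virtual-memory limit for a map task attempt, in MB,
       enforced by Hadoop. Must leave headroom above -Xmx for
       thread stacks, PermGen, native buffers, etc. -->
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>
</property>
```

Note that the virtual-memory limit needs to be comfortably larger than the heap, since the JVM's total footprint includes more than just the heap.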

Hope this is helpful; memory is complicated! I'm presently puzzled by why my own JVM processes use several multiples of the max heap size in virtual memory, as described in my SO post.