Different ways of configuring the memory to the TaskTracker child process (Mapper and Reduce Tasks)



-Xmx specifies the maximum heap size of the JVM spawned for the task. This is the space reserved for object allocation and managed by the garbage collector. mapred.job.map.memory.mb, on the other hand, specifies the maximum virtual memory allowed for a Hadoop task subprocess. If a task exceeds the max heap size, the JVM throws an OutOfMemoryError.

The JVM may use more memory than the max heap size, because it also needs space for class metadata (PermGen space) and thread stacks. If the process uses more virtual memory than mapred.job.map.memory.mb allows, it is killed by Hadoop.

So neither takes precedence over the other (they measure different aspects of memory usage): -Xmx is a parameter passed to the JVM, while mapred.job.map.memory.mb is a hard upper bound on the virtual memory a task attempt can use, enforced by Hadoop.
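As a sketch, the two limits could be set together in mapred-site.xml; the property names are the classic MR1 ones discussed above, and the values here are purely illustrative:

```xml
<!-- Illustrative mapred-site.xml snippet; values are examples only. -->
<property>
  <!-- JVM options for each map/reduce child task: 512 MB max heap (-Xmx) -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <!-- Hard virtual-memory limit for a map task attempt, in MB,
       enforced by Hadoop. Must leave headroom above -Xmx for
       thread stacks, PermGen, native buffers, etc. -->
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>
</property>
```

Note that the virtual-memory limit needs to be comfortably larger than the heap, since the JVM's total footprint includes more than just the heap.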

Hope this is helpful; memory is complicated! I'm presently puzzled by why my own JVM processes use several multiples of the max heap size in virtual memory, as described in my SO post.