What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?

apache hadoop configuration hadoop-yarn heap-size

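mapreduce.map.memory.mb is the upper limit, in megabytes, on the memory that Hadoop allows a mapper's container to use. The default depends on the version and distribution (stock Hadoop 2.x ships 1024; the cluster in the error below was configured with 512). If a mapper exceeds this limit, Hadoop kills it with an error like this: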

Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001] is running beyond physical memory limits. Current usage: 569.1 MB of 512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. Killing container.

A Hadoop mapper is a Java process, and each Java process has its own maximum heap size, configured via mapred.map.child.java.opts (or mapreduce.map.java.opts in Hadoop 2+). If the mapper process runs out of heap memory, it throws a Java out-of-memory error:

Error: java.lang.RuntimeException: java.lang.OutOfMemoryError

Thus, the Hadoop and the Java settings are related. The Hadoop setting enforces a resource limit from outside the process, while the Java setting configures how much heap the process itself will try to use.

The Java heap setting should be smaller than the Hadoop container memory limit, because the JVM also needs memory outside the heap (for loaded code, thread stacks and so on). A common recommendation is to reserve about 20% of the container memory for this. If the settings are correct, Java-based Hadoop tasks should not normally be killed by Hadoop, and you should not see the "Killing container" error shown above.

If you experience Java out-of-memory errors, increase both settings: the heap size so the JVM has more room, and the container limit so that the larger heap still fits.
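As a rough illustration of keeping the two settings in step, here is a minimal Python sketch that derives a matching pair of values from a chosen container size. The helper name map_memory_settings and the heap_fraction parameter are made up for this example; 0.8 simply mirrors the 20% reserve mentioned above and is a rule of thumb, not a Hadoop default.

```python
def map_memory_settings(container_mb: int, heap_fraction: float = 0.8) -> dict:
    """Return matching container and heap settings for map tasks.

    heap_fraction is the share of the container given to the Java heap;
    the remainder is left for non-heap JVM memory. The 0.8 default is an
    assumption taken from the 20% rule of thumb, not a Hadoop default.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0.0 < heap_fraction < 1.0:
        raise ValueError("heap_fraction must be between 0 and 1")
    # Round down so the heap never exceeds the intended share.
    heap_mb = int(container_mb * heap_fraction)
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    }


if __name__ == "__main__":
    # Print the pair in name=value form, e.g. for pasting into a job config.
    for name, value in map_memory_settings(2048).items():
        print(f"{name}={value}")
```

The same calculation applies to the reduce-side properties; only the property names change.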


The following properties let you specify options to be passed to the JVMs running your tasks. Use them with -Xmx to control the available heap.

Hadoop 0.x, 1.x (deprecated)       Hadoop 2.x
-------------------------------    --------------------------
mapred.child.java.opts
mapred.map.child.java.opts         mapreduce.map.java.opts
mapred.reduce.child.java.opts      mapreduce.reduce.java.opts

Note there is no direct Hadoop 2 equivalent for the first of these; the advice in the source code is to use the other two. mapred.child.java.opts is still supported (but is overridden by the other two more-specific settings if present).
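The precedence described here can be modelled in a few lines of Python. This is a simplified sketch of the lookup order only, not Hadoop's actual code; the function effective_java_opts and the plain dict standing in for the job configuration are assumptions made for illustration.

```python
from typing import Mapping, Optional

# Task-specific keys, newest name first. These override the generic key.
SPECIFIC_KEYS = {
    "map": ("mapreduce.map.java.opts", "mapred.map.child.java.opts"),
    "reduce": ("mapreduce.reduce.java.opts", "mapred.reduce.child.java.opts"),
}
GENERIC_KEY = "mapred.child.java.opts"


def effective_java_opts(conf: Mapping[str, str], task_type: str) -> Optional[str]:
    """Return the JVM options that would win for 'map' or 'reduce' tasks."""
    for key in SPECIFIC_KEYS[task_type]:
        if key in conf:
            return conf[key]
    # Fall back to the generic setting only if no specific one is present.
    return conf.get(GENERIC_KEY)


if __name__ == "__main__":
    conf = {
        "mapred.child.java.opts": "-Xmx200m",
        "mapreduce.map.java.opts": "-Xmx1024m",
    }
    print("map:", effective_java_opts(conf, "map"))
    print("reduce:", effective_java_opts(conf, "reduce"))
```

In this model the map tasks pick up the specific setting, while the reduce tasks fall back to mapred.child.java.opts because no reduce-specific value is set.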

Complementary to these, the following let you limit total memory (possibly virtual) available for your tasks - including heap, stack and class definitions:

Hadoop 0.x, 1.x (deprecated)       Hadoop 2.x
-------------------------------    --------------------------
mapred.job.map.memory.mb           mapreduce.map.memory.mb
mapred.job.reduce.memory.mb        mapreduce.reduce.memory.mb

I suggest setting -Xmx to 75% of the memory.mb values.

In a YARN cluster, jobs must not use more memory than the server-side config yarn.scheduler.maximum-allocation-mb or they will be killed.
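A quick sanity check before submitting a job might look like the following Python sketch. The function check_task_memory is hypothetical, and the cluster's yarn.scheduler.maximum-allocation-mb value is assumed to have been looked up separately and passed in by hand.

```python
from typing import List


def check_task_memory(container_mb: int, heap_mb: int, max_allocation_mb: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    container_mb      -- value intended for mapreduce.{map,reduce}.memory.mb
    heap_mb           -- the -Xmx value, in megabytes
    max_allocation_mb -- the cluster's yarn.scheduler.maximum-allocation-mb
    """
    problems = []
    if container_mb > max_allocation_mb:
        problems.append(
            f"container size {container_mb} MB exceeds "
            f"yarn.scheduler.maximum-allocation-mb ({max_allocation_mb} MB)"
        )
    if heap_mb >= container_mb:
        problems.append(
            f"heap {heap_mb} MB leaves no room for non-heap memory "
            f"inside a {container_mb} MB container"
        )
    return problems


if __name__ == "__main__":
    for problem in check_task_memory(container_mb=10240, heap_mb=7680, max_allocation_mb=8192):
        print("WARNING:", problem)
```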

To check the defaults and precedence of these, see JobConf and MRJobConfig in the Hadoop source code.

Troubleshooting

Remember that your mapred-site.xml may provide defaults for these settings. This can be confusing - e.g. if your job sets mapred.child.java.opts programmatically, this would have no effect if mapred-site.xml sets mapreduce.map.java.opts or mapreduce.reduce.java.opts. You would need to set those properties in your job instead, to override the mapred-site.xml. Check your job's configuration page (search for 'xmx') to see what values have been applied and where they have come from.
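If you have a copy of the site file to hand, a small script can list which properties in it already carry an -Xmx value. This is a sketch that assumes the usual configuration/property/name/value layout of Hadoop site files; the function find_xmx_properties and the sample XML are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def find_xmx_properties(xml_text: str) -> Dict[str, str]:
    """Return {property name: value} for every property whose value sets -Xmx."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value") or ""
        # Case-insensitive match, like searching the config page for 'xmx'.
        if name and "-xmx" in value.lower():
            found[name.strip()] = value.strip()
    return found


# Hypothetical site file contents, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in find_xmx_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

Any property this reports is one your job would need to set explicitly in order to override the site-wide value.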

ApplicationMaster memory

In a YARN cluster, you can use the following two properties to control the amount of memory available to your ApplicationMaster (to hold details of input splits, status of tasks, etc):

Hadoop 0.x, 1.x                    Hadoop 2.x
-------------------------------    --------------------------
                                   yarn.app.mapreduce.am.command-opts
                                   yarn.app.mapreduce.am.resource.mb

Again, you could set -Xmx (in the former) to 75% of the resource.mb value.

Other configurations

There are many other configurations relating to memory limits, some of them deprecated - see the JobConf class. One useful one:

Hadoop 0.x, 1.x (deprecated)       Hadoop 2.x
-------------------------------    --------------------------
mapred.job.reduce.total.mem.bytes  mapreduce.reduce.memory.totalbytes

Set this to a low value (10) to force shuffle to happen on disk in the event that you hit an OutOfMemoryError at MapOutputCopier.shuffleInMemory.
