"GC Overhead limit exceeded" on Hadoop .20 datanode "GC Overhead limit exceeded" on Hadoop .20 datanode hadoop hadoop

"GC Overhead limit exceeded" on Hadoop .20 datanode


Try increasing the memory for the datanode by setting the following (a Hadoop restart is required for it to take effect):

export HADOOP_DATANODE_OPTS="-Xmx10g"

This sets the maximum heap to 10 GB; increase it as needed for your workload.

You can also put this line near the top of the $HADOOP_CONF_DIR/hadoop-env.sh file.
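
For example, a minimal hadoop-env.sh sketch (the 10g value is only illustrative; appending $HADOOP_DATANODE_OPTS keeps any options already set there):

# in $HADOOP_CONF_DIR/hadoop-env.sh: raise the datanode heap, keeping existing options
export HADOOP_DATANODE_OPTS="-Xmx10g $HADOOP_DATANODE_OPTS"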


If you are running a MapReduce job from the command line, you can increase the heap with the parameters -D 'mapreduce.map.java.opts=-Xmx1024m' and/or -D 'mapreduce.reduce.java.opts=-Xmx1024m'. Example:

hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar --conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.map.java.opts=-Xmx1024m' --hbase-indexer-file $HOME/morphline-hbase-mapper.xml --zk-host 127.0.0.1/solr --collection hbase-collection1 --go-live --log4j /home/cloudera/morphlines/log4j.properties

Note that some Cloudera documentation still uses the old parameters mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts. These parameters no longer work on Hadoop 2 (see What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?).
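
As a rough sketch of the Hadoop 2 way (your-job.jar, YourMainClass and the directories below are placeholders): when you raise the JVM heap with mapreduce.*.java.opts, it is usually worth raising the matching container limits mapreduce.*.memory.mb as well, so YARN does not kill the container for exceeding its memory allocation:

# illustrative only: keep -Xmx below the container limit set by mapreduce.*.memory.mb
hadoop jar your-job.jar YourMainClass \
  -D mapreduce.map.memory.mb=1536 -D 'mapreduce.map.java.opts=-Xmx1024m' \
  -D mapreduce.reduce.memory.mb=1536 -D 'mapreduce.reduce.java.opts=-Xmx1024m' \
  input.dir output.dir

The -D options are only picked up this way if the job's main class goes through ToolRunner/GenericOptionsParser.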


This post solved the issue for me.

So the key is to "prepend that environment variable" to the command (the first time I had seen this Linux syntax :) ):

HADOOP_CLIENT_OPTS="-Xmx10g" hadoop jar "your.jar" "source.dir" "target.dir"
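
In shell, VAR=value command sets the variable only for that single command's environment, so the larger heap applies to the client JVM that the hadoop script launches without changing your shell session. A small illustrative sketch (my-job.jar and the paths are placeholders):

HADOOP_CLIENT_OPTS="-Xmx10g" hadoop jar my-job.jar /input/dir /output/dir
echo "$HADOOP_CLIENT_OPTS"   # still empty (or unchanged) in the parent shell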