Hadoop streaming "GC overhead limit exceeded"


It took a while, but I found the solution here.

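Prepending HADOOP_CLIENT_OPTS="-Xmx1024M" to the command solves the problem. It raises the maximum heap of the client-side JVM, the one that runs the hadoop jar command and submits the job, to 1024 MB.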

The final command line is:

HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>"  -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
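If you run several client commands from the same shell, you can export the variable once instead of prefixing each command. The sketch below assumes a POSIX shell and a HotSpot JVM. The 1024M value is the one used above, and larger inputs may need more.

```shell
# Append the larger client heap to any options that are already set.
# HotSpot honours the last -Xmx it is given, so appending keeps this value
# from being overridden by an -Xmx already present in the variable.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:+$HADOOP_CLIENT_OPTS }-Xmx1024M"
```

After this, the streaming command can be run in that shell without the HADOOP_CLIENT_OPTS prefix. The export only lasts for the current session. To make the setting persistent, the same line can be added to hadoop-env.sh in your Hadoop configuration directory, although the location of that file depends on your installation.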