Hadoop streaming job failure: Task process exit with nonzero status of 137



Two possibilities come to mind:

  1. RAM usage: if a task uses too much RAM, the OS can kill it (after horrible swapping, etc.). Exit status 137 points in this direction; see the sketch after this list.
  2. Are you using any non-reentrant libraries? Maybe the timer is being triggered at an inopportune point in a library call.
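
A quick way to convince yourself that status 137 means "killed with SIGKILL" (the signal the OOM killer sends) is to reproduce it by hand in a shell; this is only an illustration and has nothing Hadoop-specific in it:

$ sleep 300 &
$ kill -9 $!       # send SIGKILL, as the OOM killer would
$ wait $!
$ echo $?          # the shell reports 128 + signal number, so 128 + 9 = 137
137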


Exit code 137 is a typical sign of the infamous OOM killer. You can easily check for it with the dmesg command, looking for messages like this:

[2094250.428153] CPU: 23 PID: 28108 Comm: node Tainted: G         C O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u2
[2094250.428155] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[2094250.428156]  ffff880773439400 ffffffff8150dacf ffff881328ea32f0 ffffffff8150b6e7
[2094250.428159]  ffff881328ea3808 0000000100000000 ffff88202cb30080 ffff881328ea32f0
[2094250.428162]  ffff88107fdf2f00 ffff88202cb30080 ffff88202cb30080 ffff881328ea32f0
[2094250.428164] Call Trace:
[2094250.428174]  [<ffffffff8150dacf>] ? dump_stack+0x41/0x51
[2094250.428177]  [<ffffffff8150b6e7>] ? dump_header+0x76/0x1e8
[2094250.428183]  [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428186]  [<ffffffff8114088d>] ? oom_kill_process+0x21d/0x370
[2094250.428188]  [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428193]  [<ffffffff811a053a>] ? mem_cgroup_oom_synchronize+0x52a/0x590
[2094250.428195]  [<ffffffff8119fac0>] ? mem_cgroup_try_charge_mm+0xa0/0xa0
[2094250.428199]  [<ffffffff81141040>] ? pagefault_out_of_memory+0x10/0x80
[2094250.428203]  [<ffffffff81057505>] ? __do_page_fault+0x3c5/0x4f0
[2094250.428208]  [<ffffffff8109d017>] ? put_prev_entity+0x57/0x350
[2094250.428211]  [<ffffffff8109be86>] ? set_next_entity+0x56/0x70
[2094250.428214]  [<ffffffff810a2c61>] ? pick_next_task_fair+0x6e1/0x820
[2094250.428219]  [<ffffffff810115dc>] ? __switch_to+0x15c/0x570
[2094250.428222]  [<ffffffff81515ce8>] ? page_fault+0x28/0x30
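
The call trace itself is mostly noise; the interesting lines are the ones where the kernel names the victim. Grepping for them is usually enough (the exact wording varies somewhat between kernel versions):

$ dmesg | grep -iE 'out of memory|killed process'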

You can easily check how overcommit (and therefore the OOM killer) is configured:

$ cat /proc/sys/vm/overcommit_memory
0
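
The same setting can be read through sysctl. A value of 0 means heuristic overcommit (the default, with the OOM killer active), 1 means always overcommit, and 2 means strict accounting with no overcommit:

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 0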

Basically, the OOM killer tries to kill the process that is using the largest share of memory, and you probably don't want to disable it:

The OOM killer can be completely disabled with the following command. This is not recommended for production environments, because if an out-of-memory condition does present itself, there could be unexpected behavior depending on the available system resources and configuration. This unexpected behavior could be anything from a kernel panic to a hang depending on the resources available to the kernel at the time of the OOM condition.

sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
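
Note that the sysctl command changes the running kernel immediately, while the line appended to /etc/sysctl.conf only takes effect once that file is re-read, e.g. with:

sysctl -p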

The same situation can happen if you use e.g. cgroups for limiting memory: when a process exceeds the given limit, it gets killed without warning.
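
As a minimal sketch of that behaviour, assuming a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and root privileges (the group name hadoop-task is made up for illustration):

# create a group with a 512 MB hard limit and move the current shell into it
mkdir /sys/fs/cgroup/memory/hadoop-task
echo $((512*1024*1024)) > /sys/fs/cgroup/memory/hadoop-task/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/hadoop-task/tasks

# any child started from this shell that allocates past the limit is killed
# by the kernel, and the parent sees exit status 137, just like the Hadoop task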


I got this error as well. I lost a day on it and found it was an infinite loop somewhere in the code.