Hadoop Spill failure


Ok, all problems are solved.

Map-Reduce serialization internally needs a default (no-argument) constructor for org.apache.hadoop.io.ArrayWritable.
Hadoop's implementation doesn't provide a default constructor for ArrayWritable.
That's why java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>() was thrown, which caused the weird spill exception.

A simple wrapper class with a default constructor made ArrayWritable really writable and fixed it! Strange that Hadoop did not provide this.
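To illustrate why a wrapper helps, here is a minimal sketch of the mechanism that runs without Hadoop on the classpath. The classes ElementArray, IntElementArray and SpillDemo are hypothetical stand-ins, not Hadoop classes and not the wrapper actually used above: ElementArray plays the role of ArrayWritable (it can only be built from an element type), and newInstance mimics a framework creating the object reflectively through its no-argument constructor.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for org.apache.hadoop.io.ArrayWritable: it can
// only be constructed when the element type is passed in.
class ElementArray {
    private final Class<?> valueClass;

    ElementArray(Class<?> valueClass) {
        this.valueClass = valueClass;
    }

    Class<?> getValueClass() {
        return valueClass;
    }
}

// The wrapper: a subclass that fixes the element type and so gains
// the no-argument constructor that reflective instantiation needs.
class IntElementArray extends ElementArray {
    IntElementArray() {
        super(Integer.class);
    }
}

class SpillDemo {
    // Look up the no-argument constructor and call it.
    static Object newInstance(Class<?> type) throws ReflectiveOperationException {
        Constructor<?> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }

    // Returns false when the class has no no-argument constructor.
    static boolean canInstantiate(Class<?> type) {
        try {
            newInstance(type);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(ElementArray.class));    // false
        System.out.println(canInstantiate(IntElementArray.class)); // true
    }
}
```

In a real job the wrapper would instead extend org.apache.hadoop.io.ArrayWritable and pass the element's Writable class to super, for example super(IntWritable.class) for an array of IntWritable values (the element type here is an assumption), and the job would use the wrapper class as its value type.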


This problem came up for me when the output of one of my map jobs contained a tab ("\t"), carriage return ("\r"), or newline ("\n") character. Hadoop doesn't handle this well and fails. I was able to solve it by stripping those characters with this piece of Python code:

if "\t" in output:
    output = output.replace("\t", "")
if "\r" in output:
    output = output.replace("\r", "")
if "\n" in output:
    output = output.replace("\n", "")

You may have to do something else for your app.