Accumulo MapReduce job fails with java.io.EOFException, using AccumuloRowInputFormat [hadoop]


I'm guessing that you have a version mismatch between the Accumulo jars you use to launch the MapReduce job and those that you ship for the job itself (Mappers/Reducers) via the DistributedCache or the `-libjars` CLI option.
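One way to confirm or rule that out is to log which jar (and which Implementation-Version) a class was actually loaded from, both in your driver's `main()` and in your Mapper's `setup()`. This is a minimal, pure-JDK sketch; the class name `WhereFrom` is made up, and it's demonstrated on a JDK class here — in your job you would pass an Accumulo class such as `org.apache.accumulo.core.data.Range` instead:

```java
// Hypothetical helper: report which jar a class was loaded from and the
// Implementation-Version from that jar's manifest. Call it from both the
// driver JVM and Mapper.setup() and compare the output.
public class WhereFrom {
    static String describe(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // Platform classes have no CodeSource; application jars report their URL.
        String jar = (src == null) ? "bootstrap classpath" : src.getLocation().toString();
        String ver = (c.getPackage() == null)
                ? "unknown" : String.valueOf(c.getPackage().getImplementationVersion());
        return c.getName() + " loaded from " + jar + " (version " + ver + ")";
    }

    public static void main(String[] args) {
        // Demonstrated on a JDK class; substitute an Accumulo class in your job.
        System.out.println(describe(org.w3c.dom.Document.class));
    }
}
```

If the two JVMs print different jar paths or versions, you've found the mismatch.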

Because you specified no ranges, AccumuloInputFormat will automatically fetch all of the Tablet boundaries for your table and create one RangeInputSplit object per Tablet in the table. This split creation is done in the local JVM (the JVM created when you submit your job). These RangeInputSplit objects are then serialized and passed into YARN.
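Mechanically, this works because Hadoop InputSplits implement Writable: the client JVM calls `write(DataOutput)`, and each task JVM calls `readFields(DataInput)` on the bytes it receives. The sketch below is a toy stand-in for that round trip — the class and field names are illustrative and not Accumulo's actual wire format:

```java
import java.io.*;

// Toy stand-in for a RangeInputSplit; real splits implement
// org.apache.hadoop.io.Writable with the same write/readFields pattern.
class ToySplit {
    String table;
    String startRow, endRow; // tablet boundary range (illustrative fields)

    void write(DataOutput out) throws IOException {
        out.writeUTF(table);
        out.writeUTF(startRow);
        out.writeUTF(endRow);
    }

    void readFields(DataInput in) throws IOException {
        table = in.readUTF();
        startRow = in.readUTF();
        endRow = in.readUTF();
    }
}

public class SplitRoundTrip {
    static String roundTrip() throws IOException {
        // "Client JVM": serialize the split to bytes.
        ToySplit s = new ToySplit();
        s.table = "mytable"; s.startRow = "a"; s.endRow = "m";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        s.write(new DataOutputStream(buf));

        // "Task JVM": deserialize from those bytes.
        ToySplit t = new ToySplit();
        t.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return t.table + " [" + t.startRow + "," + t.endRow + ")";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

The round trip only works if the writer and the reader agree on the field layout, which is exactly what breaks when the two sides run different versions.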

The error you provided occurs when a Mapper takes one of these serialized RangeInputSplit objects and tries to deserialize it. Somehow this is failing because there is not enough serialized data to satisfy what the version of Accumulo running in the Mapper expects to read.
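You can reproduce that failure mode with plain JDK streams: if the reading side expects one more field than the writing side actually wrote, the read runs off the end of the buffer and throws `java.io.EOFException`. This is a sketch of the mechanism, not Accumulo's actual field layout:

```java
import java.io.*;

public class VersionMismatch {
    static String demo() throws IOException {
        // "Older" client-side code serializes two fields.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("mytable");
        out.writeBoolean(true);

        // "Newer" mapper-side code reads those two fields, then expects a
        // third field the old writer never wrote -- and runs out of bytes.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        in.readUTF();
        in.readBoolean();
        try {
            in.readUTF(); // no bytes left: DataInputStream throws EOFException
            return "no error";
        } catch (EOFException e) {
            return "java.io.EOFException: not enough serialized data";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

This is why a mismatched jar version between the submitting JVM and the task JVMs shows up as an EOFException during split deserialization rather than a clearer "wrong version" message.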

It's possible that this is just a serialization bug in your version of Accumulo (please do share which version you're running), but I don't recall hearing about such an error. My guess is that the version of Accumulo on the local classpath differs from the one on the Mapper's classpath.