Accumulo MapReduce job fails with java.io.EOFException, using AccumuloRowInputFormat [hadoop]


I'm guessing that you have a version mismatch between the Accumulo jars you use to launch the MapReduce job and those that you ship for the job itself (Mappers/Reducers) via the DistributedCache or the `-libjars` CLI option.
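One way to confirm or rule that out is to log which jar (and which Implementation-Version) a class was actually loaded from, both in your driver's `main()` and in your Mapper's `setup()`. This is a minimal, pure-JDK sketch; the class name `WhereFrom` is made up, and it's demonstrated on a JDK class here — in your job you would pass an Accumulo class such as `org.apache.accumulo.core.data.Range` instead:

```java
// Hypothetical helper: report which jar a class was loaded from and the
// Implementation-Version from that jar's manifest. Call it from both the
// driver JVM and Mapper.setup() and compare the output.
public class WhereFrom {
    static String describe(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // Platform classes have no CodeSource; application jars report their URL.
        String jar = (src == null) ? "bootstrap classpath" : src.getLocation().toString();
        String ver = (c.getPackage() == null)
                ? "unknown" : String.valueOf(c.getPackage().getImplementationVersion());
        return c.getName() + " loaded from " + jar + " (version " + ver + ")";
    }

    public static void main(String[] args) {
        // Demonstrated on a JDK class; substitute an Accumulo class in your job.
        System.out.println(describe(org.w3c.dom.Document.class));
    }
}
```

If the two JVMs print different jar paths or versions, you've found the mismatch.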

Because you specified no ranges, AccumuloInputFormat will automatically fetch all of the Tablet boundaries for your table and create one RangeInputSplit object per Tablet in the table. This split creation is done in the local JVM (the JVM created when you submit your job). These RangeInputSplit objects are then serialized and passed into YARN.
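Mechanically, this works because Hadoop InputSplits implement Writable: the client JVM calls `write(DataOutput)`, and each task JVM calls `readFields(DataInput)` on the bytes it receives. The sketch below is a toy stand-in for that round trip — the class and field names are illustrative and not Accumulo's actual wire format:

```java
import java.io.*;

// Toy stand-in for a RangeInputSplit; real splits implement
// org.apache.hadoop.io.Writable with the same write/readFields pattern.
class ToySplit {
    String table;
    String startRow, endRow; // tablet boundary range (illustrative fields)

    void write(DataOutput out) throws IOException {
        out.writeUTF(table);
        out.writeUTF(startRow);
        out.writeUTF(endRow);
    }

    void readFields(DataInput in) throws IOException {
        table = in.readUTF();
        startRow = in.readUTF();
        endRow = in.readUTF();
    }
}

public class SplitRoundTrip {
    static String roundTrip() throws IOException {
        // "Client JVM": serialize the split to bytes.
        ToySplit s = new ToySplit();
        s.table = "mytable"; s.startRow = "a"; s.endRow = "m";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        s.write(new DataOutputStream(buf));

        // "Task JVM": deserialize from those bytes.
        ToySplit t = new ToySplit();
        t.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return t.table + " [" + t.startRow + "," + t.endRow + ")";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

The round trip only works if the writer and the reader agree on the field layout, which is exactly what breaks when the two sides run different versions.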

The error you provided occurs when a Mapper takes one of these serialized RangeInputSplit objects and tries to deserialize it. Somehow this is failing because there is not enough serialized data to satisfy what the version of Accumulo running in the Mapper expects to read.
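You can reproduce that failure mode with plain JDK streams: if the reading side expects one more field than the writing side actually wrote, the read runs off the end of the buffer and throws `java.io.EOFException`. This is a sketch of the mechanism, not Accumulo's actual field layout:

```java
import java.io.*;

public class VersionMismatch {
    static String demo() throws IOException {
        // "Older" client-side code serializes two fields.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("mytable");
        out.writeBoolean(true);

        // "Newer" mapper-side code reads those two fields, then expects a
        // third field the old writer never wrote -- and runs out of bytes.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        in.readUTF();
        in.readBoolean();
        try {
            in.readUTF(); // no bytes left: DataInputStream throws EOFException
            return "no error";
        } catch (EOFException e) {
            return "java.io.EOFException: not enough serialized data";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

This is why a mismatched jar version between the submitting JVM and the task JVMs shows up as an EOFException during split deserialization rather than a clearer "wrong version" message.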

It's possible that this is just a serialization bug in your version of Accumulo (please do share which version you're running), but I don't recall hearing about such an error. My guess is that the version of Accumulo on the local classpath differs from the one on the Mapper's classpath.