NegativeArraySizeException when creating a SequenceFile with large (>1GB) BytesWritable value size  hadoop


Just use ArrayPrimitiveWritable instead.
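
For illustration, here is a minimal sketch (not from the original answer; the path, key type, and payload size are made up) of writing a large value through ArrayPrimitiveWritable, which wraps the byte[] directly and never goes through BytesWritable's capacity-growing logic:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayPrimitiveWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class ArrayPrimitiveWritableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/large-values.seq");     // hypothetical output path

    byte[] largeValue = new byte[1500 * 1024 * 1024];  // ~1.5 GB payload

    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(ArrayPrimitiveWritable.class))) {
      // The byte[] is wrapped as-is; there is no "grow capacity by 3/2" step
      // that could overflow an int.
      writer.append(new IntWritable(0), new ArrayPrimitiveWritable(largeValue));
    }
  }
}

ArrayPrimitiveWritable records the array's component type when it serializes, so the reader gets a plain byte[] back from get().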

There is an int overflow when BytesWritable computes its new capacity here:

public void setSize(int size) {
  if (size > getCapacity()) {
    setCapacity(size * 3 / 2);
  }
  this.size = size;
}

700 MB * 3 > 2 GB, so size * 3 overflows int before the division by 2.

As a result, you cannot deserialize (though you can still write and serialize) more than ~700 MB into a BytesWritable.
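
To make the failure concrete, here is a minimal standalone sketch (the class name and exact size are made up for illustration) of the arithmetic that goes wrong inside setCapacity:

public class SetSizeOverflowDemo {
  public static void main(String[] args) {
    int size = 700 * 1024 * 1024;           // 734,003,200 bytes (~700 MB)
    int newCapacity = size * 3 / 2;         // size * 3 wraps around: -1046478848
    System.out.println(newCapacity);
    byte[] buffer = new byte[newCapacity];  // this is effectively what setCapacity() does:
                                            // throws NegativeArraySizeException
  }
}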


If you still want to use BytesWritable, one option is to set the capacity high enough beforehand, so you can use up to 2 GB instead of only ~700 MB:

randomValue.setCapacity(numBytesToWrite);
randomValue.setSize(numBytesToWrite); // will not resize now
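
Since the exception is actually thrown during deserialization, the same trick can be applied on the read side. A minimal sketch (the file path, key type, and expected maximum record size are assumptions, not from the original answer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class PreSizedSequenceFileReader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/large-values.seq"); // hypothetical input file

    IntWritable key = new IntWritable();
    BytesWritable value = new BytesWritable();
    // Pre-size the value above the largest record so that readFields() never
    // needs to call setCapacity(size * 3 / 2) and nothing can overflow.
    value.setCapacity(1500 * 1024 * 1024);

    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
      while (reader.next(key, value)) {
        System.out.println(key.get() + " -> " + value.getLength() + " bytes");
      }
    }
  }
}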

This bug has since been fixed in Hadoop, so newer versions should work even without that workaround:

public void setSize(int size) {
  if (size > getCapacity()) {
    // Avoid overflowing the int too early by casting to a long.
    long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
    setCapacity((int) newSize);
  }
  this.size = size;
}