NegativeArraySizeException when creating a SequenceFile with large (>1GB) BytesWritable value size
Just use ArrayPrimitiveWritable instead.
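A minimal round-trip sketch (this needs a Hadoop classpath to compile; the 1 KB payload is just a stand-in for a large value):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.ArrayPrimitiveWritable;

public class ArrayPrimitiveWritableDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1024]; // stand-in for a >700 MB value

        // Serialize the wrapped array.
        ArrayPrimitiveWritable out = new ArrayPrimitiveWritable(payload);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        out.write(new DataOutputStream(buffer));

        // Deserialize it back; the read path allocates the array at its
        // exact length instead of growing a capacity by 3/2.
        ArrayPrimitiveWritable in = new ArrayPrimitiveWritable();
        in.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        byte[] back = (byte[]) in.get();
        System.out.println(back.length);
    }
}
```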
There is an int overflow when BytesWritable computes its new capacity:

public void setSize(int size) {
    if (size > getCapacity()) {
        setCapacity(size * 3 / 2);
    }
    this.size = size;
}

For any size above Integer.MAX_VALUE / 3 (roughly 715 MB), the intermediate product size * 3 overflows int and goes negative, so a negative capacity reaches setCapacity and the array allocation throws NegativeArraySizeException.
As a result you cannot deserialize (although you can write and serialize) more than ~715 MB into a BytesWritable.
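You can check the arithmetic in plain Java; the 800 MB size below is just an illustrative value above the threshold:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int size = 800_000_000;            // ~800 MB, above Integer.MAX_VALUE / 3
        int brokenCapacity = size * 3 / 2; // size * 3 wraps around before the division
        // This negative value is what reaches setCapacity and
        // triggers NegativeArraySizeException on array allocation.
        System.out.println(brokenCapacity); // prints -947483648
    }
}
```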
If you would like to keep using BytesWritable, one option is to set the capacity high enough up front, so that you can use the full 2 GB, not only ~715 MB:

randomValue.setCapacity(numBytesToWrite);
randomValue.setSize(numBytesToWrite); // will not resize now
This bug has recently been fixed in Hadoop, so in newer versions it should work even without that workaround:

public void setSize(int size) {
    if (size > getCapacity()) {
        // Avoid overflowing the int too early by casting to a long.
        long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
        setCapacity((int) newSize);
    }
    this.size = size;
}
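A quick sanity check of the patched computation (the 2 GB size is illustrative): the long arithmetic cannot wrap, and oversized results are clamped to Integer.MAX_VALUE, so setCapacity always receives a valid non-negative value.

```java
public class FixedCapacityDemo {
    public static void main(String[] args) {
        int size = 2_000_000_000; // ~2 GB, near the int limit
        // 3L * size is computed in 64-bit arithmetic, so no wraparound;
        // the result (3_000_000_000) exceeds the int range and is clamped.
        long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
        System.out.println((int) newSize); // prints 2147483647
    }
}
```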