
Handling Writables fully qualified name changes in Hadoop SequenceFile


Looking at the SequenceFile spec, it seems clear that it makes no provision for alternative class names.

If I weren't in a position to rewrite the data, another option would be to have com.mammals.fishes.writable extend com.vertebrates.fishes.writable and annotate it as deprecated, so nobody accidentally adds code to the empty wrapper. After a long enough time, the data written with the old class will be obsolete and you'll be able to safely delete the mammals class.
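A minimal sketch of that wrapper idea, assuming nothing about the real classes: SimpleWritable is a stand-in for org.apache.hadoop.io.Writable so the example compiles without Hadoop, the species field is a hypothetical payload, and VertebrateFishWritable / MammalFishWritable play the roles of the new and old classes (in real code they would share a simple name and live in different packages).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Plays the role of the new class; all serialization logic lives here.
class VertebrateFishWritable implements SimpleWritable {
    private String species = ""; // hypothetical payload

    public void set(String species) { this.species = species; }
    public String get() { return species; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(species);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        species = in.readUTF();
    }
}

// Plays the role of the old class. It is deliberately empty: it exists only so
// that the old name still resolves to a class with the same behavior.
@Deprecated
class MammalFishWritable extends VertebrateFishWritable {
}

@SuppressWarnings("deprecation")
class DeprecatedWrapperDemo {
    // Writes a record, then reads it back into an instance of the old class.
    static String roundTrip(String species) {
        try {
            VertebrateFishWritable written = new VertebrateFishWritable();
            written.set(species);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            written.write(new DataOutputStream(bytes));

            VertebrateFishWritable read = new MammalFishWritable();
            read.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return read.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the old class adds no fields or methods, code that receives it can treat it as the new type, and deleting it later only affects files that still name it.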


The org.apache.hadoop.io.WritableName class mentioned in the exception stack trace has some useful methods.

From the doc:

Utility to permit renaming of Writable implementation classes without invalidating files that contain their class name.

// Add an alternate name for a class.
public static void addName(Class writableClass, String name)

In your case you could call this before reading from your SequenceFiles:

WritableName.addName(com.vertebrates.fishes.FishWritable.class, "com.mammals.fishes.FishWritable");

This way, when attempting to read a com.mammals.fishes.FishWritable from an old SequenceFile, the new com.vertebrates.fishes.FishWritable class will be used.
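To illustrate the idea behind such an alias, here is a simplified model of a name registry. It is a sketch only, not Hadoop's actual implementation: AliasRegistry, its resolve method, and RenamedFishWritable are hypothetical names invented for this example, and the fallback to Class.forName is an assumption about how an unregistered name would be handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the renamed class
// (com.vertebrates.fishes.FishWritable in the question).
class RenamedFishWritable {
}

// Simplified model of an alias registry; NOT the real WritableName class.
class AliasRegistry {
    private static final Map<String, Class<?>> NAME_TO_CLASS =
            new ConcurrentHashMap<>();

    // Register an alternate (for example, old) name for a class.
    static void addName(Class<?> cls, String name) {
        NAME_TO_CLASS.put(name, cls);
    }

    // Resolve a class name found in a file header: registered aliases win,
    // otherwise fall back to ordinary class loading.
    static Class<?> resolve(String name) {
        Class<?> aliased = NAME_TO_CLASS.get(name);
        if (aliased != null) {
            return aliased;
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class name: " + name, e);
        }
    }
}
```

In this model, the old name never has to exist as a class: once the alias is registered, looking up "com.mammals.fishes.FishWritable" returns the renamed class. The registration has to happen before the first lookup, which is why the addName call above belongs before any SequenceFile is opened for reading.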

PS: Why was the fish in the mammals package in the first place? ;)