Which key class is suitable for secondary sort?

Tags: hadoop


I was running into this situation all the time and getting tired of writing custom composite key classes, so I wrote a generic Tuple class. It is a list of objects that can act as a composite key, and the list may contain an arbitrary number of objects of Java primitive wrapper types. It implements WritableComparable. The source can be viewed here:

https://github.com/pranab/chombo/blob/master/src/main/java/org/chombo/util/Tuple.java
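For readers who want to see the shape of a hand-written composite key before reaching for a generic one, here is a minimal sketch. It is not the Tuple class linked above. The class name `CompositeKey` and its fields `userId` and `timestamp` are hypothetical, and the code is plain Java so it runs without Hadoop on the classpath. In a real job the class would declare `implements WritableComparable<CompositeKey>`, whose `write(DataOutput)` and `readFields(DataInput)` methods have the same shape as the ones below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key: a natural key (userId) plus a secondary field (timestamp).
// Assumption: in a Hadoop job this would implement WritableComparable<CompositeKey>
// instead of plain Comparable.
final class CompositeKey implements Comparable<CompositeKey> {
    String userId;   // natural key: should decide the partition and the reduce group
    long timestamp;  // secondary field: decides the order inside the group

    // No-arg constructor so an empty instance can be filled by readFields.
    CompositeKey() { }

    CompositeKey(String userId, long timestamp) {
        this.userId = userId;
        this.timestamp = timestamp;
    }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(userId);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        userId = in.readUTF();
        timestamp = in.readLong();
    }

    // Order by natural key first, then by the secondary field.
    @Override
    public int compareTo(CompositeKey other) {
        int byUser = userId.compareTo(other.userId);
        return byUser != 0 ? byUser : Long.compare(timestamp, other.timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CompositeKey)) {
            return false;
        }
        CompositeKey k = (CompositeKey) o;
        return userId.equals(k.userId) && timestamp == k.timestamp;
    }

    @Override
    public int hashCode() {
        return 31 * userId.hashCode() + Long.hashCode(timestamp);
    }

    // Write the key to a byte array and read it back into a fresh instance.
    static CompositeKey roundTrip(CompositeKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            CompositeKey copy = new CompositeKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The `roundTrip` helper is only there to check that `write` and `readFields` agree with each other. Writing one such class per job is the repetitive part that a generic list-backed key avoids.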


I am not able to understand the question. I do have a working copy of SecondarySort, which prints the max value from the list of values.

https://github.com/kapild/hadoop-examples/tree/master/src/SecondarySort


You need to change the way keys are partitioned and grouped. This basically means putting more than one data type in the key while overriding the comparison logic used for partitioning and grouping.

- You can serialize/deserialize your keys and treat the input data as objects or beans if you want strongly typed, robust code for secondary sorting.

- For simpler scenarios, just put a "#" sign between the values!
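To make the partitioning and grouping idea concrete, here is a small plain-Java simulation of the "#" approach. It uses no Hadoop classes, and every name in it (`DelimitedKeySort`, `naturalKey`, `secondary`, `partition`, `sortAndGroup`) is hypothetical. It assumes keys of the form `natural#secondary` with a numeric secondary part. In an actual job the three roles would be plugged in through `Job.setPartitionerClass`, `Job.setSortComparatorClass`, and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort over "natural#secondary" string keys.
// Assumption: the natural key never contains '#' and the secondary part is a long.
final class DelimitedKeySort {

    // Part before '#': the natural key used for partitioning and grouping.
    static String naturalKey(String compositeKey) {
        return compositeKey.substring(0, compositeKey.indexOf('#'));
    }

    // Part after '#', parsed as a number so that 5 orders before 20.
    static long secondary(String compositeKey) {
        return Long.parseLong(compositeKey.substring(compositeKey.indexOf('#') + 1));
    }

    // Partitioner role: hash only the natural key, so every record for the
    // same natural key lands on the same reducer index.
    static int partition(String compositeKey, int numReducers) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort comparator role: natural key first, then the secondary value.
    static final Comparator<String> SORT = (a, b) -> {
        int byNatural = naturalKey(a).compareTo(naturalKey(b));
        return byNatural != 0 ? byNatural : Long.compare(secondary(a), secondary(b));
    };

    // Grouping comparator role: keys with the same natural key count as equal.
    static final Comparator<String> GROUP =
        (a, b) -> naturalKey(a).compareTo(naturalKey(b));

    // Sort the keys, then start a new group whenever GROUP sees a change.
    // Returns natural key -> secondary values in sorted order.
    static Map<String, List<Long>> sortAndGroup(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        String previous = null;
        for (String key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                groups.put(naturalKey(key), new ArrayList<>());
            }
            groups.get(naturalKey(key)).add(secondary(key));
            previous = key;
        }
        return groups;
    }
}
```

One caveat with delimited strings: comparing the raw key text would place "alice#20" before "alice#5", because the comparison is character by character. That is why the sketch parses the secondary part before comparing, and it is one reason a typed composite key tends to be more robust once the fields are not plain strings.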

There is a great high-level article on this here:

http://pkghosh.wordpress.com/2011/04/13/map-reduce-secondary-sort-does-it-all/