Hadoop MapReduce Streaming sorting on multiple columns
You can sort numerically on multiple columns by passing multiple -k options in mapred.text.key.comparator.options, much as you would with the Linux sort command.
For example, in bash:
sort -k1,1 -k2rn
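To see what those two keys do before running a job, here is a minimal local sketch. The four space-separated input lines are made-up sample data, not taken from the question, and LC_ALL=C is only there to make the ordering of the first column predictable.

```shell
# Hypothetical sample data: "name count" pairs, one per line.
# -k1,1  sorts on the first field only (lexicographic, ascending)
# -k2rn  then sorts on the second field onward, numerically, in reverse
sorted=$(printf '%s\n' 'b 2' 'a 10' 'a 9' 'b 30' | LC_ALL=C sort -k1,1 -k2rn)
printf '%s\n' "$sorted"
```

This prints a 10, a 9, b 30, b 2 (one per line): rows are grouped by the first column, and within each group the second column runs from largest to smallest.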
So for your example it would be:
hadoop jar hadoop-streaming-1.2.1.jar \
    -Dmapred.text.key.comparator.options='-k1,1 -k2rn' \
    -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
    -mapper cat \
    -reducer cat \
    -file mr_base.py \
    -file common.py \
    -file mr_sort_combiner.py \
    -input mr_combiner/2013_12_09__05_47_21/part-* \
    -output mr_sort_combiner/2013_12_09__07_15_59/