Hadoop MapReduce Streaming sorting on multiple columns

You can sort on multiple columns (numerically, in reverse, or both, per column) by passing several -k options in mapred.text.key.comparator.options, just as you would with the Linux sort command.

For example, in bash:

sort -k1,1 -k2rn
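To see the ordering this key specification produces, you can run sort locally on a few tab-separated lines. The sample data and the helper name sort_keys below are made up for illustration. Column 1 sorts ascending as text, and rows with the same column 1 sort by column 2 in reverse numeric order:

```shell
#!/bin/sh
# Hypothetical helper: sort tab-separated lines the way '-k1,1 -k2rn' does.
# -k1,1  : column 1 only, ascending text comparison
# -k2rn  : column 2 to end of line, reverse (r) numeric (n)
# LC_ALL=C makes the text comparison byte-wise and locale-independent.
sort_keys() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2rn
}

# Hypothetical sample data (two tab-separated columns).
printf 'b\t2\na\t10\nb\t30\na\t9\n' | sort_keys
# Expected output (tab-separated):
# a 10
# a 9
# b 30
# b 2
```

KeyFieldBasedComparator accepts the same -k syntax, but it compares only the map output key, so this local check matches the job only when both columns are part of the key.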

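So for your example it would be the command below. Note the stream.num.map.output.key.fields=2 setting: by default Streaming treats only the text before the first tab as the key, and KeyFieldBasedComparator compares keys only, so without it the second column would stay in the value and would not affect the sort order.

hadoop jar hadoop-streaming-1.2.1.jar \
    -Dstream.num.map.output.key.fields=2 \
    -Dmapred.text.key.comparator.options='-k1,1 -k2rn' \
    -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
    -mapper cat \
    -reducer cat \
    -file mr_base.py \
    -file common.py \
    -file mr_sort_combiner.py \
    -input mr_combiner/2013_12_09__05_47_21/part-* \
    -output mr_sort_combiner/2013_12_09__07_15_59/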