
MapReduce (Python) - How to sort reducer output for Top-N list?


I believe you can use the collections.Counter class here:

Example (modified from your code):

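```python
#!/usr/bin/python
import sys
import collections

# Sum the counts for each key emitted by the mappers.
counter = collections.Counter()
for line in sys.stdin:
    # Each input line is "key<TAB>count"; split only on the first tab.
    k, v = line.strip().split("\t", 1)
    counter[k] += int(v)

# Emit the 10 keys with the highest totals, largest first.
for k, total in counter.most_common(10):
    print("%s\t%d" % (k, total))
```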
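To check the reducer logic without a Hadoop cluster, the same summing can be wrapped in a function that accepts any iterable of lines. This is a minimal sketch: `top_n` and the sample data are hypothetical, and it assumes each input line has the form `key<TAB>count`. It also assumes a single reducer; if the job runs several reduce tasks, each one only sees its own share of the keys, so its top 10 is not the global top 10.

```python
import collections


def top_n(lines, n=10):
    """Sum "key<TAB>count" lines and return the n keys with the largest totals.

    Hypothetical helper that mirrors the reducer above.
    """
    counter = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        k, v = line.split("\t", 1)
        counter[k] += int(v)
    return counter.most_common(n)


if __name__ == "__main__":
    # Made-up mapper output, already grouped by key as a reducer would receive it.
    sample = ["apple\t2\n", "apple\t3\n", "banana\t1\n", "cherry\t4\n", "cherry\t4\n"]
    for key, total in top_n(sample, n=2):
        print("%s\t%d" % (key, total))
```

Run as a script, it prints the tab-separated lines `cherry 8` and `apple 5`.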

The collections.Counter class is built for exactly this use case, and for many other common counting tasks such as tallying items and collecting simple statistics.
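`most_common(n)` already returns the `(key, count)` pairs sorted by count, highest first, so the reducer needs no separate sort step. For comparison, the sketch below (with made-up totals; names and data are illustrative only) computes the same top two with `heapq.nlargest` and with a full `sorted` call.

```python
import collections
import heapq
from operator import itemgetter

# Hypothetical per-key totals, as a reducer might accumulate them.
counter = collections.Counter({"cherry": 8, "apple": 5, "banana": 1, "date": 3})

# most_common(n): (key, count) pairs, highest count first.
top_two = counter.most_common(2)

# Same result by hand: keep only the n items with the largest count.
by_heap = heapq.nlargest(2, counter.items(), key=itemgetter(1))

# Or sort every item by count and slice (simpler, but orders all keys).
by_sort = sorted(counter.items(), key=itemgetter(1), reverse=True)[:2]

print(top_two)
print(by_heap)
print(by_sort)
```

`heapq.nlargest` keeps only n items while scanning, whereas `sorted` orders every key first, so for a small n over many keys the heap approach does less work.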

From the Python documentation, "8.3.1. Counter objects": "A counter tool is provided to support convenient and rapid tallies."
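A small illustration of such tallies (an illustrative example, not the one from the documentation), showing counting from an iterable, the zero default for missing keys, `update()`, and adding two counters, as one might when merging partial counts:

```python
from collections import Counter

# Tally items directly from an iterable.
words = ["map", "reduce", "map", "sort", "reduce", "reduce"]
tally = Counter(words)
print(tally["reduce"])   # -> 3
print(tally["shuffle"])  # missing keys count as 0 instead of raising KeyError

# update() adds counts from another iterable or mapping.
tally.update({"sort": 1})

# Counters can be added, e.g. to merge partial tallies from two sources.
merged = tally + Counter({"map": 2, "combine": 1})
print(merged.most_common(2))
```

Adding counters keeps only keys whose resulting count is positive.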