Available reducers in Elastic MapReduce Available reducers in Elastic MapReduce hadoop hadoop

Available reducers in Elastic MapReduce


The reducer they refer to is documented here:

http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html

That's a reducer that is built into the streaming utility. It provides a simple way of doing common calculation by writing a mapper that output keys that are formatted in a special way.

For example, if your mapper outputs:

LongValueSum:id1\t12LongValueSum:id1\t13LongValueSum:id2\t1UniqValueCount:id3\tval1UniqValueCount:id3\tval2

The reducer will calculate the sum of each LongValueSum, and count the distinct values for UniqValueCount. The reducer output will therefore be:

id1\t25id2\t12id3\t2

The reducers and combiners in this package are very fast compared to running streaming combiners and reducers, so using the aggregate package is both convenient and fast.


I'm in a similar situation. I infer from Google results etc that the answer right now is "No, there are no other default reducers in Hadoop", which kind of sucks, because it would be obviously useful to have default reducers like, say, "average" or "median" so you don't have to write your own.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html shows a number of useful aggregator uses but I cannot find documentation for how to access other functionality than the very basic key/value sum described in the documentation and in Erik Forsberg's answer. Perhaps this functionality is only exposed in the Java API, which I don't want to use.

Incidentally, I'm afraid Erik Forsberg's answer is not a good answer to this particular question. Another question for which it could be a useful answer can be constructed, but it is not what the OP is asking.