Is there any non-commutative reducer in mapreduce that can be executed in parallel?

I don't know if "commutative" is the right word to use here (the property that lets a reduction be parallelized is usually associativity, together with commutativity), but I understand what you are saying.

In Hadoop, the post-map phase is actually divided into two steps with the same signature: a Combiner and a Reducer. The Combiner runs on the mapper nodes to shrink the output before it gets key-sorted and shipped to the reducers. Often the same class is registered for both roles, but because they are configured separately (setCombinerClass versus setReducerClass) you can split them and do more than you might expect. One caveat: Hadoop is free to run the combiner zero, one, or many times, so the combiner's output must be a valid input to the reducer.
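
As a sketch, the job wiring might look like this in the Java MapReduce API (MyMapper, MyCombiner, and MyReducer are hypothetical placeholders for your own classes):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mean-example");
        job.setJarByClass(JobSetup.class);
        job.setMapperClass(MyMapper.class);      // hypothetical mapper class
        job.setCombinerClass(MyCombiner.class);  // map-side pre-aggregation
        job.setReducerClass(MyReducer.class);    // final aggregation after the shuffle
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```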

The simple case of counting uses a summing reducer, which can serve as both the combine step and the reduce step. This avoids sending many records with the same key over the wire when a single partial count will do.
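
For example, a summing reducer along the lines of the standard Hadoop word-count example (sketched here as a static nested class of the job driver) is safe to register as both the combiner and the reducer, because integer addition is associative and commutative:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts for a key. Since the operation is associative and
// commutative, the same class works as both combiner and reducer.
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```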

You can achieve similar efficiency for computing the mean by giving the combiner and the reducer different implementations. Have each mapper output a pair (number, 1): a numerical value together with a count of 1. The combiner folds a collection of such pairs into a single (sum, count) (or (mean, count)) tuple, and the reducer aggregates those tuples, weighting by the counts, to produce the overall average. (As an aside: Kahan summation greatly reduces the floating-point error of adding many numbers.) This lets the mappers do part of the combining locally, just as in the counting example; see the sketch below.
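
Here is a minimal sketch of that combiner/reducer pair. It assumes each mapper emits its value as the Text string "value,1"; a real job would define a custom Writable for the (sum, count) pair, and MeanCombiner/MeanReducer are illustrative names (again written as nested classes of the driver), not anything Hadoop provides:

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Folds (value, 1) pairs into a partial (sum, count) per key on the map side.
public static class MeanCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0;  // Kahan (compensated) summation would reduce error here
        long count = 0;
        for (Text v : values) {
            String[] parts = v.toString().split(",");
            sum += Double.parseDouble(parts[0]);
            count += Long.parseLong(parts[1]);
        }
        context.write(key, new Text(sum + "," + count));
    }
}

// Merges the partial (sum, count) pairs and emits the weighted mean.
public static class MeanReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0;
        long count = 0;
        for (Text v : values) {
            String[] parts = v.toString().split(",");
            sum += Double.parseDouble(parts[0]);
            count += Long.parseLong(parts[1]);
        }
        context.write(key, new DoubleWritable(sum / count));
    }
}
```

Because the combiner may run on any subset of a key's values, any number of times, merging (sum, count) pairs has to be associative and commutative, which addition of sums and counts is.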

You can do a lot of clever things in a single map-reduce step. I don't think this is possible for the median, though: to get an exact median, all of the values ultimately have to pass through the state of a single machine.