Sorting in MapReduce Hadoop

1. Assume 100 mappers were executed and zero reducers. Will it generate 100 files?

Yes.

Is each individual file sorted?

No. If no reducers are used, the mapper output is not sorted. Sorting only takes place when there is a reduce phase.
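To make the map-only case concrete, here is a minimal Python simulation (not Hadoop code). The names run_map_only and word_mapper and the sample splits are made up for illustration. It assumes each mapper simply writes its records in the order it emits them, which is the behaviour described above.

```python
def run_map_only(splits, map_fn):
    """Simulate a job with zero reducers: one output 'file' per mapper."""
    outputs = []
    for split in splits:
        # Records are written in emit order; no sort step runs.
        part = [kv for record in split for kv in map_fn(record)]
        outputs.append(part)
    return outputs


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


# Two input splits, so two mappers and two output files.
splits = [["pear apple", "fig"], ["kiwi banana"]]
files = run_map_only(splits, word_mapper)

for i, part in enumerate(files):
    print(i, part)
```

Each simulated file keeps the input order (pear, apple, fig), so neither the individual files nor their concatenation is sorted.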

Is the output sorted across all mappers?

No, for the same reason as above.

2. The input for a reducer is Key -> Values. For each key, are all the values sorted?

No. However, the keys are sorted. During the shuffle phase, the reducer fetches its share of each mapper's output, which is already sorted by key (since there IS a reduce phase), and merge-sorts those pieces. By the time it starts reducing, the keys arrive in sorted order.
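The merge step can be sketched in plain Python. This is a simulation under stated assumptions, not the Hadoop implementation: reducer_input, mapper_a and mapper_b are hypothetical names, and each mapper's output is assumed to be sorted by key only.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reducer_input(mapper_outputs):
    """Merge per-mapper outputs (each sorted by key) and group values by key."""
    # The merge compares keys only; values never take part in the ordering.
    merged = heapq.merge(*mapper_outputs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Each mapper's output for this reducer is sorted by key, not by value.
mapper_a = [("apple", 9), ("apple", 2), ("cherry", 5)]
mapper_b = [("apple", 7), ("banana", 1)]

grouped = reducer_input([mapper_a, mapper_b])
for key, values in grouped:
    print(key, values)
```

The keys come out as apple, banana, cherry, while the values for apple stay in arrival order. The exact value order here is an artifact of this simulation; in a real job it should be treated as arbitrary.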

3. Assume 50 reducers were executed. Will it generate 50 files?

Yes (unless you use MultipleOutputs).

Is each individual file sorted?

No. Sorted input does not guarantee sorted output. The output order depends on the algorithm you use in the reduce method.

Is the output sorted across all reducers?

No, for the same reason as above. However, if you use an identity reducer, i.e., you just write out the reducer's input as you receive it, the output will be sorted PER REDUCER, not globally.
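The difference between per-reducer and global order can be illustrated with a small Python simulation. It assumes integer keys and a modulo-based stand-in for a hash partitioner. The names hash_partition and run_identity_job are hypothetical, and sorted() stands in for the framework's sort of each reducer's input.

```python
def hash_partition(key, num_reducers):
    # Stand-in for a hash partitioner; for non-negative ints this is key mod N.
    return key % num_reducers


def run_identity_job(records, num_reducers):
    """Simulate partition -> sort by key -> identity reduce."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[hash_partition(key, num_reducers)].append((key, value))
    # Each reducer sees its input sorted by key; an identity reducer writes
    # it out unchanged, so each output file is sorted on its own.
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]


records = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c"), (6, "f")]
files = run_identity_job(records, 2)

# Reading the output files one after another does not give a global order.
concatenated = [key for part in files for key, _ in part]
print(concatenated)
```

With two reducers, one file holds keys 2, 4, 6 and the other 1, 3, 5: each is sorted, but key ranges overlap between files, so the concatenation is not.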

Is there any place where guaranteed sorting happens in MapReduce?

Sorting takes place only when there is a reduce phase, and it is applied to the output keys of each mapper and the input keys of each reducer. If you want the reducer input to be globally sorted, you can either use a single reducer or a TotalOrderPartitioner, which is a bit tricky.
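The idea behind a total-order partitioner is range partitioning: reducer i receives only keys that are smaller than every key sent to reducer i+1. The Python sketch below simulates that idea and is not the Hadoop TotalOrderPartitioner API. The names range_partition and run_total_order_job are hypothetical, and the split points are hand-picked for the example (in practice they would have to be derived from the data, for instance by sampling the input).

```python
import bisect


def range_partition(key, split_points):
    # Keys below split_points[0] go to reducer 0, keys from split_points[0]
    # up to (but not including) split_points[1] go to reducer 1, and so on.
    return bisect.bisect_right(split_points, key)


def run_total_order_job(records, split_points):
    """Simulate range partition -> sort by key -> identity reduce."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        partitions[range_partition(key, split_points)].append((key, value))
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]


records = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c"), (6, "f")]
# Hypothetical split points: two boundaries give three reducers.
files = run_total_order_job(records, split_points=[3, 5])

concatenated = [key for part in files for key, _ in part]
print(concatenated)
```

Because key ranges no longer overlap between reducers, reading the output files in partition order yields a globally sorted result. The tricky part is choosing split points that keep the reducers evenly loaded.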
