Copy operations in shuffle and sort phase of MapReduce

hadoop mapreduce bigdata hadoop2

Suppose you have 3 mappers and 1 reducer. Each map task writes 1 output file (sorted by key) to the local filesystem of the node where it ran. So, we will have 3 such output files spread around the cluster.

Since reducers do not take advantage of the data locality optimisation, and since we have only 1 reducer, that reducer will need to copy each of the 3 map output files across the network.

Hence, with m map tasks and n reduce tasks, there are m x n = 3 x 1 = 3 copy operations involved in this scenario.
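The m x n count can be checked with a small Python sketch. It is only a simplified model, not Hadoop code: it assumes every reducer fetches exactly one piece of output from every mapper, and the function name `count_shuffle_copies` is made up for illustration.

```python
def count_shuffle_copies(num_mappers: int, num_reducers: int) -> int:
    """Count map-output fetches in a simplified shuffle model.

    Assumption: each reducer pulls its share of the output from every
    mapper, so there is one fetch per (mapper, reducer) pair.
    """
    fetches = [
        (mapper, reducer)
        for reducer in range(num_reducers)
        for mapper in range(num_mappers)
    ]
    return len(fetches)


# The scenario from the question: 3 mappers, 1 reducer.
print(count_shuffle_copies(3, 1))
```

Under the same assumption, adding a second reducer would double the count, since each reducer fetches from all 3 mappers.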
