
Hadoop reduce becomes slower when there are fewer reduce tasks


This is not a problem. The more reduce tasks you have, the more the reduce work runs in parallel, so your data gets processed faster.

The outputs of the map phase are sent to the reducers. If you have two reducers, the load is distributed between them.
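
For context, the piece that decides which reducer each map output key goes to is the partitioner. This is essentially what Hadoop's stock HashPartitioner does, shown here as a minimal sketch:

    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch of the default HashPartitioner: each map output key is
    // routed to one of the reduce tasks based on its hash code.
    public class HashPartitioner<K, V> extends Partitioner<K, V> {
      @Override
      public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the result is non-negative, then
        // spread keys evenly across the available reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }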

In the case of the word count example, you will have two separate files, with the counts divided between them. So you will have to add up the totals manually, or, if you had lots of reduce tasks, run another MapReduce job to compute the combined total.
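
As a rough illustration, a word count driver that asks for two reducers will produce two output files (part-r-00000 and part-r-00001). This is only a sketch; TokenizerMapper and IntSumReducer stand in for the mapper and reducer from the standard WordCount example and are assumed to exist:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);  // mapper from the standard WordCount example (assumed)
        job.setReducerClass(IntSumReducer.class);   // reducer from the standard WordCount example (assumed)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Two reduce tasks -> two output files, part-r-00000 and part-r-00001,
        // each holding the counts for the keys that hashed to that reducer.
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }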


This is as expected. If you only have a single reducer, then your job has a single point of failure. Your number of reducers should be set to about 90% of your cluster's reduce capacity. You can find your reduce capacity by multiplying the number of reduce slots per node by the total number of nodes. I have found that it is also good practice to use a combiner, where applicable.
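
A sketch of that sizing rule, dropped into a driver like the word count one above; the node and slot counts here are made-up figures you would replace with your own cluster's numbers:

    // Hypothetical cluster figures, purely for illustration; substitute your own.
    int nodes = 10;              // total worker nodes (assumed)
    int reduceSlotsPerNode = 4;  // reduce slots per node (assumed)

    // ~90% of total reduce capacity, as suggested above.
    int reducers = (int) (nodes * reduceSlotsPerNode * 0.9);
    job.setNumReduceTasks(reducers);

    // If the reduce logic is associative and commutative (like summing word
    // counts), the reducer can also serve as the combiner, which cuts the
    // amount of data that has to be shuffled across the network.
    job.setCombinerClass(IntSumReducer.class);  // IntSumReducer as in the WordCount example (assumed)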


If you have just one reduce task, then that reducer has to wait for all mappers to finish, and the shuffle phase has to collect all the intermediate data and redirect it to that single reducer. So it's natural that the map and shuffle times are larger, and so is the overall time, when you have just one reducer.

However, if you have more reducers, your data gets processed in parallel, which makes it faster. On the other hand, if you have too many reducers, there's too much data being shuffled around, resulting in increased network traffic. So you have to find the number of reducers that gives you a good balance.
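
One practical way to look for that balance is to re-run the same job with different reducer counts and compare the run times. With the stock examples jar (the exact jar name varies by Hadoop version), the reducer count can be overridden per run, for example:

    hadoop jar hadoop-mapreduce-examples.jar wordcount \
        -D mapreduce.job.reduces=8 \
        /input /output-8reducers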