How many mappers/reducers should be set when configuring Hadoop cluster?

hadoop

How many mappers/reducers should be set when configuring Hadoop cluster?


There is no single formula. It depends on how many cores and how much memory you have. In general, the number of map slots plus the number of reduce slots should not exceed the number of cores, and keep in mind that the machine is also running the TaskTracker and DataNode daemons. One common suggestion is to configure more map slots than reduce slots. If I were you, I would run one of my typical jobs with a reasonable amount of data to try it out.
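
To make that concrete: on an MR1 / Hadoop 1.x cluster the per-node slot counts are set in mapred-site.xml through mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. The values below are only a sketch, assuming a hypothetical 8-core worker node, and should be tuned against your own jobs:

    <!-- mapred-site.xml on each worker node (MR1 / Hadoop 1.x) -->
    <!-- Assumes a hypothetical 8-core node; 5 + 2 = 7 slots leaves
         headroom for the TaskTracker and DataNode daemons. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>5</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>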


Quoting from "Hadoop: The Definitive Guide, 3rd edition", page 306

Because MapReduce jobs are normally I/O-bound, it makes sense to have more tasks than processors to get better utilization.

The amount of oversubscription depends on the CPU utilization of jobs you run, but a good rule of thumb is to have a factor of between one and two more tasks (counting both map and reduce tasks) than processors.

A processor in the quote above is equivalent to one logical core.

But this is just theory, and each use case is most likely different from the next, so some testing needs to be done. Still, this number is a good starting point.
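
As a worked example of that rule of thumb (the node size is my assumption, not from the book): a worker with 8 logical cores and a factor of 1.5 gives about 8 × 1.5 = 12 task slots in total, which might be split as 8 map slots and 4 reduce slots, perhaps trimmed slightly to leave headroom for the TaskTracker and DataNode daemons.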


You should probably also look at reducer lazy loading, which allows reducers to start later, only when they are required, so basically the number of map slots can be increased. I don't have much idea about this, but it seems useful.
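
The answer above does not name a specific setting, but delayed reducer start is usually tuned through the slowstart threshold. A minimal sketch, assuming the MR1 property name (newer releases call it mapreduce.job.reduce.slowstart.completedmaps):

    <!-- mapred-site.xml (MR1 name shown) -->
    <!-- Do not schedule reducers until 80% of map tasks have
         completed, instead of the much earlier default of 0.05. -->
    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.80</value>
    </property>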