Why don't EMR instances have as many reducers as mappers?

memory hadoop amazon-web-services elastic-map-reduce reducers

Mappers extract data from their input stream (the mapper's STDIN), and what they emit is usually much more compact. That outbound stream (the mapper's STDOUT) is then sorted by key. As a result, the reducers receive smaller, already-sorted input.
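As a rough illustration of that data flow, here is a minimal Python sketch that simulates the map, sort, and reduce phases in a single process. The record layout, the mapper and reducer function names, and the sample data are all made up for the example; it is not Hadoop code, just the same pipeline in miniature.

```python
from itertools import groupby


def key_of(pair):
    """Return the key part of a 'key<TAB>value' line."""
    return pair.split("\t", 1)[0]


def mapper(lines):
    """Keep only the two fields of interest from each raw record."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Emit "key<TAB>value"; the rest of the record is dropped here.
            yield f"{fields[0]}\t{fields[1]}"


def reducer(sorted_pairs):
    """Sum the values for each key; relies on the input being sorted by key."""
    for key, group in groupby(sorted_pairs, key=key_of):
        total = sum(int(pair.split("\t", 1)[1]) for pair in group)
        yield f"{key}\t{total}"


# Hypothetical raw input: a key, a count, and bulky fields the job ignores.
raw_records = [
    "b 2 GET /index.html 200 Mozilla/5.0",
    "a 1 GET /about.html 200 Mozilla/5.0",
    "b 5 POST /login 302 curl/7.0",
    "a 4 GET /index.html 404 Mozilla/5.0",
]

mapped = list(mapper(raw_records))      # compact map output
shuffled = sorted(mapped, key=key_of)   # stands in for the sort by key
reduced = list(reducer(shuffled))       # per-key totals

raw_size = sum(len(r) for r in raw_records)
mapped_size = sum(len(m) for m in mapped)
print(f"raw input: {raw_size} chars, map output: {mapped_size} chars")
print(reduced)
```

In this toy case the map output is a small fraction of the raw input, which is the situation where fewer reducers than mappers are enough. A job whose mappers emit as much as they read would not fit that pattern.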

That is largely why the default configuration for any Hadoop MapReduce cluster, not just EMR, provides more mappers than reducers, with both counts proportional to the number of cores available on the cluster's nodes.

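You can control the number of mappers and reducers through jobconf parameters. The configuration variables are mapred.map.tasks and mapred.reduce.tasks. Note that mapred.map.tasks is only a hint: the actual number of map tasks is determined by how the input is split, whereas mapred.reduce.tasks is used as given.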
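For a streaming job, one way to pass these two variables is as generic -D options on the command line. The sketch below only builds the argument list; the helper name streaming_args, the jar path, the bucket paths, and the script names are hypothetical placeholders, and how you submit the command (directly or as an EMR step) depends on your setup.

```python
def streaming_args(streaming_jar, input_path, output_path,
                   mapper, reducer, num_maps=None, num_reduces=None):
    """Build a 'hadoop jar' argument list for a streaming job (hypothetical helper)."""
    args = ["hadoop", "jar", streaming_jar]
    # Generic -D options have to come before the streaming-specific options.
    if num_maps is not None:
        # Only a hint: the input splits decide the real number of map tasks.
        args += ["-D", f"mapred.map.tasks={num_maps}"]
    if num_reduces is not None:
        args += ["-D", f"mapred.reduce.tasks={num_reduces}"]
    args += [
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    return args


# All paths and script names below are placeholders, not real locations.
cmd = streaming_args(
    streaming_jar="/path/to/hadoop-streaming.jar",
    input_path="s3://example-bucket/input/",
    output_path="s3://example-bucket/output/",
    mapper="mapper.py",
    reducer="reducer.py",
    num_maps=16,
    num_reduces=4,
)
print(" ".join(cmd))
```

Older streaming releases also accepted the same key=value pairs through a -jobconf option, which is the form the parameter name above refers to.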
