
mapreduce.reduce.shuffle.memory.limit.percent, mapreduce.reduce.shuffle.input.buffer.percent and mapreduce.reduce.shuffle.merge.percent


I would like to share my understanding of these properties; I hope it helps. Please correct me if anything is wrong.

mapreduce.reduce.shuffle.input.buffer.percent specifies the percentage of the reducer's heap memory to allocate for the in-memory buffer that stores the intermediate map outputs copied from the mappers during the shuffle phase.

mapreduce.reduce.shuffle.memory.limit.percent specifies the maximum percentage of the above buffer that a single shuffle (the output copied from one map task) may occupy. Map outputs larger than this limit are not copied into the memory buffer; instead they are written directly to the reducer's disk.

mapreduce.reduce.shuffle.merge.percent specifies the usage threshold of the buffer at which the in-memory merger thread runs: once the buffered shuffle contents reach this percentage, the thread merges them into a single file and immediately spills the merged file to disk.
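
To make the arithmetic concrete, here is a minimal sketch of how the three percentages combine. The 1 GB reducer heap and the values 0.70, 0.25 and 0.66 are assumptions chosen for illustration (they match the usual defaults in mapred-default.xml):

```java
public class ShuffleBufferMath {
    public static void main(String[] args) {
        long reducerHeapBytes = 1024L * 1024 * 1024;  // assumed 1 GB reducer heap

        double inputBufferPercent = 0.70;  // mapreduce.reduce.shuffle.input.buffer.percent
        double memoryLimitPercent = 0.25;  // mapreduce.reduce.shuffle.memory.limit.percent
        double mergePercent       = 0.66;  // mapreduce.reduce.shuffle.merge.percent

        // Total in-memory shuffle buffer carved out of the reducer heap.
        long shuffleBuffer = (long) (reducerHeapBytes * inputBufferPercent);

        // Largest single map output held in memory; bigger ones go straight to disk.
        long singleShuffleLimit = (long) (shuffleBuffer * memoryLimitPercent);

        // Buffer usage at which the in-memory merge is triggered.
        long mergeThreshold = (long) (shuffleBuffer * mergePercent);

        System.out.printf("shuffle buffer:       %d MB%n", shuffleBuffer >> 20);      // ~716 MB
        System.out.printf("single shuffle limit: %d MB%n", singleShuffleLimit >> 20); // ~179 MB
        System.out.printf("merge threshold:      %d MB%n", mergeThreshold >> 20);     // ~473 MB
    }
}
```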

It is obvious that the in-memory merger thread needs at least two shuffle segments in the memory buffer before a merge makes sense. So mapreduce.reduce.shuffle.merge.percent should always be higher than mapreduce.reduce.shuffle.memory.limit.percent, which caps the size of any single in-memory shuffle segment; this guarantees that more than one segment is present in the buffer whenever the merge process starts.
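
If these values need tuning for a particular job, they can be set on the job's Configuration before submission. A minimal sketch, assuming the same illustrative values as above (they are not recommendations):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuningJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Fraction of reducer heap reserved for the shuffle buffer.
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);

        // Cap on a single in-memory shuffle segment, as a fraction of that buffer.
        conf.setFloat("mapreduce.reduce.shuffle.memory.limit.percent", 0.25f);

        // Buffer usage that triggers the in-memory merge; kept well above the
        // single-segment cap so a merge always sees more than one segment.
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);

        Job job = Job.getInstance(conf, "shuffle-tuning-example");
        // ... set mapper/reducer classes, input/output paths, and submit as usual.
    }
}
```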