In-depth understanding of internal working of map phase in a Map reduce job in hadoop?



Map-only jobs work differently than Map-and-Reduce jobs. It's not inconsistent, just different.

How will I be able to get the intermediate file that has both the partitioned and sorted data from the mapper output?

You can't. There isn't a hook for getting pieces of data out of the intermediate stages of MapReduce. The same is true for getting data after the partitioner, or after a record reader, etc.

It is interesting to note that running the mapper alone does not produce sorted output, which seems to contradict the point that data sent to the reducer is sorted. More details here

It does not contradict. Mappers sort because the reducer needs the input sorted to be able to do a merge. If there are no reducers, there is no reason to sort, so it doesn't. This is the right behavior: I don't want the output sorted in a map-only job, since that would only make my processing slower. I've never had a situation where I wanted my map output to be locally sorted.
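For reference, a map-only job is just a regular job with the reduce count set to zero via the standard Hadoop `Job` API; the mapper/driver class names and paths below are placeholders:

```java
// Driver sketch for a map-only job. With zero reducers, each mapper's
// output goes straight to the OutputFormat -- no partitioning, no sort,
// no combiner. MyDriver/MyMapper and the paths are placeholders.
Job job = Job.getInstance(new Configuration(), "map-only example");
job.setJarByClass(MyDriver.class);
job.setMapperClass(MyMapper.class);
job.setNumReduceTasks(0);           // zero reducers => map-only job
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/input"));
FileOutputFormat.setOutputPath(job, new Path("/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
```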

No combiner is applied either if only the mapper is run: More details here

Combiners are an optimization. There is no guarantee that they actually run, or over what data. Combiners are mostly there to make the reducers more efficient. So, again, just like the local sorting, combiners do not run if there are no reducers because there is no reason to.

If you want combiner-like behavior, I suggest writing data into a buffer (a hashmap, perhaps) and then writing out the locally-summarized data in the cleanup function that runs when a Mapper finishes. Be careful of memory usage if you want to do this. This is a better approach because combiners are specified as a good-to-have optimization and you can't count on them running... or on what data they run over, even when they do run.
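That buffer-and-cleanup idea is often called "in-mapper combining". Here is a minimal, Hadoop-free sketch of the pattern; the class and method names are illustrative (a real implementation would extend `Mapper` and call `context.write` instead of collecting into a map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the in-mapper combining pattern: buffer partial sums in a
// HashMap inside the mapper and flush them in cleanup(), instead of
// relying on an optional combiner. Names are illustrative, not Hadoop API.
class InMapperCombiner {
    private final Map<String, Long> buffer = new HashMap<>();
    private final Map<String, Long> emitted = new HashMap<>();
    private static final int MAX_BUFFER = 100_000; // guard against OOM

    // Called once per input record (analogous to Mapper.map()).
    void map(String word) {
        buffer.merge(word, 1L, Long::sum);
        if (buffer.size() >= MAX_BUFFER) {
            flush(); // spill early so the buffer doesn't exhaust the heap
        }
    }

    // Called once when the mapper finishes (analogous to Mapper.cleanup()).
    void cleanup() {
        flush();
    }

    private void flush() {
        // In a real mapper this loop would be context.write(key, sum).
        buffer.forEach((k, v) -> emitted.merge(k, v, Long::sum));
        buffer.clear();
    }

    Map<String, Long> result() {
        return emitted;
    }
}
```

The `MAX_BUFFER` cap is the memory-safety valve mentioned above: spilling partial sums early trades a little summarization quality for a bounded heap, which is exactly the trade-off a real combiner makes anyway.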