using a reducer slows the mapper


When you set the number of reducers to 0, you are running a map-only job. The data is neither sorted nor shuffled, and the mappers write their output directly to the job's output location (typically HDFS). If you do use reducers, there are two cases: you only need the data sorted, or you also need to aggregate or otherwise process it.

If you only need to sort the data, you can use the identity reducer. The framework still partitions and sorts the map output and shuffles it to the reducers, which then write it out unchanged. In the second case, the reducers take additional time for whatever operation you perform, whether it's aggregation or anything else.
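As a rough sketch of the two setups described above, a single driver can switch between a map-only job and one with an identity reducer. The class name `MapOnlyVsIdentityReduce` and the three-argument layout are made up for illustration, and the code assumes the Hadoop MapReduce client libraries (the `org.apache.hadoop.mapreduce` API) are on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: the class name and argument layout are illustrative only.
public class MapOnlyVsIdentityReduce {
    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path, args[2] = "map-only" or anything else
        boolean mapOnly = "map-only".equals(args[2]);

        Job job = Job.getInstance(new Configuration(), "map-only vs identity reduce");
        job.setJarByClass(MapOnlyVsIdentityReduce.class);

        // The base Mapper passes every (key, value) pair through unchanged.
        job.setMapperClass(Mapper.class);

        if (mapOnly) {
            // Zero reducers: no sort, no shuffle, mappers write the final output.
            job.setNumReduceTasks(0);
        } else {
            // The base Reducer acts as an identity reducer; the framework
            // still sorts and shuffles the map output before it runs.
            job.setReducerClass(Reducer.class);
            job.setNumReduceTasks(1);
        }

        // With the default TextInputFormat, keys are byte offsets and values are lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running it once in each mode on the same input is a simple way to measure how much of the extra time comes from the sort and shuffle alone, since neither the mapper nor the reducer does any real work here.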

So yes, there is a big difference in time between a map-only job and one that also has a reduce phase. The following picture shows all the steps you skip when the map output is written out directly:

[Image: map reduce phases]

EDIT: When you add a reduce phase, the mappers reach 100% but do not show as completed because, on the map side, the output is buffered in memory and presorted for efficiency before it is handed to the reducers. When your job was map-only, none of this happened, so it completed much faster. Now that you also use a reducer, a mapper that has reached 100% still has to finish sorting and flushing its buffered output, and it does not appear as "Completed" until that is done.

[Image: map side]
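To make the extra map-side work concrete, here is a toy model in plain Java. It is not Hadoop code: the class `MapSideSpillSketch`, its method names, and the record-count `bufferCapacity` are all invented for this sketch (Hadoop sizes the real buffer in megabytes via `mapreduce.task.io.sort.mb`), and partitioning by reducer is left out. It only shows why a mapper that has consumed all of its input still has work to do when reducers are present:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: names and sizes are hypothetical, not Hadoop internals.
public class MapSideSpillSketch {

    /** Map-only: every record goes straight to the output, in arrival order. */
    public static List<String> mapOnly(List<String> records) {
        return new ArrayList<>(records);
    }

    /** With reducers: buffer records, sort and spill full buffers, merge at the end. */
    public static List<String> mapWithSortAndSpill(List<String> records, int bufferCapacity) {
        List<List<String>> spills = new ArrayList<>();
        List<String> buffer = new ArrayList<>();

        for (String record : records) {
            buffer.add(record);
            if (buffer.size() == bufferCapacity) {
                spills.add(sortedCopy(buffer)); // buffer full: sort it and spill
                buffer.clear();
            }
        }
        // All input is consumed here (progress would read 100%),
        // but the last buffer must still be spilled and the spills merged.
        if (!buffer.isEmpty()) {
            spills.add(sortedCopy(buffer));
        }
        return mergeSpills(spills);
    }

    private static List<String> sortedCopy(List<String> buffer) {
        List<String> run = new ArrayList<>(buffer);
        Collections.sort(run);
        return run;
    }

    /** K-way merge of the sorted spills into one sorted output. */
    private static List<String> mergeSpills(List<List<String>> spills) {
        // Each queue entry is {spill index, position within that spill}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> spills.get(a[0]).get(a[1]).compareTo(spills.get(b[0]).get(b[1])));
        for (int i = 0; i < spills.size(); i++) {
            heads.add(new int[] {i, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<String> run = spills.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("pear", "apple", "fig", "kiwi", "banana");
        System.out.println(mapOnly(records));                // [pear, apple, fig, kiwi, banana]
        System.out.println(mapWithSortAndSpill(records, 2)); // [apple, banana, fig, kiwi, pear]
    }
}
```

In the map-only path nothing happens after the last record is read, whereas the second path still has a final sort, spill and merge to run, which corresponds to the gap between "100%" and "Completed".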

Hope it is clearer now!