Reduce number of Hadoop mappers for large number of GZ files


I encountered the same problem. I think this will help you: http://www.ibm.com/developerworks/library/bd-hadoopcombine/

The main idea is to build a CombineFileInputFormat that packs many files into each CombineFileSplit and reads them through a CombineFileRecordReader. Since your files are .gz, each one is decompressed and then read into records by an ordinary line RecordReader; you just launch far fewer mappers, because each mapper processes a whole group of files instead of a single one.
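
For reference, here is a minimal sketch of what such an input format could look like with the newer mapreduce API. The class names CombinedGzTextInputFormat and TextRecordReaderWrapper are just illustrative, not from the linked article:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Groups many small .gz files into fewer splits so that fewer mappers are launched.
 * Each underlying file is still decompressed and read line by line.
 */
public class CombinedGzTextInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Gzip is not splittable, so each file must stay whole inside a combined split.
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader walks over the files packed into the CombineFileSplit
        // and hands each one to the wrapper class below.
        return new CombineFileRecordReader<LongWritable, Text>((CombineFileSplit) split,
                context, TextRecordReaderWrapper.class);
    }

    /**
     * Adapts the ordinary line-based reader (which already handles .gz through its
     * compression codec) to the constructor signature CombineFileRecordReader expects.
     */
    public static class TextRecordReaderWrapper
            extends CombineFileRecordReaderWrapper<LongWritable, Text> {

        public TextRecordReaderWrapper(CombineFileSplit split, TaskAttemptContext context,
                Integer idx) throws IOException, InterruptedException {
            super(new TextInputFormat(), split, context, idx);
        }
    }
}
```

In the driver you would then set something like `job.setInputFormatClass(CombinedGzTextInputFormat.class);` and, if you want to cap how much data each mapper gets, call `setMaxSplitSize(...)` on the format (or set `mapreduce.input.fileinputformat.split.maxsize`); otherwise the files are grouped per node/rack by default.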