Reduce number of Hadoop mappers for large number of GZ files
I ran into the same problem. I think this will help you: http://www.ibm.com/developerworks/library/bd-hadoopcombine/
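The main idea is to build a custom input format on top of CombineFileInputFormat, using CombineFileSplit and CombineFileRecordReader, so that one mapper reads many small files instead of one file each. Since .gz files are not splittable, each file normally gets its own mapper. With a combined split, each file in the split is still decompressed as a whole and then read into records by the RecordReader.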
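
To see why this cuts the number of mappers, here is a rough sketch in plain Java with no Hadoop dependency. It is not Hadoop's actual split algorithm (which also takes node and rack locality into account); it only illustrates packing whole files into splits up to a size cap. The names `CombineSplitSketch`, `CombinedSplit` and `pack`, the file names, and the sizes are all made up for illustration. In a real job the cap would normally come from the max split size setting (`mapreduce.input.fileinputformat.split.maxsize`).

```java
import java.util.ArrayList;
import java.util.List;

public class CombineSplitSketch {

    /** Hypothetical stand-in for one combined split: the files a single mapper would read. */
    static final class CombinedSplit {
        final List<String> files = new ArrayList<>();
        long totalBytes = 0;
    }

    /**
     * Greedily packs whole files into splits of at most maxSplitBytes.
     * A .gz file is never cut, so a file larger than the cap gets a split to itself.
     */
    static List<CombinedSplit> pack(String[] names, long[] sizes, long maxSplitBytes) {
        List<CombinedSplit> splits = new ArrayList<>();
        CombinedSplit current = new CombinedSplit();
        for (int i = 0; i < names.length; i++) {
            // Close the current split if adding this file would exceed the cap.
            if (!current.files.isEmpty() && current.totalBytes + sizes[i] > maxSplitBytes) {
                splits.add(current);
                current = new CombinedSplit();
            }
            current.files.add(names[i]);
            current.totalBytes += sizes[i];
        }
        if (!current.files.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed workload: 1,000 made-up .gz files of 2 MB each.
        int n = 1000;
        String[] names = new String[n];
        long[] sizes = new long[n];
        for (int i = 0; i < n; i++) {
            names[i] = "part-" + i + ".gz";
            sizes[i] = 2L * 1024 * 1024;
        }
        long cap = 128L * 1024 * 1024; // assumed 128 MB cap per combined split

        List<CombinedSplit> splits = pack(names, sizes, cap);
        System.out.println("mappers without combining: " + n);
        System.out.println("mappers with combining: " + splits.size());
    }
}
```

With these assumed numbers, 64 files fit into each split, so 1,000 files end up in 16 splits instead of 1,000. A smaller cap gives more mappers with less work each, a larger one gives fewer mappers that run longer.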