Specifying a compression codec for an INSERT OVERWRITE ... SELECT in Hive
Prepend the INSERT OVERWRITE statement with the following runtime configuration settings:
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
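Put together, a session might look like the sketch below. The table names `logs_raw` and `logs_compressed` are hypothetical, and the LzopCodec class assumes the third-party hadoop-lzo library is installed on the cluster. On Hadoop 2 and later, `mapred.output.compression.codec` is a deprecated alias for `mapreduce.output.fileoutputformat.compress.codec`, but the old name is still accepted.

```sql
-- Compress the final output written by this query
SET hive.exec.compress.output=true;
-- Only takes effect when the target table is stored as a SequenceFile
SET io.seqfile.compression.type=BLOCK;
-- Codec class provided by hadoop-lzo (assumed to be installed)
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Hypothetical source and target tables
INSERT OVERWRITE TABLE logs_compressed
SELECT *
FROM logs_raw;
```

The SET values apply to the current session, so they must run before the INSERT in the same session.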
Also make sure the desired compression codec is available by checking the io.compression.codecs property.
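One way to check this from the Hive CLI is to issue SET with just the property name, which prints the property's current value. Assuming hadoop-lzo has been registered in the cluster configuration, its codec classes should appear in the comma-separated list.

```sql
-- SET with a property name and no value prints the current setting
SET io.compression.codecs;
```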
Further information about io.seqfile.compression.type can be found here: http://wiki.apache.org/hadoop/Hive/CompressedStorage
I may be mistaken, but my understanding is that BLOCK compression compresses batches of records together rather than each record individually (RECORD), which generally yields a higher compression ratio.