Specifying compression codec for an INSERT OVERWRITE SELECT in Hive


Before the INSERT OVERWRITE statement, set the following runtime configuration values:

SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
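Put together, a session might look like the sketch below. The table names compressed_logs and raw_logs are hypothetical placeholders, and the LZO codec is assumed to be installed on the cluster (it ships separately from Hadoop).

```sql
-- Enable compression for the final output of the query.
SET hive.exec.compress.output=true;
-- Applies to SequenceFile output: compress batches of records together.
SET io.seqfile.compression.type=BLOCK;
-- Codec used for the output files (assumes hadoop-lzo is available).
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Hypothetical tables, for illustration only.
INSERT OVERWRITE TABLE compressed_logs
SELECT * FROM raw_logs;
```

The SET values apply only to the current session, so they need to be issued in the same session (or script) as the INSERT OVERWRITE.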

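Also make sure the desired compression codec is actually available by checking that it is listed in:

io.compression.codecs

Running SET io.compression.codecs; in the Hive session prints the current value.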

Further information about io.seqfile.compression.type can be found here: http://wiki.apache.org/hadoop/Hive/CompressedStorage

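I may be mistaken, but as I understand it, the BLOCK type compresses groups of records together rather than each record individually (as RECORD does), so you end up with larger files compressed at a higher ratio instead of a set of less well compressed ones.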