hive compaction using insert overwrite partition


It's interesting that you're still getting 3 files when specifying the partition and using compression, so you may want to look into dynamic partitioning, or ditch the partitioning and focus on the number of mappers and reducers your job creates. If your files are small, I can see why you'd want them combined into one file on the target, but then I'd also question the need to compress them.
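For context, the kind of compaction pass I'm assuming here is a static-partition overwrite that reads a partition back into itself; the table, column, and partition names below are placeholders, not from your job:

-- hypothetical names; INSERT OVERWRITE rewrites the partition's files in place
INSERT OVERWRITE TABLE mytable PARTITION (dt = '2016-01-01')
SELECT col1, col2
FROM mytable
WHERE dt = '2016-01-01';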

The number of files created in your target is directly tied to the number of reducers or mappers. If the SQL you write needs a reduce phase, the number of files created will equal the number of reducers used in the job, which you can control by setting the reducer count:

set mapred.reduce.tasks = 1;
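If the query is map-only, that setting does nothing on its own; one way to force a reduce phase so it takes effect is to add a DISTRIBUTE BY. A sketch with the same placeholder names as above:

set mapred.reduce.tasks = 1;

-- DISTRIBUTE BY forces a shuffle, so all rows funnel through the single
-- reducer and land in one output file (placeholder names)
INSERT OVERWRITE TABLE mytable PARTITION (dt = '2016-01-01')
SELECT col1, col2
FROM mytable
WHERE dt = '2016-01-01'
DISTRIBUTE BY col1;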

In your example SQL there most likely wouldn't be any reducers, so the number of files in the target equals the number of mappers used, which in turn equals the number of files in the source. The number of output files on a map-only job isn't as easy to control, but there are several configuration settings worth trying.
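If the job does stay map-only, Hive's small-file merge settings are one such option: they spawn a follow-up merge task when the average output file falls below a threshold. The byte values below are the usual defaults, but verify them against your Hive version:

-- merge small files produced by map-only jobs
set hive.merge.mapfiles = true;
-- merge small files produced by map-reduce jobs
set hive.merge.mapredfiles = true;
-- target size for merged files, in bytes
set hive.merge.size.per.task = 256000000;
-- run the merge when the average output file size is below this
set hive.merge.smallfiles.avgsize = 16000000;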

Set Hive to combine small input files so fewer mappers are spawned (the default is false):

set hive.hadoop.supports.splittable.combineinputformat = true;
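For the combine format to actually reduce the mapper count, the split sizes control how many small files a single mapper can absorb; something along these lines (the byte values are illustrative, and note CombineHiveInputFormat is already the default hive.input.format in recent releases):

-- cap on how much input one mapper can combine
set mapred.max.split.size = 256000000;
set mapred.min.split.size.per.node = 128000000;
set mapred.min.split.size.per.rack = 128000000;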

You can also set a threshold in bytes for input files; any table under this threshold is a candidate for conversion to a map join, which can affect the number of output files:

set hive.mapjoin.smalltable.filesize = 25000000;
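That threshold only matters when automatic map-join conversion is enabled and the query actually joins against a small table; paired together:

-- only relevant when the query joins against a table under the threshold
set hive.auto.convert.join = true;
set hive.mapjoin.smalltable.filesize = 25000000;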

As for the compression, I would experiment with the codec being used just to see if that makes any difference in your output:

set hive.exec.orc.default.compress = SNAPPY; -- or ZLIB, NONE, etc.
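You can also pin the codec per table rather than globally; a minimal sketch with a placeholder table name:

-- placeholder schema; orc.compress accepts NONE, ZLIB, or SNAPPY
CREATE TABLE mytable_orc (col1 STRING, col2 INT)
PARTITIONED BY (dt STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");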