Hive Merge all Partitions using HIVE CONCATENATE Hive Merge all Partitions using HIVE CONCATENATE hadoop hadoop

Hive Merge all Partitions using HIVE CONCATENATE


IMPORTANT -If your table schema has evolved (e.g. new columns added) but still contains files using the old schema. CONCATENATE might silently drop those files. See Bug report. Workarounds were done to fail this operation if the table is unmanaged, and was finally fixed in Hive 3.0.0.

Now that that's out of the way, we need to do this in two steps.

First, we obtain the partitions within the table in question and write them to a text file that we can refer to later.

beeline --showHeader=false --outputformat=tsv2 --silent=true -e "SHOW PARTITIONS database.table" > found_partitions.txt

This will write the list of found partitions without a header or frame.

Next, we need to iterate through the list of partitions, swap potential partition separator (part1=some/part2=thing) with a comma, since the former isn't a legal Hive character. This does nothing if you only have a single partition structure in your table. We are also assuming that all your partitions are strings and will need to be surrounded with quotes.

#!/bin/bashfor line in `cat found_partitions.txt`; do    echo "the next partition is $line"    partition=`(echo $line | sed -e 's/\//,/g' -e "s/=/='/g" -e "s/,/',/g")`\'    beeline -e "ALTER TABLE database.table PARTITION($partition) CONCATENATE" done

Note: You might have to set some configurations for beeline to work for you. Might wanna set an alias for this.

 beeline -u "jdbc:hive2://<SERVER>:<PORT>/;serviceDiscoveryMode=<zooKeeper>;zooKeeperNamespace=<hiveserver2>;principal=<USER>;transportMode=<SOMETHING>;httpPath=<SOMETHING>"