How to add partition using hive by a specific date?



First start with the right table definition. In your case I'll just use what you wrote:

CREATE EXTERNAL TABLE test (
    foo string,
    time string,
    bar string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://test.com/';

Hive by default expects partitions to be in subdirectories named via the convention s3://test.com/partitionkey=partitionvalue. For example

s3://test.com/dt=2014-03-05

If you follow this convention, you can use MSCK REPAIR TABLE to add all partitions at once.
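For example, assuming the table definition above and subdirectories named dt=..., this single statement scans the table location and registers every partition it finds:

MSCK REPAIR TABLE test;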

If you can't or don't want to use this naming convention, you will need to add each partition yourself, as in:

ALTER TABLE test
    ADD PARTITION (dt='2014-03-05')
    LOCATION 's3://test.com/2014-03-05';
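Either way, you can check which partitions the metastore now knows about with standard Hive syntax:

SHOW PARTITIONS test;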


If your existing directory structure doesn't follow the <partition name>=<partition value> convention, you have to add the partitions manually; MSCK REPAIR TABLE won't work unless your directories are structured that way.

Once you have specified a location at table creation, like:

CREATE EXTERNAL TABLE test (
    foo string,
    time string,
    bar string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://test.com/';

you can add a partition without specifying the full path, since a relative partition location is resolved against the table's LOCATION:

ALTER TABLE test ADD PARTITION (dt='2014-03-05') LOCATION '2014-03-05';
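If you have many such dated directories, Hive's ALTER TABLE ... ADD PARTITION also accepts several partition specs in one statement (the table name and relative paths here follow the example above):

ALTER TABLE test ADD
    PARTITION (dt='2014-03-05') LOCATION '2014-03-05'
    PARTITION (dt='2014-03-06') LOCATION '2014-03-06';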

Although I've never checked it, I suggest moving your partitions into a folder inside the bucket rather than the bucket root, e.g. from s3://test.com/ to s3://test.com/data/.


If you are going to partition on a date field, you need an S3 folder structure like the one below:

s3://test.com/date=2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log

In that case you can create the external table with date as the partition column and run MSCK REPAIR TABLE EXTERNAL_TABLE_NAME to update the Hive metastore.
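A minimal sketch of that approach, assuming the log layout above (the table name logs and its single line column are illustrative; date is backquoted because it can be a reserved word in some Hive versions):

CREATE EXTERNAL TABLE logs (
    line string
)
PARTITIONED BY (`date` string)
LOCATION 's3://test.com/';

MSCK REPAIR TABLE logs;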