Hive create table with inputs from nested sub-directories


Use these Hive settings to enable reading from directories recursively:

set hive.mapred.supports.subdirectories=TRUE;
set mapred.input.dir.recursive=TRUE;

Create an external table and specify the root directory as its location:

LOCATION 'hdfs://.../data'

You will then be able to query data from the table location and all of its subdirectories.
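Put together, the steps above might look like the following minimal sketch. The table name `events_raw`, the single `line` column, and the TEXTFILE storage format are assumptions made for illustration; substitute your own schema and storage format (for Avro data, the matching SerDe). The elided location is kept exactly as written above.

```sql
-- Enable recursive reads of subdirectories for this session.
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;

-- Hypothetical table: one string column over plain text files.
CREATE EXTERNAL TABLE events_raw (
  line STRING
)
STORED AS TEXTFILE
LOCATION 'hdfs://.../data';

-- Reads files in the root directory and in every nested subdirectory.
SELECT COUNT(*) FROM events_raw;
```

The two settings are session-level, so they need to be set again in each new session (or placed in the Hive configuration) before querying the table.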


One thing that would solve your problem is adding the folder name as a partition column of the external table. You can then create the table on the data directory, as you are doing now. Alternatively, you can take the nested files and flatten them into a single directory.

Otherwise, I don't think you'll be able to get Hive to treat the contents of all these folders as a single table.
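The partition approach mentioned above could be sketched roughly as follows. The table name `events_by_folder`, the partition column `folder`, the subdirectory names `dir_a` and `dir_b`, and the TEXTFILE format are all hypothetical; they stand in for whatever your directory layout and data format actually are.

```sql
-- Hypothetical table partitioned by the name of the subdirectory.
CREATE EXTERNAL TABLE events_by_folder (
  line STRING
)
PARTITIONED BY (folder STRING)
STORED AS TEXTFILE
LOCATION 'hdfs://.../data';

-- Register each subdirectory as a partition (names are placeholders).
ALTER TABLE events_by_folder ADD IF NOT EXISTS
  PARTITION (folder = 'dir_a') LOCATION 'hdfs://.../data/dir_a'
  PARTITION (folder = 'dir_b') LOCATION 'hdfs://.../data/dir_b';

-- The folder name is now queryable like any other column.
SELECT folder, COUNT(*) FROM events_by_folder GROUP BY folder;
```

Each new subdirectory has to be registered with its own ADD PARTITION, which is the main cost of this approach compared with flattening the files into one directory.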

This question seems to address a similar issue: "When creating an external table in Hive, can I point the location to specific files in a directory?"

There is an open JIRA issue on the same topic: https://issues.apache.org/jira/browse/HIVE-951

Browsing further, I saw a post suggesting SymlinkTextInputFormat as an alternative. I am not sure how well this would work with your Avro format: https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.html
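For reference, a table backed by SymlinkTextInputFormat might be declared along these lines. This is only a sketch: the table name `events_symlinked`, the `line` column, and the `symlinks` directory are hypothetical, and as the class name suggests the format targets text input, so it may not suit Avro files.

```sql
-- The table location holds small text files; each line of those files
-- is the path of a real data file to read (paths may be in any directory).
CREATE EXTERNAL TABLE events_symlinked (
  line STRING
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://.../symlinks';  -- hypothetical directory of link files
```

With this layout you would list the files from each nested subdirectory inside a link file under the table location, rather than moving the data itself.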