LOAD DATA INPATH loads the same CSV-based data into two different external Hive tables


It looks like you just need to specify a different 'LOCATION' for the second table. When you run 'LOAD DATA INPATH', Hive moves the file into the table's 'LOCATION' path (with 'LOCAL' it copies the file there instead). If both tables have the same 'LOCATION', they read the same directory and therefore share the same data.


The location is what is causing the problem. You have given the same location for both tables. Because the tables are external, the loaded file is placed directly under that path.

Also, LOAD DATA INPATH '/file/file1.csv' OVERWRITE INTO TABLE hive_table1; overwrites any files that already exist in the table's location. This is what is happening with your tables. As Farooque mentioned, each table needs its own unique location to get the desired results.
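To make the effect concrete, here is a minimal local-filesystem analogy, not actual Hive or HDFS commands. It assumes a table is simply "whatever files sit in its LOCATION directory". The helpers load_overwrite and select_all are hypothetical stand-ins for LOAD DATA INPATH ... OVERWRITE and SELECT *, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Analogy only: plain local directories stand in for HDFS paths.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for LOAD DATA INPATH ... OVERWRITE INTO TABLE:
# clear the table's directory, then move the source file into it.
load_overwrite() {
    src=$1
    table_dir=$2
    rm -f "$table_dir"/*
    mv "$src" "$table_dir"/
}

# Hypothetical stand-in for SELECT *: read every file under the location.
select_all() {
    cat "$1"/*
}

# Case 1: both "tables" point at the same location.
mkdir -p "$work/shared"
printf '1;30\n' > "$work/file1.csv"
printf '2;40\n' > "$work/file2.csv"
load_overwrite "$work/file1.csv" "$work/shared"   # load into table 1
load_overwrite "$work/file2.csv" "$work/shared"   # load into table 2 wipes file1.csv
shared_t1=$(select_all "$work/shared")
shared_t2=$(select_all "$work/shared")

# Case 2: each "table" has its own location.
mkdir -p "$work/table1_dir" "$work/table2_dir"
printf '1;30\n' > "$work/file1.csv"
printf '2;40\n' > "$work/file2.csv"
load_overwrite "$work/file1.csv" "$work/table1_dir"
load_overwrite "$work/file2.csv" "$work/table2_dir"
sep_t1=$(select_all "$work/table1_dir")
sep_t2=$(select_all "$work/table2_dir")

echo "shared location:    table1=$shared_t1 table2=$shared_t2"
echo "separate locations: table1=$sep_t1 table2=$sep_t2"
```

With a shared directory, both tables report only the row from file2.csv, because the second load replaced the first file. With separate directories, each table keeps its own row.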


I see you are creating external tables: two tables, each backed by a single file.

Follow these simple steps:

Create the table

CREATE EXTERNAL TABLE IF NOT EXISTS hive_table1(id int, age string, date string...) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/table1_dir/';

Copy the file to the table's HDFS location

hdfs dfs -put '/file/file1.csv' '/user/hive/warehouse/table1_dir/'

Similarly, for the second table:

Create the table

CREATE EXTERNAL TABLE IF NOT EXISTS hive_table2(id int, age string, date string...) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/table2_dir/';

Copy the file to the table's HDFS location

hdfs dfs -put '/file/file2.csv' '/user/hive/warehouse/table2_dir/'

Note: If you are using more than one table, each table's location must be unique.