Hive load CSV with commas in quoted fields


If you can re-create or pre-process your input data, you can specify an escape character in the CREATE TABLE statement:

ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';

With this clause, Hive reads the following line as four fields:

1,some text\, with comma in it,123,more text
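For context, here is a minimal sketch of a full table definition using that clause. The table name and column names are hypothetical, chosen only to match the four-field sample line above; adjust them and the types to your data.

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TABLE escaped_csv_demo (
  id     INT,
  body   STRING,   -- may contain commas written as "\," in the file
  amount INT,
  note   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
STORED AS TEXTFILE;
```

Note that this only helps if the commas inside fields are backslash-escaped in the file; it does not make Hive understand double-quoted fields.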


The problem is that Hive's delimited row format doesn't handle quoted text. You either need to pre-process the data to change the delimiter between the fields (e.g. with a Hadoop Streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.


As of Hive 0.14, the CSV SerDe is a standard part of the Hive install:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

(See: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde)
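As a rough sketch of how that SerDe line fits into a complete statement, something like the following should work on Hive 0.14 or later. The table and column names are hypothetical. The three SERDEPROPERTIES shown are the SerDe's documented defaults, so the WITH SERDEPROPERTIES clause can be omitted if your file uses a comma separator and double-quote quoting.

```sql
-- Hypothetical table and columns, for illustration only.
-- OpenCSVSerde treats every column as STRING, so all columns are declared that way.
CREATE TABLE quoted_csv_demo (
  id     STRING,
  body   STRING,   -- e.g. "some text, with comma in it"
  amount STRING,
  note   STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
```

If you need numeric types, one option is to cast the string columns when querying, or in a view defined on top of this table.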