Error loading CSV to table
References:
Conform to Apache's DDL syntax for Create/Drop/Truncate Table, and, more specifically, to Row Formats & SerDe, in order to change the FIELDS/LINES/etc
properties, you can use:
- the native SerDe (if
ROW FORMAT
is not specified, orROW FORMAT DELIMITED
is set), or - a custom SerDe (by applying the
SERDE
clause).
Option 1: Using the native Serde
CREATE TABLE survey(country string, age string, industryType string, companyType string, occupation string, salary string, project string)ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY "\\" LINES TERMINATED BY "\n"STORED AS TEXTFILE;
Notice the lack of a clause which could specify the character to quote the field values.
Option 2: Using the custom Serde
See the record named "CSV/TSV" in the table from Row Formats & SerDe.
CREATE TABLE survey(country string, age string, industryType string, companyType string, occupation string, salary string, project string)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\\") STORED AS TEXTFILE;
I suggest you to make a test with a CSV file containing one line of values. Inside the line enclose all column values into double quotes and see what happens. Beeing enclosed, the "," character (like the one in salary
field) will/should be automatically escaped and, therefore, saved as part of the corresponding column value.
Good luck.