hive-drop-import-delims not removing newline while using HCatalog in Sqoop



Use the --map-column-java option to explicitly declare the column as type String. Then --hive-drop-import-delims works as expected (removing \n from the data).
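Once the column is mapped to String, Sqoop's delimiter handling strips Hive's default delimiters (\n, \r, and \01) from each field value. A minimal sketch of the equivalent transformation, using a hypothetical field value and `tr` (this is an illustration of the effect, not Sqoop's actual code path):

```shell
# Hypothetical field value containing an embedded newline
val=$'foo\nbar'

# What --hive-drop-import-delims effectively does to a String column:
# delete \n, \r, and \01 from the value
cleaned=$(printf '%s' "$val" | tr -d '\n\r\001')
echo "$cleaned"   # foobar
```

Without the --map-column-java mapping, the column may not be treated as a String, and this delimiter processing is skipped.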

Changed Sqoop command:

sqoop import \
  --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
  --username user123 --password passwd123 \
  --table SCHEMA.TBL_2 \
  --hcatalog-table tbl2 --hcatalog-database testdb \
  --num-mappers 1 --split-by SOME_ID \
  --columns col1,col2,col3,col4 \
  --hive-drop-import-delims \
  --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
  --null-string "" \
  --map-column-java col4=String


sqoop import \
  --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
  --username 123 \
  --password 123 \
  --table SCHEMA.TBL_2 \
  --hcatalog-table tbl2 --hcatalog-database testdb \
  --num-mappers 1 \
  --split-by SOME_ID --columns col1,col2,col3,col4 \
  --hive-delims-replacement "anything" \
  --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
  --null-string ""

You can try --hive-delims-replacement "anything"; this will replace all \n, \r, and \01 characters with the string you provide (in this case, the string "anything").
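The replacement behavior can be sketched with bash pattern substitution on a hypothetical field value (again, an illustration of the effect rather than Sqoop's internals):

```shell
# Hypothetical field value containing an embedded newline
val=$'foo\nbar'

# What --hive-delims-replacement "anything" effectively does:
# substitute each of Hive's default delimiters with the given string
replaced=${val//$'\n'/anything}
replaced=${replaced//$'\r'/anything}
replaced=${replaced//$'\001'/anything}
echo "$replaced"   # fooanythingbar
```

Note that both options only make sense when the target table uses Hive's default delimiters, as the documentation below points out.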


From the official documentation: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Hive will have problems using Sqoop-imported data if your database’s rows contain string fields that have Hive’s default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive’s default delimiters and should not be used if different delimiters are specified.