Save Spark dataframe as dynamic partitioned table in Hive


I believe it works something like this:

where df is a DataFrame with year, month and other columns:

df.write.partitionBy('year', 'month').saveAsTable(...)

or

df.write.partitionBy('year', 'month').insertInto(...)
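For reference, partitionBy writes one subdirectory per distinct partition value on disk, using Hive's col=value layout. A minimal sketch of that layout in plain Python (no Spark; the rows and the part-00000 file name are made up for illustration):

```python
import os
import tempfile

# Simulate the Hive-style directory layout that
# partitionBy('year', 'month') produces: one nested
# col=value directory per distinct partition value.
rows = [
    {"year": 2023, "month": 1, "value": "a"},
    {"year": 2023, "month": 2, "value": "b"},
    {"year": 2024, "month": 1, "value": "c"},
]

table_dir = tempfile.mkdtemp()
for row in rows:
    part_dir = os.path.join(
        table_dir, f"year={row['year']}", f"month={row['month']}"
    )
    os.makedirs(part_dir, exist_ok=True)
    # Hypothetical data file; Spark would write part-* files here.
    with open(os.path.join(part_dir, "part-00000"), "a") as f:
        f.write(row["value"] + "\n")

partitions = sorted(
    os.path.join(y, m)
    for y in os.listdir(table_dir)
    for m in os.listdir(os.path.join(table_dir, y))
)
print(partitions)
# ['year=2023/month=1', 'year=2023/month=2', 'year=2024/month=1']
```

Note that the partition columns themselves are not stored inside the data files; they are encoded in the directory names, which is why Hive needs the dynamic-partition settings below to create those directories at insert time.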


I was able to write to a partitioned Hive table using:

df.write.mode(SaveMode.Append).partitionBy("colname").saveAsTable("Table")

I had to enable the following properties to make it work.

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
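The second setting matters because Hive's default "strict" mode rejects inserts where every partition column is dynamic. In newer Spark versions (2.x+) the same properties can be set on the session conf; a configuration fragment in PySpark, assuming an existing SparkSession named spark built with enableHiveSupport() (not runnable standalone):

```python
# Assumes `spark` is a SparkSession created with .enableHiveSupport().
# Allow dynamic partitioning, and allow all partition columns to be
# dynamic ("nonstrict") rather than requiring one static partition.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
```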


I ran into the same problem, and resolved it with the following steps:

  1. When a table is partitioned, the partition column name becomes case sensitive.

  2. The partition column must be present in the DataFrame under the same name (matching case). Code:

    var dbName = "your database name"
    var finaltable = "your table name"

    // First check whether the table already exists.
    if (sparkSession.sql("show tables in " + dbName).filter("tableName='" + finaltable + "'").collect().length == 0) {
      // Table is not available, so create it.
      println("Table Not Present \n  Creating table " + finaltable)
      sparkSession.sql("use " + dbName)
      sparkSession.sql("SET hive.exec.dynamic.partition = true")
      sparkSession.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
      sparkSession.sql("SET hive.exec.max.dynamic.partitions.pernode = 400")
      sparkSession.sql("create table " + dbName + "." + finaltable + " (EMP_ID string, EMP_Name string, EMP_Address string, EMP_Salary bigint) PARTITIONED BY (EMP_DEP STRING)")
    }

    // Table exists at this point; insert the DataFrame in Append mode.
    df.write.mode(SaveMode.Append).insertInto(dbName + "." + finaltable)
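The case-sensitivity caveat in points 1 and 2 can be checked before writing. A minimal sketch in plain Python, with a list of strings standing in for Spark's df.columns (the helper name check_partition_column is made up for illustration):

```python
def check_partition_column(columns, partition_col):
    """Hypothetical pre-write check: the partition column must appear
    in the DataFrame's columns with exactly matching case."""
    if partition_col in columns:
        return "ok"
    # Same name but different case: the insert would not line up
    # with the table's partition column.
    if partition_col.lower() in [c.lower() for c in columns]:
        return "case mismatch"
    return "missing"

# df.columns in Spark is a plain list of strings like this:
cols = ["EMP_ID", "EMP_Name", "EMP_Address", "EMP_Salary", "EMP_DEP"]
print(check_partition_column(cols, "EMP_DEP"))  # ok
print(check_partition_column(cols, "emp_dep"))  # case mismatch
```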