Save Spark dataframe as dynamic partitioned table in Hive


I believe it works something like this:

where df is a DataFrame with year, month and other columns:

df.write.partitionBy('year', 'month').saveAsTable(...)

or

df.write.partitionBy('year', 'month').insertInto(...)
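For reference, partitionBy writes one subdirectory per distinct partition value on disk, using Hive's col=value layout. A minimal sketch of that layout in plain Python (no Spark; the rows and the part-00000 file name are made up for illustration):

```python
import os
import tempfile

# Simulate the Hive-style directory layout that
# partitionBy('year', 'month') produces: one nested
# col=value directory per distinct partition value.
rows = [
    {"year": 2023, "month": 1, "value": "a"},
    {"year": 2023, "month": 2, "value": "b"},
    {"year": 2024, "month": 1, "value": "c"},
]

table_dir = tempfile.mkdtemp()
for row in rows:
    part_dir = os.path.join(
        table_dir, f"year={row['year']}", f"month={row['month']}"
    )
    os.makedirs(part_dir, exist_ok=True)
    # Hypothetical data file; Spark would write part-* files here.
    with open(os.path.join(part_dir, "part-00000"), "a") as f:
        f.write(row["value"] + "\n")

partitions = sorted(
    os.path.join(y, m)
    for y in os.listdir(table_dir)
    for m in os.listdir(os.path.join(table_dir, y))
)
print(partitions)
# ['year=2023/month=1', 'year=2023/month=2', 'year=2024/month=1']
```

Note that the partition columns themselves are not stored inside the data files; they are encoded in the directory names, which is why Hive needs the dynamic-partition settings below to create those directories at insert time.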


I was able to write to a partitioned Hive table using:

df.write.mode(SaveMode.Append).partitionBy("colname").saveAsTable("Table")

I had to enable the following properties to make it work.

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
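The second setting matters because Hive's default "strict" mode rejects inserts where every partition column is dynamic. In newer Spark versions (2.x+) the same properties can be set on the session conf; a configuration fragment in PySpark, assuming an existing SparkSession named spark built with enableHiveSupport() (not runnable standalone):

```python
# Assumes `spark` is a SparkSession created with .enableHiveSupport().
# Allow dynamic partitioning, and allow all partition columns to be
# dynamic ("nonstrict") rather than requiring one static partition.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
```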


I ran into the same problem, and resolved it with the following steps:

  1. When a table is partitioned, the partition column name becomes case sensitive.

  2. The partition column must be present in the DataFrame under the same name (matching case). Code:

    var dbName = "your database name"
    var finaltable = "your table name"

    // First check whether the table already exists.
    if (sparkSession.sql("show tables in " + dbName).filter("tableName='" + finaltable + "'").collect().length == 0) {
      // Table is not available, so create it.
      println("Table Not Present \n  Creating table " + finaltable)
      sparkSession.sql("use " + dbName)
      sparkSession.sql("SET hive.exec.dynamic.partition = true")
      sparkSession.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
      sparkSession.sql("SET hive.exec.max.dynamic.partitions.pernode = 400")
      sparkSession.sql("create table " + dbName + "." + finaltable + " (EMP_ID string, EMP_Name string, EMP_Address string, EMP_Salary bigint) PARTITIONED BY (EMP_DEP STRING)")
    }

    // Table exists at this point; insert the DataFrame in Append mode.
    df.write.mode(SaveMode.Append).insertInto(dbName + "." + finaltable)
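The case-sensitivity caveat in points 1 and 2 can be checked before writing. A minimal sketch in plain Python, with a list of strings standing in for Spark's df.columns (the helper name check_partition_column is made up for illustration):

```python
def check_partition_column(columns, partition_col):
    """Hypothetical pre-write check: the partition column must appear
    in the DataFrame's columns with exactly matching case."""
    if partition_col in columns:
        return "ok"
    # Same name but different case: the insert would not line up
    # with the table's partition column.
    if partition_col.lower() in [c.lower() for c in columns]:
        return "case mismatch"
    return "missing"

# df.columns in Spark is a plain list of strings like this:
cols = ["EMP_ID", "EMP_Name", "EMP_Address", "EMP_Salary", "EMP_DEP"]
print(check_partition_column(cols, "EMP_DEP"))  # ok
print(check_partition_column(cols, "emp_dep"))  # case mismatch
```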