overwrite hive partitions using spark
If you are on Spark 2.3.0 or later, try setting spark.sql.sources.partitionOverwriteMode to dynamic. The target table needs to be partitioned, and the write mode must be overwrite.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
data.write.mode("overwrite").insertInto("partitioned_table")
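One pitfall with insertInto: it matches DataFrame columns to the table's columns by position, not by name, and the partition columns must come last. A minimal sketch of a helper that computes the right select order before the write (the table and column names are illustrative assumptions, not from the question):

```scala
// Compute a column order that lines up positionally with the target table:
// data columns first, partition columns last. insertInto matches by position,
// so a mismatched order silently writes data into the wrong columns.
def insertOrder(dataCols: Seq[String], partitionCols: Seq[String]): Seq[String] =
  dataCols.filterNot(partitionCols.contains) ++ partitionCols

// Usage against a hypothetical table partitioned by (country, day):
// import org.apache.spark.sql.functions.col
// val ordered = df.select(insertOrder(df.columns, Seq("country", "day")).map(col): _*)
// ordered.write.mode("overwrite").insertInto("partitioned_table")
```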
I would suggest running SQL through the SparkSession instead: you can run an INSERT OVERWRITE ... PARTITION query that selects the columns from the existing dataset. This solution overwrites only the matching partitions.
So, if you are on a Spark version earlier than 2.3 and want to write into partitions dynamically without deleting the others, you can use the solution below.
The idea is to register the dataset as a temporary view and then use spark.sql() to run the INSERT query.
import org.apache.spark.sql.SparkSession

// Create SparkSession with Hive dynamic partitioning enabled
val spark: SparkSession = SparkSession
  .builder()
  .appName("StatsAnalyzer")
  .enableHiveSupport()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()

// Register the dataframe as a temporary view
impressionsDF.createOrReplaceTempView("impressions_dataframe")

// Create the output Hive table
spark.sql(
  s"""
     |CREATE EXTERNAL TABLE stats (
     |  ad STRING,
     |  impressions INT,
     |  clicks INT
     |) PARTITIONED BY (country STRING, year INT, month INT, day INT)
     |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
   """.stripMargin)

// Write the data into disk as Hive partitions
spark.sql(
  s"""
     |INSERT OVERWRITE TABLE stats
     |PARTITION(country = 'US', year = 2017, month = 3, day)
     |SELECT ad, SUM(impressions), SUM(clicks), day
     |FROM impressions_dataframe
     |GROUP BY ad, day
   """.stripMargin)