Read from a Hive table and write back to it using Spark SQL


You should first save your DataFrame y to a temporary table:

y.write.mode("overwrite").saveAsTable("temp_table")

Then you can read it back and overwrite the rows in your target table:

val dy = sqlContext.table("temp_table")
dy.write.mode("overwrite").insertInto("some_table")
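For reference, the two steps above could be put together roughly as follows on Spark 2.x or later, where SparkSession replaces sqlContext. This is only a sketch under assumptions: the `distinct()` call is a hypothetical stand-in for however y is actually derived, the object and app names are made up, and the final DROP TABLE cleanup is an optional extra that is not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object OverwriteViaTempTable {
  def main(args: Array[String]): Unit = {
    // Hive support is needed so that table names resolve against the metastore.
    val spark = SparkSession.builder()
      .appName("overwrite-via-temp-table") // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical transformation standing in for the real derivation of y.
    val y = spark.table("some_table").distinct()

    // Step 1: materialize y, so the final write no longer reads some_table.
    y.write.mode("overwrite").saveAsTable("temp_table")

    // Step 2: read the copy back and overwrite the target table.
    // insertInto matches columns by position, not by name.
    spark.table("temp_table")
      .write.mode("overwrite")
      .insertInto("some_table")

    // Optional cleanup (an assumption, not in the original answer).
    spark.sql("DROP TABLE IF EXISTS temp_table")
    spark.stop()
  }
}
```

Because insertInto resolves columns by position, the temporary table should keep the same column order as some_table, which holds here since y is read from it without reordering.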


Actually, you can also use checkpointing to achieve this. Since a checkpoint breaks the data lineage, Spark is no longer able to detect that you are reading from and overwriting the same table:

sqlContext.sparkContext.setCheckpointDir(checkpointDir)
val ds = sqlContext.sql("select * from some_table").checkpoint()
ds.write.mode("overwrite").saveAsTable("some_table")


You should first save your DataFrame y as a Parquet file:

y.write.parquet("temp_table")

Then load it back like this:

val parquetFile = sqlContext.read.parquet("temp_table")

And finally, insert the data into your table:

parquetFile.write.insertInto("some_table")