Read from a Hive table and write back to it using Spark SQL
First, save your DataFrame y to a temporary table:
y.write.mode("overwrite").saveAsTable("temp_table")
Then you can overwrite the rows in your target table:
val dy = sqlContext.table("temp_table")
dy.write.mode("overwrite").insertInto("some_table")
Alternatively, you can use checkpointing. Because a checkpoint breaks the data lineage, Spark can no longer detect that you are reading from and overwriting the same table:
sqlContext.sparkContext.setCheckpointDir(checkpointDir)
val ds = sqlContext.sql("select * from some_table").checkpoint()
ds.write.mode("overwrite").saveAsTable("some_table")
First, save your DataFrame y as a Parquet file:
y.write.parquet("temp_table")
Then load it back:
val parquetFile = sqlContext.read.parquet("temp_table")
Finally, insert the data into your table:
parquetFile.write.insertInto("some_table")