Spark Dataframe upsert to Elasticsearch
The reason mode("Overwrite") was a problem is that overwriting deletes every existing document that matches a row of the DataFrame before the new data is written, so for a while the entire index looked empty to me. I figured out how to actually upsert instead.
Here is my code:
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes.wan.only", "true")
  .option("es.nodes.discovery", "false")
  .option("es.nodes.client.only", "false")
  .option("es.net.ssl", "true")
  .option("es.mapping.id", index)
  .option("es.write.operation", "upsert")
  .option("es.nodes", esURL)
  .option("es.port", "443")
  .mode("append")
  .save(path)
Note that you have to set "es.write.operation" to "upsert" and use .mode("append").
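To build intuition for why upsert with append mode is safe while overwrite is not, here is a minimal sketch of the two write semantics, modeling an Elasticsearch index as a plain dict keyed by document id. This is an illustration only, not the connector's actual implementation:

```python
def overwrite(index, rows):
    # Overwrite clears the target first, then writes the new rows,
    # so between the delete and the write the index looks empty.
    index.clear()
    index.update({r["id"]: dict(r) for r in rows})

def upsert(index, rows):
    # Upsert merges each row into the existing document with the
    # same id, or inserts it if absent; untouched docs survive.
    for r in rows:
        doc = index.get(r["id"], {})
        doc.update(r)
        index[r["id"]] = doc

idx = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
upsert(idx, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}])
# doc 1 is untouched, doc 2 is updated, doc 3 is inserted
```

The es.mapping.id option plays the role of the dict key here: it tells the connector which field identifies the document to update.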
Try setting:
es.write.operation = upsert
This should perform the required operation. You can find more details at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html