How to save dataframe to Elasticsearch in PySpark?


tl;dr Use pyspark --packages org.elasticsearch:elasticsearch-hadoop:7.2.0 and use format("es") to reference the connector.
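As a minimal sketch of the tl;dr, assuming the shell was started with the `--packages` option above and that an Elasticsearch node is reachable at `localhost:9200` (both assumptions), a write could look like this:

```python
# Sketch: write a DataFrame to Elasticsearch via the es-hadoop connector.
# Assumes pyspark was launched with:
#   pyspark --packages org.elasticsearch:elasticsearch-hadoop:7.2.0
# and that Elasticsearch listens on localhost:9200 (both are assumptions).

# Connector options; es.resource names the target index ("people" is a
# hypothetical example).
es_conf = {
    "es.nodes": "localhost",
    "es.port": "9200",
    "es.resource": "people",
}

def save_to_es(df):
    # format("es") is the short alias for org.elasticsearch.spark.sql
    df.write.format("es").options(**es_conf).mode("append").save()
```

Passing the options as a dict keeps the connection settings in one place, so the same configuration can be reused for reads.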


Quoting Installation from the official documentation of the Elasticsearch for Apache Hadoop product:

Just like other libraries, elasticsearch-hadoop needs to be available in Spark’s classpath.

And later in Supported Spark SQL versions:

elasticsearch-hadoop supports both version Spark SQL 1.3-1.6 and Spark SQL 2.0 through two different jars: elasticsearch-spark-1.x-<version>.jar and elasticsearch-hadoop-<version>.jar

elasticsearch-spark-2.0-<version>.jar supports Spark SQL 2.0

That looks like an inconsistency in the documentation (it references two differently named jar files), but it does mean that you have to put the proper jar file on the CLASSPATH of your Spark application.

And later in the same document:

Spark SQL support is available under org.elasticsearch.spark.sql package.

That simply says that the format (as in df.write.format('org.elasticsearch.spark.sql')) is correct.

Further down in the document you can find that you can even use the alias df.write.format("es") (!)
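To make the equivalence concrete, here is a small sketch: the fully qualified name and the "es" alias both resolve to the same data source, provided the connector jar is on the classpath. The index name "demo" is a hypothetical example.

```python
# Sketch: the long data source name and its alias are interchangeable.

LONG_FORMAT = "org.elasticsearch.spark.sql"  # fully qualified data source
ALIAS = "es"                                 # short alias for the same source

def write_with(df, fmt=ALIAS):
    # Either fmt value selects the elasticsearch-hadoop data source;
    # es.resource names the target index ("demo" is a hypothetical example).
    df.write.format(fmt).option("es.resource", "demo").save()
```

Which form you pick is purely a matter of taste; the alias is just shorter.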

I found the Apache Spark section of the project's repository on GitHub more readable and up to date.


Update: The current ES-hadoop package as of June 2020 is 7.7.1, so I used pyspark --packages org.elasticsearch:elasticsearch-hadoop:7.7.1 instead.