
Python spark Dataframe to Elasticsearch


For starters, saveAsNewAPIHadoopFile expects an RDD of (key, value) pairs, and in your case this condition is satisfied only accidentally. The same applies to the value format you declare.

I am not familiar with Elasticsearch, but based on the arguments alone you should probably try something like this:

kpi1.rdd.map(lambda row: (None, row.asDict())).saveAsNewAPIHadoopFile(...)
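
The ... stands in for the ES-Hadoop output format configuration. A fuller sketch, assuming the elasticsearch-hadoop jar is on the classpath; myindex/mytype is a placeholder, and the path argument is required by the API but ignored by EsOutputFormat:

kpi1.rdd.map(lambda row: (None, row.asDict())).saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "myindex/mytype"})  # placeholder index/type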

Since Elasticsearch-Hadoop provides a Spark SQL data source, you should also be able to skip that step and save the data directly:

df.write.format("org.elasticsearch.spark.sql").save(...)


As zero323 said, the easiest way to load a DataFrame from PySpark into Elasticsearch is with the method

df.write.format("org.elasticsearch.spark.sql").save("index/type")


You can use something like this:

df.write.mode('overwrite') \
    .format("org.elasticsearch.spark.sql") \
    .option("es.resource", '%s/%s' % (conf['index'], conf['doc_type'])) \
    .option("es.nodes", conf['host']) \
    .option("es.port", conf['port']) \
    .save()
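
For completeness, here is an end-to-end sketch. The connector coordinates are an assumption; pick the elasticsearch-spark artifact that matches your Spark, Scala, and Elasticsearch versions (you can also ship it with --packages or --jars on spark-submit). The index/type and host settings below are placeholders:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("df-to-es")
         # assumed connector coordinates -- match to your Spark/Scala/ES versions
         .config("spark.jars.packages",
                 "org.elasticsearch:elasticsearch-spark-20_2.11:6.8.0")
         .getOrCreate())

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.resource", "myindex/mytype")  # placeholder index/type
   .option("es.nodes", "localhost")          # placeholder host
   .option("es.port", "9200")
   .mode("overwrite")
   .save())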