Write data from PySpark to Elasticsearch

python amazon-web-services hadoop elasticsearch pyspark

I had the same problem.

After reading this article, I found the answer.

You have to convert the DataFrame to a PythonRDD, like this:

>>> type(df)
<class 'pyspark.sql.dataframe.DataFrame'>
>>> type(df.rdd)
<class 'pyspark.rdd.RDD'>
>>> df.rdd.saveAsNewAPIHadoopFile(...)  # Got the same error message
>>> df.printSchema()  # My schema
root
 |-- id: string (nullable = true)
 ...
# Let's convert to PythonRDD
>>> python_rdd = df.map(lambda item: ('key', {
...     'id': item['id'],
...     ...
... }))
>>> python_rdd
PythonRDD[42] at RDD at PythonRDD.scala:43
>>> python_rdd.saveAsNewAPIHadoopFile(...)  # Now, success
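For reference, here is a minimal sketch of how the full write might look, filling in the arguments elided as `(...)` above. The helper names (`to_es_pair`, `es_write_conf`, `save_to_es`) and the host, port, and index values are my own assumptions, not from the original post. It also assumes the elasticsearch-hadoop jar is on the Spark classpath. It uses `df.rdd.map` rather than `df.map`, since `DataFrame.map` is not available in PySpark 2.0 and later.

```python
def to_es_pair(row):
    """Turn one row into the (key, document) pair that EsOutputFormat expects.

    The key is ignored by elasticsearch-hadoop, so a placeholder is used,
    matching the 'key' string in the session above.
    """
    # pyspark.sql.Row provides asDict(); plain mappings are accepted too.
    doc = row.asDict() if hasattr(row, "asDict") else dict(row)
    return ("key", doc)


def es_write_conf(nodes, port, resource):
    """Build the Hadoop configuration dict for elasticsearch-hadoop.

    All values must be strings. `resource` is "index/type" (hypothetical here).
    """
    return {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.resource": resource,
    }


def save_to_es(df, conf):
    """Map the DataFrame's rows to pairs (yielding a PythonRDD) and save them."""
    python_rdd = df.rdd.map(to_es_pair)
    python_rdd.saveAsNewAPIHadoopFile(
        path="-",  # unused by EsOutputFormat, but the argument is required
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

Usage would be something like `save_to_es(df, es_write_conf("localhost", 9200, "my_index/my_type"))`, with those connection values replaced by your own.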

saveAsNewAPIHadoopFile is a method of RDD, not of DataFrame:

http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD

So I guess this line should be:

es_df_pf.rdd.saveAsNewAPIHadoopFile
