'PipelinedRDD' object has no attribute 'toDF' in PySpark
The toDF method is a monkey patch applied inside the SparkSession constructor (the SQLContext constructor in Spark 1.x), so to be able to use it you have to create a SQLContext (or SparkSession) first:
# SQLContext or HiveContext in Spark 1.x
from pyspark.sql import SparkSession
from pyspark import SparkContext

sc = SparkContext()
rdd = sc.parallelize([("a", 1)])

hasattr(rdd, "toDF")
## False

spark = SparkSession(sc)
hasattr(rdd, "toDF")
## True

rdd.toDF().show()
## +---+---+
## | _1| _2|
## +---+---+
## |  a|  1|
## +---+---+
Not to mention you need a SQLContext or SparkSession to work with DataFrames in the first place.
Make sure you have a SparkSession as well:

sc = SparkContext("local", "first app")
spark = SparkSession(sc)