'PipelinedRDD' object has no attribute 'toDF' in PySpark


The toDF method is a monkey patch applied inside the SparkSession constructor (the SQLContext constructor in 1.x), so to be able to use it you have to create a SparkSession (or SQLContext) first:

    # SQLContext or HiveContext in Spark 1.x
    from pyspark.sql import SparkSession
    from pyspark import SparkContext

    sc = SparkContext()

    rdd = sc.parallelize([("a", 1)])
    hasattr(rdd, "toDF")
    ## False

    spark = SparkSession(sc)
    hasattr(rdd, "toDF")
    ## True

    rdd.toDF().show()
    ## +---+---+
    ## | _1| _2|
    ## +---+---+
    ## |  a|  1|
    ## +---+---+

Not to mention you need a SQLContext or SparkSession to work with DataFrames in the first place.
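
To see why the attribute only shows up after the session exists, here is a minimal pure-Python sketch of the same monkey-patching pattern. It is only an illustration: FakeRDD and FakeSession are hypothetical stand-ins invented for this example, not PySpark classes, and the patched method returns a plain list of dicts rather than a DataFrame.

```python
# Illustrative stand-ins only: FakeRDD and FakeSession are hypothetical
# classes made up for this sketch, not part of PySpark.

class FakeRDD:
    """Plays the role of an RDD; it starts out with no toDF method."""

    def __init__(self, rows):
        self.rows = rows


class FakeSession:
    """Plays the role of SparkSession; its constructor patches FakeRDD."""

    def __init__(self):
        def toDF(rdd, names=None):
            # Default column names mimic the _1, _2, ... convention.
            width = len(rdd.rows[0])
            names = names or ["_%d" % (i + 1) for i in range(width)]
            return [dict(zip(names, row)) for row in rdd.rows]

        # The monkey patch: attach the method to the class at run time,
        # so every instance, including ones created earlier, gains it.
        FakeRDD.toDF = toDF


rdd = FakeRDD([("a", 1)])
before = hasattr(rdd, "toDF")   # no session has been created yet

FakeSession()
after = hasattr(rdd, "toDF")    # the constructor has patched the class
result = rdd.toDF()

print(before, after, result)
```

Because the method is attached to the class rather than to one object, an instance created before the session still picks it up, which matches the hasattr check flipping from False to True in the answer above.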


Make sure you have created a SparkSession too, not just a SparkContext.

    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    sc = SparkContext("local", "first app")
    spark = SparkSession(sc)