
pyspark error: AttributeError: 'SparkSession' object has no attribute 'parallelize'

SparkSession is not a replacement for a SparkContext but an equivalent of the SQLContext. Just use it the same way as you used to use SQLContext.


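And if you ever need to access the SparkContext, for example to call parallelize (which is a SparkContext method, hence the error above), use the sparkContext attribute:

    spark.sparkContext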

So if you need an SQLContext for backward compatibility, you can create one from the existing session:

    from pyspark.sql import SQLContext

    SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)

Whenever you create a DataFrame through SQLContext from an object such as an RDD, or from a DataFrame created by a SparkSession, you need to make the SQLContext aware of your session and context.

For example, suppose I create an RDD:


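To create a DataFrame from this RDD through SQLContext, we need to build the SQLContext from the existing session and context:

    sq = SQLContext(sparkContext=ss.sparkContext, sparkSession=ss)

Only then can we use the SQLContext with an RDD, or with a DataFrame built from pandas data. For example, with an explicit schema: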

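    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Explicit schema: a nullable string column and a nullable integer column
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    df = sq.createDataFrame(rdd, schema)
    df.collect()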