AttributeError: 'DataFrame' object has no attribute 'map'

python apache-spark pyspark spark-dataframe apache-spark-mllib

A DataFrame has no map method, but you can convert it to an RDD and map that with spark_df.rdd.map(). Prior to Spark 2.0, spark_df.map was an alias for spark_df.rdd.map(). From Spark 2.0 onward, you must call .rdd explicitly first.
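As a minimal sketch of the .rdd.map() route: the column names (name, age), the sample rows, and the helper names below are made up for illustration and are not from the question.

```python
def name_and_next_age(row):
    """Map one row to a (name, age + 1) tuple.

    Works on anything that supports row["field"] lookup,
    which includes pyspark.sql.Row.
    """
    return (row["name"], row["age"] + 1)


def map_rows(df):
    """Apply name_and_next_age to every row through the RDD API.

    On Spark 2.0+ df.map(...) raises AttributeError, so go through df.rdd.
    """
    return df.rdd.map(name_and_next_age).collect()


def demo():
    # Imported lazily so the helpers above can be imported without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("rdd-map-demo")
        .getOrCreate()
    )
    # Hypothetical sample data.
    df = spark.createDataFrame([("Alice", 30), ("Bob", 25)], ["name", "age"])
    print(map_rows(df))
    spark.stop()
```

Calling demo() needs a local PySpark installation. The mapping function itself is plain Python, so it can be unit-tested without starting a Spark session.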


You can use df.rdd.map(), as DataFrame does not have map or flatMap, but be aware of the implications of using df.rdd:

Converting to an RDD breaks the DataFrame lineage: there is no predicate pushdown, no column pruning, no SQL plan, and the PySpark transformations are less efficient.

What should you do instead?

Keep in mind that the high-level DataFrame API is equipped with many alternatives. First, you can use select or selectExpr.
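For instance, a map that only picks fields or does simple arithmetic can usually be written as a projection. This is a rough sketch that assumes a DataFrame with name and age columns; the function names are hypothetical.

```python
def select_name_only(df):
    """Keep a single column; the DataFrame equivalent of mapping a row to row.name."""
    return df.select("name")


def select_name_and_next_age(df):
    """Project one column as-is and compute another from a SQL expression string."""
    return df.selectExpr("name", "age + 1 AS next_age")
```

Both functions return a new DataFrame, so the work stays inside the DataFrame API and no .rdd conversion is needed.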

Another example is using explode instead of flatMap (which exists on RDDs), shown here in Scala:

df.select($"name", explode($"knownLanguages")).show(false)

Result:

+-------+------+
|name   |col   |
+-------+------+
|James  |Java  |
|James  |Scala |
|Michael|Spark |
|Michael|Java  |
|Michael|null  |
|Robert |CSharp|
|Robert |      |
+-------+------+
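The snippet above uses Scala syntax, while the question is about PySpark. A rough PySpark equivalent might look like the sketch below, which assumes the same name and knownLanguages columns. The helper names are hypothetical, and the flatMap version is included only to show what explode replaces.

```python
def name_language_pairs(row):
    """Per-row logic an RDD flatMap would need: one (name, language) pair per array element."""
    # A null array yields no pairs, matching explode, which drops such rows.
    return [(row["name"], lang) for lang in (row["knownLanguages"] or [])]


def explode_with_rdd(df):
    """RDD route: leaves the DataFrame API and returns an RDD of tuples."""
    return df.rdd.flatMap(name_language_pairs)


def explode_with_dataframe(df):
    """DataFrame route: stays in the DataFrame API and returns a DataFrame."""
    # Imported lazily so the pure-Python helper above does not require Spark.
    from pyspark.sql.functions import explode

    return df.select("name", explode("knownLanguages"))
```

As in the Scala output, the exploded column is named col by default unless you alias it.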

Depending on the use case, you can also use withColumn, a UDF, or another option in the DataFrame API.
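A small sketch of both options, assuming a DataFrame with name and age columns; shout and add_columns are made-up names for illustration.

```python
def shout(name):
    """Plain Python function that a UDF can wrap; None stays None."""
    return None if name is None else name.upper() + "!"


def add_columns(df):
    """Add one column from a built-in expression and one from a Python UDF."""
    # Imported lazily so shout() can be tested without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    shout_udf = udf(shout, StringType())
    return (
        df.withColumn("next_age", col("age") + 1)  # built-in column expression
        .withColumn("loud_name", shout_udf(col("name")))  # Python UDF
    )
```

Where a built-in column expression exists, it is generally the better choice, since a Python UDF still ships each value to a Python worker.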
