Spark DataFrame groupBy and sort in the descending order (pyspark)


In PySpark 1.3, the sort method doesn't take an ascending parameter. You can use the desc method instead:

from pyspark.sql.functions import col

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(col("count").desc()))

or the desc function:

from pyspark.sql.functions import desc

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(desc("count")))

Both methods can be used with Spark >= 1.3 (including Spark 2.x).


Use orderBy:

df.orderBy('column_name', ascending=False)

Complete answer:

group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)
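For intuition, the same group-count-filter-sort-descending pipeline can be mimicked in plain Python without a Spark session. This is only an illustrative analogue (the sample data, column values, and threshold of 2 are made up here, not from the answer above):

```python
from collections import Counter

# Sample values standing in for a single grouped column.
words = ["spark", "spark", "python", "python", "python", "scala"]

# groupBy(...).count(): tally occurrences of each value.
counts = Counter(words)

# .filter("`count` >= 2").orderBy('count', ascending=False):
# keep frequent values, then sort by count, descending.
result = sorted(
    ((word, n) for word, n in counts.items() if n >= 2),
    key=lambda pair: pair[1],
    reverse=True,
)

print(result)  # [('python', 3), ('spark', 2)]
```

The Spark version distributes the same tally-filter-sort logic across the cluster, but the result shape is identical: one row per group, ordered by count from largest to smallest.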

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html


By far the most convenient way is this:

df.orderBy(df.column_name.desc())

It doesn't require any special imports.