Get top N of all groups after group by using Spark DataFrame
You can use the rank window function as follows:
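    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{rank, desc}
    // The $"..." column syntax needs import spark.implicits._ (already in scope in spark-shell)

    val n: Int = ???  // number of rows to keep per group

    // Window definition: one partition per user, highest rating first
    val w = Window.partitionBy($"user").orderBy(desc("rating"))

    // Filter: keep only rows ranked n or better within each user
    df.withColumn("rank", rank().over(w)).where($"rank" <= n)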
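If you don't care about ties, you can replace rank with row_number. rank gives tied rows the same rank, so a group can return more than n rows; row_number numbers the rows in each group uniquely, so a group returns at most n.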
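To make the difference on ties concrete, here is a minimal sketch meant to be pasted into spark-shell or run as a Scala script. The sample rows, the item column, the local session settings, and the names byRank and byRowNumber are all made up for illustration; only the user and rating columns come from the snippet above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{desc, rank, row_number}

// Local session for illustration only (assumption; in spark-shell this returns the existing one)
val spark = SparkSession.builder().master("local[*]").appName("topn-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: alice has two items tied at the top rating
val df = Seq(
  ("alice", "a", 5.0),
  ("alice", "b", 5.0),
  ("alice", "c", 3.0),
  ("bob",   "a", 4.0),
  ("bob",   "d", 2.0)
).toDF("user", "item", "rating")

val n = 1
val w = Window.partitionBy($"user").orderBy(desc("rating"))

// rank: both of alice's tied rows get rank 1, so both pass the filter
val byRank = df.withColumn("rank", rank().over(w)).where($"rank" <= n)

// row_number: rows are numbered 1, 2, 3, ... so only one row per user passes
val byRowNumber = df.withColumn("rn", row_number().over(w)).where($"rn" <= n)

byRank.show()
byRowNumber.show()
```

With this data byRank keeps three rows (two for alice, one for bob) and byRowNumber keeps two. Which of alice's tied rows row_number keeps is not defined by the ordering on rating alone, so add a second orderBy column if you need a deterministic pick.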