Get the top N rows of each group after a group by using a Spark DataFrame


You can use the rank window function as follows:

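    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{rank, desc}
    // The $"col" syntax needs import spark.implicits._ (already in scope in spark-shell)

    // Number of rows to keep per group
    val n: Int = ???

    // Window definition: one partition per user, highest rating first
    val w = Window.partitionBy($"user").orderBy(desc("rating"))

    // Filter: keep only rows whose rank within the partition is at most n
    df.withColumn("rank", rank().over(w)).where($"rank" <= n)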

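If you don't care about ties, you can replace rank with row_number. With rank, tied rows share the same rank, so a group can return more than n rows; row_number assigns a unique number to every row in the group, so you get at most n rows per group.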
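To make the difference concrete, here is a minimal sketch comparing the two functions. The sample data, the item column, the local SparkSession, and the names withTies and noTies are assumptions made up for illustration; in spark-shell the session already exists and you can skip that part.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{desc, rank, row_number}

// Local session for illustration only (assumption; spark-shell already provides `spark`)
val spark = SparkSession.builder().master("local[*]").appName("topn-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: items "b" and "c" tie for second place for user 1
val df = Seq(
  (1, "a", 5.0), (1, "b", 4.0), (1, "c", 4.0), (1, "d", 1.0),
  (2, "e", 3.0), (2, "f", 2.0)
).toDF("user", "item", "rating")

val n = 2
val w = Window.partitionBy($"user").orderBy(desc("rating"))

// rank: tied rows share a rank, so user 1 keeps "a", "b" and "c"
val withTies = df.withColumn("rank", rank().over(w)).where($"rank" <= n)

// row_number: numbers are unique within a group, so each user keeps at most n rows
val noTies = df.withColumn("rn", row_number().over(w)).where($"rn" <= n)
```

Note that with row_number, which of the tied rows ("b" or "c") survives is not guaranteed. If you need a stable result, add a tiebreaker column to the orderBy, for example orderBy(desc("rating"), $"item").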