
Spark dataframe: Pivot and Group based on columns


You can use collect_list if you can live with an empty list in the cells that have no matching rows:

df.groupBy("id").pivot("app").agg(collect_list("customer")).show

+---+--------+----+--------+
| id|      bc|  fe|      fw|
+---+--------+----+--------+
|id3|[TR, WM]|  []|      []|
|id1|      []|[WM]|[CS, WM]|
|id2|      []|  []|    [CS]|
+---+--------+----+--------+
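For readers without a Spark session at hand, here is a plain-Python sketch of what groupBy("id").pivot("app").agg(collect_list("customer")) computes. The sample rows are an assumption chosen to reproduce the table above; this is an illustration of the logic, not Spark itself:

```python
from collections import defaultdict

# Hypothetical (id, app, customer) rows matching the table above -- assumed data.
rows = [
    ("id3", "bc", "TR"), ("id3", "bc", "WM"),
    ("id1", "fe", "WM"), ("id1", "fw", "CS"), ("id1", "fw", "WM"),
    ("id2", "fw", "CS"),
]

# pivot("app"): each distinct app value becomes a column.
apps = sorted({app for _, app, _ in rows})

# groupBy("id") + agg(collect_list("customer")): collect customers per (id, app) cell.
pivoted = defaultdict(lambda: {a: [] for a in apps})
for id_, app, customer in rows:
    pivoted[id_][app].append(customer)

print(dict(pivoted))
# Cells with no matching rows stay as empty lists, just like Spark's [].
```

The empty-list cells correspond to the `[]` entries in the Spark output above.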


Using concat_ws we can join each array into a single comma-separated string, which removes the square brackets (empty lists become empty strings):

df.groupBy("id").pivot("app").agg(concat_ws(",",collect_list("customer")))
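The effect of wrapping collect_list in concat_ws(",", ...) can be sketched in plain Python (the sample cells are assumed, mirroring the id3 row above):

```python
# Hypothetical pivoted cells for one id, as produced by collect_list -- assumed data.
cells = {"bc": ["TR", "WM"], "fe": [], "fw": []}

# concat_ws(",", array) joins the elements with a comma; an empty array
# yields an empty string instead of the bracketed "[]" display.
joined = {app: ",".join(customers) for app, customers in cells.items()}

print(joined)  # {'bc': 'TR,WM', 'fe': '', 'fw': ''}
```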