
Spark dataframe: Pivot and Group based on columns


You can use collect_list if you can live with an empty list in the cells that have no matching rows:

df.groupBy("id").pivot("app").agg(collect_list("customer")).show

+---+--------+----+--------+
| id|      bc|  fe|      fw|
+---+--------+----+--------+
|id3|[TR, WM]|  []|      []|
|id1|      []|[WM]|[CS, WM]|
|id2|      []|  []|    [CS]|
+---+--------+----+--------+
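For readers without a Spark session at hand, here is a plain-Python sketch of what groupBy("id").pivot("app").agg(collect_list("customer")) computes. The sample rows are an assumption chosen to reproduce the table above; this is an illustration of the logic, not Spark itself:

```python
from collections import defaultdict

# Hypothetical (id, app, customer) rows matching the table above -- assumed data.
rows = [
    ("id3", "bc", "TR"), ("id3", "bc", "WM"),
    ("id1", "fe", "WM"), ("id1", "fw", "CS"), ("id1", "fw", "WM"),
    ("id2", "fw", "CS"),
]

# pivot("app"): each distinct app value becomes a column.
apps = sorted({app for _, app, _ in rows})

# groupBy("id") + agg(collect_list("customer")): collect customers per (id, app) cell.
pivoted = defaultdict(lambda: {a: [] for a in apps})
for id_, app, customer in rows:
    pivoted[id_][app].append(customer)

print(dict(pivoted))
# Cells with no matching rows stay as empty lists, just like Spark's [].
```

The empty-list cells correspond to the `[]` entries in the Spark output above.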


Using concat_ws we can join each array into a single comma-separated string, which removes the square brackets (empty lists become empty strings):

df.groupBy("id").pivot("app").agg(concat_ws(",",collect_list("customer")))
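The effect of wrapping collect_list in concat_ws(",", ...) can be sketched in plain Python (the sample cells are assumed, mirroring the id3 row above):

```python
# Hypothetical pivoted cells for one id, as produced by collect_list -- assumed data.
cells = {"bc": ["TR", "WM"], "fe": [], "fw": []}

# concat_ws(",", array) joins the elements with a comma; an empty array
# yields an empty string instead of the bracketed "[]" display.
joined = {app: ",".join(customers) for app, customers in cells.items()}

print(joined)  # {'bc': 'TR,WM', 'fe': '', 'fw': ''}
```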