
Explode array values using PySpark


Use `explode` to turn each array element into its own row, then select the struct fields with `col.*`, and finally drop the newly created `col` column and the original `transactions` array column.

Example:

from pyspark.sql.functions import *

# got only some columns from json
df.printSchema()
#root
# |-- account_balance: long (nullable = true)
# |-- transactions: array (nullable = true)
# |    |-- element: struct (containsNull = true)
# |    |    |-- amount: long (nullable = true)
# |    |    |-- date: string (nullable = true)

df.selectExpr("*", "explode(transactions)") \
  .select("*", "col.*") \
  .drop(*['col', 'transactions']) \
  .show()
#+---------------+------+--------+
#|account_balance|amount|    date|
#+---------------+------+--------+
#|             10|  1000|20200202|
#+---------------+------+--------+