Explode array values using PySpark
Use explode to turn each array element into its own row, then split out the struct fields with col.*, and finally drop the exploded col column and the original transactions array column.
Example:
from pyspark.sql.functions import *

# Schema (only the relevant columns shown) from df.printSchema():
# root
#  |-- account_balance: long (nullable = true)
#  |-- transactions: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- amount: long (nullable = true)
#  |    |    |-- date: string (nullable = true)

df.selectExpr("*", "explode(transactions)") \
  .select("*", "col.*") \
  .drop(*['col', 'transactions']) \
  .show()

#+---------------+------+--------+
#|account_balance|amount|    date|
#+---------------+------+--------+
#|             10|  1000|20200202|
#+---------------+------+--------+
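Conceptually, explode produces one output row per array element, carrying the parent row's other columns along, and col.* then lifts the struct fields to top-level columns. A plain-Python sketch of the same transformation, using hypothetical sample data that mirrors the schema above:

```python
# Hypothetical sample rows mirroring the schema above:
# account_balance plus an array of {amount, date} structs.
rows = [
    {"account_balance": 10,
     "transactions": [{"amount": 1000, "date": "20200202"},
                      {"amount": 2000, "date": "20200303"}]},
]

# explode(transactions): one output row per array element.
# **t plays the role of col.*, flattening the struct fields
# into top-level columns alongside account_balance.
exploded = [
    {"account_balance": r["account_balance"], **t}
    for r in rows
    for t in r["transactions"]
]

for row in exploded:
    print(row)
# Each element of transactions becomes its own row,
# with the parent's columns copied onto it.
```

Note that, as in Spark's explode, a row whose transactions array is empty would produce no output rows at all; if you need to keep such rows, Spark provides explode_outer instead.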