
Explode array values using PySpark


Use `explode` to turn each array element into its own row, then select the struct fields with `col.*`, and finally drop the newly created `col` column and the original `transactions` array column.

Example:

from pyspark.sql.functions import *

# got only some columns from json
df.printSchema()
#root
# |-- account_balance: long (nullable = true)
# |-- transactions: array (nullable = true)
# |    |-- element: struct (containsNull = true)
# |    |    |-- amount: long (nullable = true)
# |    |    |-- date: string (nullable = true)

df.selectExpr("*", "explode(transactions)") \
  .select("*", "col.*") \
  .drop(*['col', 'transactions']) \
  .show()
#+---------------+------+--------+
#|account_balance|amount|    date|
#+---------------+------+--------+
#|             10|  1000|20200202|
#+---------------+------+--------+