Databricks - Pyspark - Handling nested json with a dynamic key
The idea:
Step 1: Extract Header and TimeSeries separately.
Step 2: For each field in the TimeSeries object, extract the Amount and UnitPrice, together with the name of the field, and stuff them into a struct.
Step 3: Merge all these structs into an array column, and explode it.
Step 4: Extract Timeseries, Amount and UnitPrice from the exploded column.
Step 5: Cross join with the Header row.
import pyspark.sql.functions as F

# Split Header and TimeSeries into separate DataFrames.
header_df = df.select("Header.*")
timeseries_df = df.select("TimeSeries.*")

# For every dynamic field (timestamp key), build a struct holding the
# field name plus its Amount and UnitPrice.
fieldNames = enumerate(timeseries_df.schema.fieldNames())
cols = [
    F.struct(
        F.lit(name).alias("Timeseries"),
        F.col(name).getItem("Amount").alias("Amount"),
        F.col(name).getItem("UnitPrice").alias("UnitPrice"),
    ).alias("ts_" + str(idx))
    for idx, name in fieldNames
]

# Collect the structs into a single array column and explode it into rows.
combined = F.explode(F.array(*cols)).alias("comb")
timeseries = timeseries_df.select(combined).select(
    "comb.Timeseries", "comb.Amount", "comb.UnitPrice"
)

# Attach the Header row to every time-series row.
result = header_df.crossJoin(timeseries)
result.show(truncate=False)
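Note that the crossJoin only works as intended here because df holds a single document, so header_df has exactly one row; if you load several documents at once, you would want to keep a join key and use a regular join instead.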
Output:
+-----+-----+-----+-----+-------------------------+------+---------+
|Code1|Code2|Code3|Code4|Timeseries               |Amount|UnitPrice|
+-----+-----+-----+-----+-------------------------+------+---------+
|abc  |def  |ghi  |jkl  |2020-11-25T03:00:00+00:00|10000 |1000     |
|abc  |def  |ghi  |jkl  |2020-11-26T03:00:00+00:00|10000 |1000     |
+-----+-----+-----+-----+-------------------------+------+---------+