
Spark Scala: explode a JSON array in a DataFrame


You'll have to parse the JSON string into an array of JSON strings, and then use explode on the result (explode expects an array).
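The snippets below assume an input DataFrame along these lines, with each row holding a JSON-array string in a Payment column. This setup is a minimal sketch; the sample values and app name are illustrative, not from the original question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("explode-json-array") // hypothetical app name
      .master("local[*]")            // local run for illustration only
      .getOrCreate()
    import spark.implicits._ // enables the $"col" syntax used below

    // One JSON-array string per row in the Payment column.
    val dataframe = Seq(
      """[{"@id": 1, "currency": "USD"}, {"@id": 2, "currency": "EUR"}]"""
    ).toDF("Payment")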

To do that (assuming Spark 2.0.*):

  • If you know all Payment values contain a JSON array of the same size (e.g. 2 in this case), you can hard-code extraction of the first and second elements, wrap them in an array, and explode:

    import org.apache.spark.sql.functions.{array, explode, get_json_object}

    val newDF = dataframe.withColumn("Payment", explode(array(
      get_json_object($"Payment", "$[0]"),
      get_json_object($"Payment", "$[1]")
    )))
  • If you can't guarantee that every record has a JSON with a 2-element array, but you can bound the maximum length of these arrays, you can use this trick to parse elements up to the maximum size and then filter out the resulting nulls (a usage sketch follows this list):

    import org.apache.spark.sql.functions.{array, explode, get_json_object, isnull}

    val maxJsonParts = 3 // whatever that number is...
    val jsonElements = (0 until maxJsonParts)
      .map(i => get_json_object($"Payment", s"$$[$i]"))

    val newDF = dataframe
      .withColumn("Payment", explode(array(jsonElements: _*)))
      .where(!isnull($"Payment"))
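With either variant, each exploded row now holds a single JSON object as a string, so individual fields can still be pulled out with get_json_object. A minimal follow-up sketch (the currency field name comes from the illustrative sample data above):

    // Extract one field from each exploded JSON object string.
    newDF
      .select(get_json_object($"Payment", "$.currency").alias("currency"))
      .show()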


import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{explode, from_json}

// Parse the Payment string as a JSON array of strings, then explode it.
// Note: from_json with an array of primitives requires Spark 2.4+.
val newDF = dataframe.withColumn("Payment",
  explode(from_json($"Payment", ArrayType(StringType))))
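A quick sanity check on the result (assuming the setup sketched earlier): the exploded Payment column still holds plain strings, one JSON object per row, so fields still need per-row extraction:

    import org.apache.spark.sql.functions.get_json_object

    newDF.printSchema() // Payment is still StringType: one JSON object per row
    newDF.select(get_json_object($"Payment", "$.currency").alias("currency")).show()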


You can define the schema of the Payment JSON array using ArrayType.

    import org.apache.spark.sql.types._

    val paymentSchema = ArrayType(StructType(Array(
      StructField("@id", DataTypes.IntegerType),
      StructField("currency", DataTypes.StringType)
    )))

Then, after parsing with from_json using this schema, exploding will return the desired result:

    import org.apache.spark.sql.functions.{explode, from_json}

    val newDF = dataframe.withColumn("Payment",
      explode(from_json($"Payment", paymentSchema)))
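After the explode, each Payment value is a struct, so its fields are addressable as ordinary columns. A small usage sketch (field names come from paymentSchema above; getField is used to be safe with the special character in "@id"):

    // Fields of the exploded struct are regular columns now.
    newDF
      .select(
        $"Payment".getField("@id").alias("id"),
        $"Payment".getField("currency").alias("currency"))
      .show()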