Spark Scala: explode a JSON array inside a DataFrame column
You'll have to parse the JSON string into an array of JSONs, and then use explode
on the result (explode expects an array).
To do that (assuming Spark 2.0.*):
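For concreteness, here is the input shape the snippets below assume: a Payment column holding a JSON string that encodes an array. The sample data and session setup are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample: each row's Payment is a JSON *string* encoding an array
val dataframe = Seq(
  """[{"@id": 1, "currency": "USD"}, {"@id": 2, "currency": "EUR"}]"""
).toDF("Payment")
```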
If you know all Payment values contain a JSON representing an array of the same size (e.g. 2 in this case), you can hard-code extraction of the first and second elements, wrap them in an array and explode:

val newDF = dataframe.withColumn("Payment", explode(array(
  get_json_object($"Payment", "$[0]"),
  get_json_object($"Payment", "$[1]")
)))
If you can't guarantee all records have a JSON with a 2-element array, but you can guarantee a maximum length of these arrays, you can use this trick to parse elements up to the maximum size and then filter out the resulting nulls:

val maxJsonParts = 3 // whatever that number is...
val jsonElements = (0 until maxJsonParts)
  .map(i => get_json_object($"Payment", s"$$[$i]"))
val newDF = dataframe
  .withColumn("Payment", explode(array(jsonElements: _*)))
  .where(!isnull($"Payment"))
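Putting the trick together on sample data (a sketch; the data and the SparkSession setup here are assumptions, not from the question):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes an active SparkSession named spark

val dataframe = Seq(
  """[{"@id": 1, "currency": "USD"}, {"@id": 2, "currency": "EUR"}]""",
  """[{"@id": 3, "currency": "GBP"}]"""
).toDF("Payment")

val maxJsonParts = 3
val jsonElements = (0 until maxJsonParts)
  .map(i => get_json_object($"Payment", s"$$[$i]"))

dataframe
  .withColumn("Payment", explode(array(jsonElements: _*)))
  .where(!isnull($"Payment"))
  .show(false)
// One row per payment object; the "missing" second and third elements
// of the shorter array parse to null and are filtered out by the where clause.
```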
Alternatively, in newer Spark versions you can parse the JSON string directly into an array with from_json and explode that (note: a non-struct root schema like ArrayType(StringType) is only accepted by from_json in more recent releases):

import org.apache.spark.sql.types._

val newDF = dataframe.withColumn("Payment",
  explode(from_json($"Payment", ArrayType(StringType))))
You can define the schema of the Payment JSON array using ArrayType:

import org.apache.spark.sql.types._

val paymentSchema = ArrayType(StructType(Array(
  StructField("@id", DataTypes.IntegerType),
  StructField("currency", DataTypes.StringType)
)))
Then exploding after using from_json with this schema will return the desired result.
val newDF = dataframe.withColumn("Payment", explode(from_json($"Payment", paymentSchema)))
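After this explode, Payment is a struct column, so individual fields can be selected directly. A short sketch (getField avoids any quoting issues caused by the @ in the field name):

```scala
import org.apache.spark.sql.functions.col

val flattened = newDF.select(
  col("Payment").getField("@id").as("id"),
  col("Payment").getField("currency").as("currency")
)
flattened.show()
```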