Elegant JSON flatten in Spark
If you don't need a recursive solution, then in Spark 1.6+ dot syntax with a star should work just fine:
    val df = sqlContext.read.json(sc.parallelize(Seq(
      """{"properties": { "prop1": "foo", "prop2": "bar", "prop3": true, "prop4": 1}}"""
    )))

    df.select($"properties.*").printSchema

    // root
    // |-- prop1: string (nullable = true)
    // |-- prop2: string (nullable = true)
    // |-- prop3: boolean (nullable = true)
    // |-- prop4: long (nullable = true)
Unfortunately, this doesn't work in Spark 1.5 and earlier.
In a case like this, you can extract the required information directly from the schema. You'll find one example in "Dropping a nested column from Spark DataFrame", which should be easy to adapt to this scenario, and another one (recursive schema flattening in Python) in "Pyspark: Map a SchemaRDD into a SchemaRDD".
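To make the schema-based approach concrete, here is a minimal Scala sketch of recursive flattening. It is not taken from either linked answer; the helper names `flattenSchema` and `flatColumns` are hypothetical, and the choice of `_` as the separator in the output column names is an assumption. It only descends into struct fields (arrays and maps are left as they are) and assumes field names contain no dots.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of Spark): collect the dotted path of every leaf field.
def flattenSchema(schema: StructType, prefix: Option[String] = None): Seq[String] =
  schema.fields.toSeq.flatMap { field =>
    val path = prefix.map(p => s"$p.${field.name}").getOrElse(field.name)
    field.dataType match {
      case nested: StructType => flattenSchema(nested, Some(path)) // descend into structs
      case _                  => Seq(path)                         // leaf: keep its full path
    }
  }

// Hypothetical helper: one aliased column per leaf field.
// Assumption: "_" replaces "." in the output names to avoid ambiguous column references.
def flatColumns(schema: StructType): Seq[Column] =
  flattenSchema(schema).map(path => col(path).alias(path.replace(".", "_")))
```

With the DataFrame from the example above, `df.select(flatColumns(df.schema): _*)` should produce top-level columns named `properties_prop1`, `properties_prop2`, and so on. Because it relies only on the schema and on plain dotted column references, this should not depend on the star expansion that 1.6 introduced.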