Elegant Json flatten in Spark [duplicate]

json scala apache-spark apache-spark-sql

If you're not looking for a recursive solution, then in Spark 1.6+ the dot syntax with a star should work just fine:

    val df = sqlContext.read.json(sc.parallelize(Seq(
      """{"properties": {"prop1": "foo", "prop2": "bar", "prop3": true, "prop4": 1}}"""
    )))

    df.select($"properties.*").printSchema
    // root
    //  |-- prop1: string (nullable = true)
    //  |-- prop2: string (nullable = true)
    //  |-- prop3: boolean (nullable = true)
    //  |-- prop4: long (nullable = true)

Unfortunately, this doesn't work in Spark 1.5 and earlier.

In a case like this you can simply extract the required information directly from the schema. You'll find one example in "Dropping a nested column from Spark DataFrame", which should be easy to adjust to fit this scenario, and another one (recursive schema flattening in Python) in "Pyspark: Map a SchemaRDD into a SchemaRDD".
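As a rough illustration of the schema-based approach, here is a minimal sketch in Scala. It is not taken from either of the linked answers; `flattenSchema` is a hypothetical helper name, and the choice to alias each leaf column with underscores in place of dots is an assumption made here to keep the output names unambiguous. It only descends into struct fields (arrays and maps are left as they are) and assumes field names do not themselves contain dots.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper: walks the schema recursively and returns one Column
// per leaf field. Nested fields are addressed by their dotted path and
// aliased with underscores (e.g. properties.prop1 -> properties_prop1).
def flattenSchema(schema: StructType, prefix: Option[String] = None): Seq[Column] =
  schema.fields.toSeq.flatMap { field =>
    val path = prefix.map(p => s"$p.${field.name}").getOrElse(field.name)
    field.dataType match {
      // Struct: recurse, carrying the path built so far.
      case nested: StructType => flattenSchema(nested, Some(path))
      // Anything else (including arrays and maps) is treated as a leaf.
      case _ => Seq(col(path).alias(path.replace(".", "_")))
    }
  }
```

With the DataFrame from the example above, `df.select(flattenSchema(df.schema): _*)` should then select every leaf field as a top-level column, without relying on the star syntax.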
