
Reading a JSON file using Apache Spark


You will need to reformat the JSON so that each object sits on a single line. Your file is one multi-line JSON object, which is why it is not being read and loaded properly (Spark expects one object per row).

Quoting the JSON API docs:

Loads a JSON file (one object per line) and returns the result as a DataFrame.

{"glossary":{"title":"example glossary","GlossDiv":{"title":"S","GlossList":{"GlossEntry":{"ID":"SGML","SortAs":"SGML","GlossTerm":"Standard Generalized Markup Language","Acronym":"SGML","Abbrev":"ISO 8879:1986","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"GlossSee":"markup"}}}}}

I just tried it in the shell; it should work the same way from code (I got the same corrupted-record error when I read a multi-line JSON).

scala> val df = spark.read.json("C:/DevelopmentTools/data.json")
df: org.apache.spark.sql.DataFrame = [glossary: struct<GlossDiv: struct<GlossList: struct<GlossEntry: struct<Abbrev: string, Acronym: string ... 5 more fields>>, title: string>, title: string>]

Edit:

You can get the values out of that DataFrame using any action, for example:

scala> df.select(df("glossary.GlossDiv.GlossList.GlossEntry.GlossTerm")).show()
+--------------------+
|           GlossTerm|
+--------------------+
|Standard Generali...|
+--------------------+

You should be able to do it from your code as well
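For reference, here is a minimal sketch of the same steps as a standalone Scala application (the local master and the path are just placeholders):

import org.apache.spark.sql.SparkSession

object JsonReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("jsonreader")
      .getOrCreate()

    // Same load and select as in the shell session above
    val df = spark.read.json("C:/DevelopmentTools/data.json")
    df.select("glossary.GlossDiv.GlossList.GlossEntry.GlossTerm").show()

    spark.stop()
  }
}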


Just make sure your JSON is on one line. You are reading nested JSON, so if you have already done this, you have loaded the JSON successfully; you are just displaying it the wrong way. Because it is nested, you cannot show it directly. For example, if you want the title data of GlossDiv, you can show it as follows:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession.builder().master("local").appName("jsonreader").getOrCreate();
Dataset<Row> list = session.read().json("/Users/hadoop/Desktop/sample.json");
list.select("glossary.GlossDiv.title").show();
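With the sample JSON above, this should print something like:

+-----+
|title|
+-----+
|    S|
+-----+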


Try reading the whole file as a single record instead, so the JSON does not have to sit on one line. wholeTextFiles returns (path, content) pairs, so keep only the values:

import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext jsc = new JavaSparkContext(session.sparkContext());
session.read().json(jsc.wholeTextFiles("...").values()).show();
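The same idea from the Scala shell would look like this (a sketch, assuming a Spark 2.x DataFrameReader, where json() still accepts an RDD[String]):

val df = spark.read.json(spark.sparkContext.wholeTextFiles("...").values)

Keep in mind that wholeTextFiles loads each file as a single in-memory record, so this approach is best kept to reasonably small files.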