Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column


You may try either of these two ways.

Option-1: Keep the JSON on a single line, as answered above by @Avishek Bhattacharya.

Option-2: Add the option to read multi-line JSON in the code, as follows. You can also read a nested attribute, as shown below.

val df = spark.read.option("multiline","true").json("C:\\data\\nested-data.json")
df.select("a.b").show()

Here is the output for Option-2.

20/07/29 23:14:35 INFO DAGScheduler: Job 1 finished: show at NestedJsonReader.scala:23, took 0.181579 s
+---+
|  b|
+---+
|  1|
+---+


The problem is with the JSON file. The file "D:/playground/input.json" looks like you described:

{
  "a": {
    "b": 1
  }
}

This is not valid input for Spark's default JSON reader. While processing JSON data, Spark considers each new line to be a complete JSON document, so parsing fails.
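This line-by-line behavior can be illustrated with Python's standard json module (a sketch of the parsing model, not Spark's actual code): every individual line of a pretty-printed document is incomplete JSON on its own, while the compacted form parses fine.

```python
import json

# Pretty-printed JSON spread across several lines, like the failing file.
pretty = '{\n  "a": {\n    "b": 1\n  }\n}'

# A line-delimited reader parses each line independently; every line of a
# pretty-printed document is invalid JSON by itself, so all of them fail.
failures = 0
for line in pretty.splitlines():
    try:
        json.loads(line)
    except json.JSONDecodeError:
        failures += 1
print(failures)  # all 5 lines fail to parse on their own

# Compacting the document onto one line makes it parse as a single record.
compact = json.dumps(json.loads(pretty), separators=(",", ":"))
print(compact)  # {"a":{"b":1}}
```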

You should keep the complete JSON on a single line, in compact form, by removing all whitespace and newlines.

Like

{"a":{"b":1}}

If you want multiple JSON documents in a single file, keep one per line, like this:

{"a":{"b":1}}
{"a":{"b":2}}
{"a":{"b":3}}
...
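This one-document-per-line layout (often called JSON Lines) is what Spark's default, non-multiline JSON reader expects. A minimal Python sketch of how such a file is consumed, using a string in place of a real file:

```python
import json

# Hypothetical file content: one complete JSON document per line.
jsonl = '{"a":{"b":1}}\n{"a":{"b":2}}\n{"a":{"b":3}}'

# Parsing each line independently now succeeds, one record per line.
records = [json.loads(line) for line in jsonl.splitlines()]
values = [r["a"]["b"] for r in records]
print(values)  # [1, 2, 3]
```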

For more info, see:


This error means one of two things:

1- Either your file format isn't what you think it is, and you are using the wrong method for it (e.g. it's plain text but you mistakenly used the json method),

2- or your file doesn't follow the standard for the format you are using (even though you used the correct method for that format); this usually happens with JSON.
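Both failure modes can be illustrated with a small Python helper (a hypothetical diagnostic sketch, not part of Spark) that checks whether a single line is a complete, standard JSON document:

```python
import json

def looks_like_json_line(line: str) -> bool:
    """Hypothetical helper: True if the line is one complete JSON document."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False

# Case 1: wrong format entirely (plain text fed to a JSON reader).
print(looks_like_json_line("just some text"))  # False
# Case 2: close to JSON, but not standard (single quotes are invalid JSON).
print(looks_like_json_line("{'a': 1}"))        # False
# A well-formed line parses fine.
print(looks_like_json_line('{"a": 1}'))        # True
```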