How to convert a JSON file to parquet using Apache Spark?


Spark 1.4 and later

You can use Spark SQL to first read the JSON file into a DataFrame and then write the DataFrame out as a Parquet file.

val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")

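or, using the generic writer (the older df.save(path, source) form still works in Spark 1.4 but is deprecated there and was removed in Spark 2.0):

df.write.format("parquet").save("path/to/parquet/file")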

Check here and here for examples and more details.
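A minimal self-contained sketch of the same conversion as a standalone application, assuming Spark 2.0 or later, where SparkSession replaces SQLContext as the entry point. The object name JsonToParquet, the local master, the overwrite mode and the default paths are illustrative choices, not part of the original answer. Note that spark.read.json expects JSON Lines (one complete JSON object per line) by default; for a single multi-line JSON document, Spark 2.2+ accepts .option("multiLine", true).

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical application name; pass input and output paths as arguments.
object JsonToParquet {
  def main(args: Array[String]): Unit = {
    val inputPath  = if (args.length > 0) args(0) else "path/to/json/file"
    val outputPath = if (args.length > 1) args(1) else "path/to/parquet/file"

    // Local session for illustration; drop .master(...) when using spark-submit on a cluster.
    val spark = SparkSession.builder()
      .appName("JsonToParquet")
      .master("local[*]")
      .getOrCreate()

    try {
      // Reads JSON Lines by default and infers the schema from the data.
      val df = spark.read.json(inputPath)
      df.printSchema()

      // Assumption: overwrite an existing output directory instead of failing.
      df.write.mode(SaveMode.Overwrite).parquet(outputPath)

      // Read the Parquet output back to check that no rows were lost.
      val roundTrip = spark.read.parquet(outputPath)
      assert(roundTrip.count() == df.count(), "row count mismatch after conversion")
    } finally {
      spark.stop()
    }
  }
}
```

Packaged into a jar, it could be run with something like spark-submit --class JsonToParquet app.jar in.json out.parquet.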

Spark 1.3.1

val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
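In the Spark 1.3 shell, sqlContext is already defined. In a standalone Spark 1.3.x application you have to create it yourself. The following is a sketch under that assumption (Spark 1.3.x on the classpath; the object name and local master are illustrative). It also reads the output back with sqlContext.parquetFile as a sanity check. All three of jsonFile, saveAsParquetFile and parquetFile were deprecated in 1.4 and removed in 2.0.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical application name; targets the Spark 1.3.x API only.
object JsonToParquet13 {
  def main(args: Array[String]): Unit = {
    val inputPath  = if (args.length > 0) args(0) else "path/to/json/file"
    val outputPath = if (args.length > 1) args(1) else "path/to/parquet/file"

    // Local master for illustration; omit setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("JsonToParquet13").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    try {
      // Spark 1.3 API: one JSON object per line, schema inferred from the data.
      val df = sqlContext.jsonFile(inputPath)
      df.saveAsParquetFile(outputPath)

      // Read the Parquet output back and compare row counts.
      val roundTrip = sqlContext.parquetFile(outputPath)
      assert(roundTrip.count() == df.count(), "row count mismatch after conversion")
    } finally {
      sc.stop()
    }
  }
}
```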

Issue related to Windows and Spark 1.3.1

On Windows, saving a DataFrame as a Parquet file with Spark 1.3.1 throws a java.lang.NullPointerException, as described here.

In that case, consider upgrading to a more recent Spark version.