Apache Parquet Could not read footer: java.io.IOException: (hadoop)

I got the same problem trying to read a parquet file from S3. In my case the issue was that the required libraries were not available to all workers in the cluster.

There are two ways to fix that:

  • Make sure you add the dependencies on the spark-submit command so they are distributed to the whole cluster
  • Add the dependencies to the jars directory under your SPARK_HOME on each worker in the cluster.
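Both options can be sketched as below. This is a deployment sketch, not a definitive fix: the package coordinates, class name, and jar names are placeholders; use the dependencies your job actually needs.

```shell
# Option 1: ship the dependencies with the job so every executor gets them.
# Coordinates, class, and application jar below are placeholders.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --class com.example.MyJob \
  my-job.jar

# Option 2: copy the dependency jars into $SPARK_HOME/jars on each worker:
# cp hadoop-aws-2.7.3.jar "$SPARK_HOME/jars/"
```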


If you open a parquet file in a text editor, at the very bottom you will see something like "parquet-mr"; that can help you figure out which version/format the file was created with.
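You can do the same check from the command line instead of a text editor. A parquet file ends with its footer metadata, a 4-byte footer length, and the magic bytes "PAR1", and the footer embeds a creator string such as "parquet-mr version 1.8.1". The demo below fakes just the tail of a file to show the grep; on a real file, run the same tail/grep against your .parquet path.

```shell
# Fake the tail of a parquet file (real files embed a creator string like
# this inside the footer metadata, just before the closing "PAR1" magic).
printf 'PAR1...data...parquet-mr version 1.8.1 (build abc)...PAR1' > demo.parquet

# The actual check: scan the last bytes of the file for the creator string.
tail -c 300 demo.parquet | grep -ao 'parquet-mr version [0-9.]*'
# prints: parquet-mr version 1.8.1
```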

The method above is simple, but the "creator" can be something else, like Impala or another component that can create parquet files. In that case you can use parquet-tools: https://github.com/apache/parquet-mr/tree/master/parquet-tools

Since it looks like you are using Spark to read the parquet file, you might be able to work around it by setting spark.sql.parquet.filterPushdown to false. Maybe try that first (more info here - https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration - change latest to your version of Spark).
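A minimal config sketch for that: the property name is the real Spark SQL setting mentioned above, but the application jar is a placeholder.

```shell
# Disable parquet filter pushdown for one run (placeholder jar name).
spark-submit \
  --conf spark.sql.parquet.filterPushdown=false \
  my-job.jar
```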

If that does not work, then check whether the issue still occurs with the latest version of Spark. If it is fixed there, you can try to trace the history of which commits fixed it, and that might give you insight into a possible workaround.

Or, if you know the parquet version, you can switch to the corresponding branch of parquet-mr, build parquet-tools for that version, and use it to test your metadata files (_metadata, _common_metadata) or one of the parquet files; you should be able to reproduce the error and debug from there.
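The branch-and-build step might look like the sketch below. The branch name, module layout, and file path are assumptions; pick the branch matching the version that actually wrote your files.

```shell
# Build parquet-tools from the parquet-mr branch matching your writer's
# version (1.8.x here is only an example), then inspect the metadata.
git clone https://github.com/apache/parquet-mr.git
cd parquet-mr
git checkout parquet-1.8.x
mvn -pl parquet-tools -am clean package -DskipTests

# Print the footer metadata (schema, row groups, creator) of a file:
java -jar parquet-tools/target/parquet-tools-*.jar meta /path/to/_metadata
```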


Check your folder permissions. We have seen this error in other environments, and it was caused by Spark not having permission to access the file.
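A quick way to rule this out: the user running Spark needs read permission on the data files and execute permission on every directory leading to them. The demo below uses a throwaway file, not a real dataset.

```shell
# Simulate a permission failure on a demo file, then fix it.
mkdir -p demo_data
printf 'x' > demo_data/part-00000.parquet
chmod 000 demo_data/part-00000.parquet
# At this point a Spark read of demo_data/ would typically fail with an
# IOException for a non-root user.
chmod 644 demo_data/part-00000.parquet
ls -l demo_data/part-00000.parquet
```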