Apache Parquet Could not read footer: java.io.IOException: (hadoop)

I got the same problem trying to read a parquet file from S3. In my case the issue was that the required libraries were not available to all workers in the cluster.

There are two ways to fix that:

  • Make sure you add the dependencies on the spark-submit command so they are distributed to the whole cluster
  • Add the dependencies to the jars directory under your SPARK_HOME on each worker in the cluster.
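Both options can be sketched as below. This is a deployment sketch, not a definitive fix: the package coordinates, class name, and jar names are placeholders; use the dependencies your job actually needs.

```shell
# Option 1: ship the dependencies with the job so every executor gets them.
# Coordinates, class, and application jar below are placeholders.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --class com.example.MyJob \
  my-job.jar

# Option 2: copy the dependency jars into $SPARK_HOME/jars on each worker:
# cp hadoop-aws-2.7.3.jar "$SPARK_HOME/jars/"
```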


If you open a parquet file in a text editor, at the very bottom you will see something like "parquet-mr"; that can help you figure out which version/format the file was created with.
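You can do the same check from the command line instead of a text editor. A parquet file ends with its footer metadata, a 4-byte footer length, and the magic bytes "PAR1", and the footer embeds a creator string such as "parquet-mr version 1.8.1". The demo below fakes just the tail of a file to show the grep; on a real file, run the same tail/grep against your .parquet path.

```shell
# Fake the tail of a parquet file (real files embed a creator string like
# this inside the footer metadata, just before the closing "PAR1" magic).
printf 'PAR1...data...parquet-mr version 1.8.1 (build abc)...PAR1' > demo.parquet

# The actual check: scan the last bytes of the file for the creator string.
tail -c 300 demo.parquet | grep -ao 'parquet-mr version [0-9.]*'
# prints: parquet-mr version 1.8.1
```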

The method above is simple, but the "creator" can be something else, like Impala or another component that can create parquet files. In that case you can use parquet-tools: https://github.com/apache/parquet-mr/tree/master/parquet-tools

Since it looks like you are using Spark to read the parquet file, you might be able to work around it by setting spark.sql.parquet.filterPushdown to false. Maybe try that first (more info here - https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration - change latest to your version of Spark).
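A minimal config sketch for that: the property name is the real Spark SQL setting mentioned above, but the application jar is a placeholder.

```shell
# Disable parquet filter pushdown for one run (placeholder jar name).
spark-submit \
  --conf spark.sql.parquet.filterPushdown=false \
  my-job.jar
```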

If that does not work, then check whether the issue still occurs with the latest version of Spark. If it is fixed there, you can try to trace the history of which commits fixed it, and that might give you insight into a possible workaround.

Or, if you know the parquet version, you can switch to the corresponding branch of parquet-mr, build parquet-tools for that version, and use it to test your metadata files (_metadata, _common_metadata) or one of the parquet files; you should be able to reproduce the error and debug from there.
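The branch-and-build step might look like the sketch below. The branch name, module layout, and file path are assumptions; pick the branch matching the version that actually wrote your files.

```shell
# Build parquet-tools from the parquet-mr branch matching your writer's
# version (1.8.x here is only an example), then inspect the metadata.
git clone https://github.com/apache/parquet-mr.git
cd parquet-mr
git checkout parquet-1.8.x
mvn -pl parquet-tools -am clean package -DskipTests

# Print the footer metadata (schema, row groups, creator) of a file:
java -jar parquet-tools/target/parquet-tools-*.jar meta /path/to/_metadata
```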


Check your folder permissions. We have seen this error in other environments, and it was caused by Spark not having permission to access the file.
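A quick way to rule this out: the user running Spark needs read permission on the data files and execute permission on every directory leading to them. The demo below uses a throwaway file, not a real dataset.

```shell
# Simulate a permission failure on a demo file, then fix it.
mkdir -p demo_data
printf 'x' > demo_data/part-00000.parquet
chmod 000 demo_data/part-00000.parquet
# At this point a Spark read of demo_data/ would typically fail with an
# IOException for a non-root user.
chmod 644 demo_data/part-00000.parquet
ls -l demo_data/part-00000.parquet
```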