Spark Job running on Yarn Cluster java.io.FileNotFoundException: File does not exist, even though the file exists on the master node


The Spark configuration was not pointing to the right Hadoop configuration directory. The Hadoop 2.7.2 configuration lives at /root/hadoop2.7.2/etc/hadoop/ rather than /root/hadoop2.7.2/conf. When I set HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the FileNotFoundException disappeared. Earlier it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the proper Hadoop configuration directory, it can result in a similar error. I think this is arguably a bug in Spark; it should fail gracefully rather than throw an ambiguous error message.
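For reference, the fix amounts to a couple of lines in $SPARK_HOME/conf/spark-env.sh; the path below simply mirrors the install location above and should be adjusted to wherever your Hadoop installation actually lives:

    # $SPARK_HOME/conf/spark-env.sh
    # Point Spark at the directory that really contains core-site.xml, hdfs-site.xml, yarn-site.xml
    export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop
    export YARN_CONF_DIR=/root/hadoop2.7.2/etc/hadoop

A quick sanity check before re-running spark-submit is ls $HADOOP_CONF_DIR/core-site.xml to confirm the directory really exists.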


I got a similar error with Spark running on EMR. I had written my Spark code in Java 8, but Spark on the EMR cluster was not running under Java 8 by default. I had to recreate the cluster with JAVA_HOME pointing to the Java 8 version, and that resolved my problem. Please check along similar lines.
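As an illustration only (the classification names and the Java path below are assumptions based on the usual EMR setup, not details given in this answer), JAVA_HOME can be pointed at Java 8 through a configuration passed when the cluster is created:

    # Assumption: spark-env/export classification with the typical EMR Java 8 path
    cat > java8.json <<'EOF'
    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": { "JAVA_HOME": "/usr/lib/jvm/java-1.8.0" }
          }
        ]
      }
    ]
    EOF
    # aws emr create-cluster ... --configurations file://java8.json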


I had a similar issue, but the problem was related to having two core-site.xml files: one in $HADOOP_CONF_DIR and another in $SPARK_HOME/conf. The problem disappeared when I removed the one under $SPARK_HOME/conf.
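A quick way to check for this situation, assuming HADOOP_CONF_DIR and SPARK_HOME are set in your shell:

    # Compare the two copies; if both exist and differ, Spark may pick up the wrong one
    diff "$HADOOP_CONF_DIR/core-site.xml" "$SPARK_HOME/conf/core-site.xml"
    # Move the copy under $SPARK_HOME/conf out of the way rather than deleting it outright
    mv "$SPARK_HOME/conf/core-site.xml" "$SPARK_HOME/conf/core-site.xml.bak"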