Why does launching spark-shell with yarn-client fail with "java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream"?

Answering my own question: first of all, this was my own mistake. When calling spark-shell, I was launching it from the old (wrong) location /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/bin/spark-shell. I was sure that I had deleted everything from my CDH testing with yum remove cloudera*.

[root@master bin]# type spark-shell
spark-shell is hashed (/usr/bin/spark-shell)
[root@master bin]# hash -d spark-shell
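
If you hit the same stale-hash symptom, here is a minimal sketch of the check (type, hash -d, and hash -r are standard bash built-ins; the cached path shown is from my setup):

type spark-shell        # prints the hashed (cached) path if bash has one
hash -d spark-shell     # drop just this entry from bash's lookup cache
hash -r                 # or clear the entire cache instead
type spark-shell        # now resolves freshly against $PATH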

Now, launching it from the old spark-1.5.0-bin-without-hadoop.tgz still gave me the same error. I downloaded spark-1.5.0-bin-hadoop2.6, added export SPARK_DIST_CLASSPATH=$HADOOP_HOME, and spark-shell is working now.
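
For reference, a minimal spark-env.sh sketch for wiring a hadoop-free build to an existing Hadoop install; note that the Spark documentation for hadoop-free builds uses hadoop classpath (assuming the hadoop binary is on the PATH), which expands to the full set of Hadoop jars rather than just the install directory:

# conf/spark-env.sh
export SPARK_DIST_CLASSPATH=$(hadoop classpath)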


I was getting this error because, when I typed spark-shell, /usr/bin/spark-shell was being executed.

To call my specific spark-shell, I ran the following command from inside my own-built Spark source tree:

./bin/spark-shell
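
For example, assuming the custom build lives at a hypothetical path like ~/spark (and noting that Spark 1.5 accepts yarn-client as a master):

cd ~/spark                               # hypothetical location of the own-built Spark
./bin/spark-shell --master yarn-client   # launch against YARN, not /usr/bin/spark-shell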


Instead of spark-1.5.0-bin-without-hadoop.tgz, download one of the builds for Hadoop 2.x. They are simpler to set up because they come bundled with the Hadoop client libraries.
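
A minimal sketch of that route (the URL follows the standard Apache archive layout; adjust the version to match your Hadoop cluster):

wget https://archive.apache.org/dist/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
tar -xzf spark-1.5.0-bin-hadoop2.6.tgz
cd spark-1.5.0-bin-hadoop2.6
./bin/spark-shell --master yarn-client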