Why does launching spark-shell with yarn-client fail with "java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream"?
Answering my own question: first of all, this was my own mistake. When calling spark-shell, I was launching it from the old (wrong) location /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/bin/spark-shell. I was sure I had removed everything left over from my CDH testing with yum remove cloudera*.
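Before trusting yum remove, it can be worth confirming nothing was left behind. A small diagnostic sketch (the parcel path is the CDH default; the rpm check assumes an RPM-based system):

```shell
# Look for leftover Cloudera packages (RPM-based systems only):
if command -v rpm >/dev/null 2>&1; then
  rpm -qa | grep -i cloudera || echo "no cloudera RPMs remain"
else
  echo "rpm not available on this system"
fi
# The parcels directory itself can survive a package removal:
if [ -d /opt/cloudera/parcels ]; then
  ls /opt/cloudera/parcels
else
  echo "no CDH parcels directory"
fi
```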
[root@master bin]# type spark-shell
spark-shell is hashed (/usr/bin/spark-shell)
[root@master bin]# hash -d spark-shell
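The hashing behaviour above can be reproduced with a throwaway script (the directory and command name here are hypothetical, created only for the demo; hash -t and hash -d are bash builtins):

```shell
dir=$(mktemp -d)                                 # throwaway directory
printf '#!/bin/sh\necho hello\n' > "$dir/mycmd"  # stand-in command, not real Spark
chmod +x "$dir/mycmd"
PATH="$dir:$PATH"
mycmd             # first use: the shell searches PATH and caches the location
hash -t mycmd     # show the cached path (bash builtin)
hash -d mycmd     # forget it, forcing a fresh PATH search next time
```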
Now, launching it from the old spark-1.5.0-bin-without-hadoop.tgz build still gave me the same error. I downloaded spark-1.5.0-bin-hadoop2.6, added export SPARK_DIST_CLASSPATH=$HADOOP_HOME, and spark-shell is working now.
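For reference, Spark's documentation for the "Hadoop free" builds suggests pointing SPARK_DIST_CLASSPATH at the output of hadoop classpath rather than at the installation root itself. A sketch (the HADOOP_HOME path is an example; adjust to your installation):

```shell
# Example path; adjust to your installation.
export HADOOP_HOME=/usr/lib/hadoop
# For "without-hadoop" builds, Spark's docs suggest the full classpath
# emitted by the hadoop CLI rather than $HADOOP_HOME itself:
export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)
```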
I was getting this error because typing spark-shell executed /usr/bin/spark-shell rather than my own build.
To call my specific spark-shell, I ran the following command from inside my own-built Spark source directory:
./bin/spark-shell
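Running it as ./bin/spark-shell works because a command name containing a slash is executed directly and never goes through the shell's PATH search or hash table. A minimal sketch with a throwaway stand-in script (paths are hypothetical, created only for the demo):

```shell
dir=$(mktemp -d)                                     # throwaway directory
printf '#!/bin/sh\necho mine\n' > "$dir/spark-shell" # stand-in, not real Spark
chmod +x "$dir/spark-shell"
"$dir/spark-shell"   # name contains a slash: this exact file runs, PATH ignored
```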
Instead of spark-1.5.0-bin-without-hadoop.tgz, download one of the builds prebuilt for Hadoop 2.x. They are simpler to set up because they bundle the Hadoop client libraries.
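For completeness, fetching and running a Hadoop-bundled 1.5.0 build might look like this (the URL points at the Apache archive; verify the mirror and version for your setup, and note this assumes a working YARN configuration):

```shell
# Download and unpack a prebuilt Spark distribution (example version/URL):
wget https://archive.apache.org/dist/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
tar -xzf spark-1.5.0-bin-hadoop2.6.tgz
cd spark-1.5.0-bin-hadoop2.6
# Launch against YARN (requires HADOOP_CONF_DIR to point at your cluster config):
./bin/spark-shell --master yarn-client
```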