HadoopRDD error while trying to count lines in a file hosted on local HDFS using spark shell


I am new to Apache Spark, Scala and Hadoop

Then you should be using the latest stable version of each. For starters, download the latest Spark build that includes Hadoop.

hadoop-mapred is a deprecated package, and you should not mix two different versions of the Hadoop libraries on one classpath. That mismatch explains the ClassNotFoundException you are getting.

If you downloaded Spark from the second link, it includes a version of Hadoop greater than 2.4, and those libraries are already on the Spark classpath, so you should not add them to your POM anyway. For a reference, find the Java quickstart POM.
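To confirm which Hadoop version your Spark download actually bundles, you can ask the classpath directly from spark-shell. This is only a sketch: it assumes a running spark-shell, where sc is predefined, and that the Hadoop common classes are on the classpath, as they are in the "with Hadoop" builds.

```scala
// Run inside spark-shell; `sc` (the SparkContext) is created by the shell itself.
import org.apache.hadoop.util.VersionInfo

// Version of Spark that started this shell
println(s"Spark version:  ${sc.version}")

// Version of the Hadoop client libraries found on Spark's classpath
println(s"Hadoop version: ${VersionInfo.getVersion}")
```

If the Hadoop version printed here differs from the one declared in your POM, that is the kind of mismatch described above.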

I'll also point out that you should actually get HDFS working before you try to run Spark against it (assuming you need to use Hadoop instead of standalone Spark).

But you do not need Hadoop at all to run sc.textFile("README.md").count() from the Spark shell.
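As a rough illustration, the session below counts lines in a local file first and only then in HDFS. The paths, host, and port are placeholders that I made up, not values from your setup, and sc is the SparkContext that spark-shell creates for you.

```scala
// Run inside spark-shell, started from the Spark installation directory.

// 1. Local file, no Hadoop cluster needed.
//    The explicit file:// scheme forces the local filesystem even if a Hadoop
//    configuration on the machine sets fs.defaultFS to HDFS.
//    (Placeholder path: adjust to where your Spark README.md lives.)
val localLines = sc.textFile("file:///opt/spark/README.md")
println(s"Local line count: ${localLines.count()}")

// 2. File on HDFS, only after HDFS itself is confirmed to be working.
//    Host, port, and path are hypothetical; use the fs.defaultFS value
//    from your own core-site.xml.
val hdfsLines = sc.textFile("hdfs://localhost:9000/user/me/input.txt")
println(s"HDFS line count: ${hdfsLines.count()}")
```

If the first count succeeds and the second fails, the problem is most likely in the HDFS setup or the Hadoop library versions rather than in Spark itself.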