Change hadoop version using spark-ec2


Hadoop 2.0

The spark-ec2 script doesn't support modifying an existing cluster, but you can create a new Spark cluster with Hadoop 2.

See this excerpt from the script's --help:

  --hadoop-major-version=HADOOP_MAJOR_VERSION                    Major version of Hadoop (default: 1)

So for example:

spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 --hadoop-major-version=2 launch my-spark-cluster

..will create a cluster with the current version of Spark and Hadoop 2.
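
To double-check which Hadoop version the new cluster actually got, you can log in to the master with the script's login action and ask Hadoop itself. A minimal sketch; the /root/ephemeral-hdfs path is where the spark-ec2 AMIs typically install Hadoop, so treat it as an assumption:

spark-ec2 -k spark -i ~/.ssh/spark.pem login my-spark-cluster
# then, on the master (install path is an assumption based on the spark-ec2 AMIs):
/root/ephemeral-hdfs/bin/hadoop version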


If you use Spark 1.3.1 or 1.4.0 and create a standalone cluster this way, you will get Hadoop 2.0.0 MR1 (from the Cloudera Hadoop Platform 4.2.0 distribution).


There are caveats, but I have successfully used a few clusters of Spark 1.2.0 and 1.3.1 created with Hadoop 2.0.0 this way, including some Hadoop2-specific features. (For Spark 1.2.0 this required a few tweaks, which I have put in my forks of Spark and spark-ec2, but that's another story.)


Hadoop 2.4, 2.6

If you need Hadoop 2.4 or Hadoop 2.6, then I would currently (as of June 2015) recommend creating a standalone cluster manually - it's easier than you probably think.
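
As a rough sketch of the manual route, you can use a Spark release prebuilt against Hadoop 2.6 and start the standalone master and workers yourself. The download URL matches the Apache archive layout for Spark 1.4.0, and the host name is a placeholder; adapt both to your environment:

# On every node: fetch a Spark release prebuilt for Hadoop 2.6
wget https://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar xzf spark-1.4.0-bin-hadoop2.6.tgz && cd spark-1.4.0-bin-hadoop2.6

# On the master node:
./sbin/start-master.sh

# On each worker node, pointing at the master (host name is a placeholder):
./sbin/start-slave.sh spark://master-host:7077

The master's web UI (port 8080 by default) should then list the workers, and you submit applications against spark://master-host:7077 as usual.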