If I already have Hadoop installed, should I download Apache Spark WITH Hadoop or WITHOUT Hadoop?



First off, as far as I know, Spark does not yet support Hadoop 3. You'll notice this because the download page offers no package pre-built for your Hadoop version.

You can still try setting HADOOP_CONF_DIR and HADOOP_HOME in your spark-env.sh, regardless of which package you download.

You should always download the version without Hadoop if you already have it.
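For the package without Hadoop, the spark-env.sh settings might look like the minimal sketch below. It assumes your existing Hadoop is installed under /opt/hadoop (a hypothetical path; adjust it). Besides HADOOP_HOME and HADOOP_CONF_DIR, it sets SPARK_DIST_CLASSPATH from the output of hadoop classpath, which is how Spark's documentation describes wiring its "Hadoop free" build to an existing Hadoop installation's jars.

```shell
# conf/spark-env.sh (sourced by Spark's launch scripts)
# Assumption: the existing Hadoop lives at /opt/hadoop (hypothetical path; adjust).
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"

# The "Hadoop free" Spark build ships no Hadoop jars, so put the jars of the
# installed Hadoop on Spark's classpath (only if the hadoop binary is present).
if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath)"
fi
```

With the package that bundles Hadoop, the SPARK_DIST_CLASSPATH line is not needed, since those jars already ship with Spark.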

Won't it start another instance of Hadoop?

No. You would still need to explicitly configure and start that version of Hadoop.

That Spark package is already configured to use its bundled Hadoop, I believe.


This is in addition to the answer by @cricket_007.

If you have Hadoop installed, you would normally not download Spark with Hadoop. However, since your Hadoop version is not yet supported by any version of Spark, you will need to download the package that includes Hadoop. You will then need to configure the bundled Hadoop version on your machine for Spark to run on, which means that all your data on Hadoop 3 will be LOST. So if you need this data, back it up before beginning your downgrade/reconfiguration. I do not think you will be able to host two instances of Hadoop on the same system, because of certain environment variables.
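One way to take that backup is to copy the HDFS data you care about to local disk with hdfs dfs -get before touching the installation. This is only a sketch: SRC and DEST are hypothetical placeholders, and for large datasets copying to other storage or another cluster may be more practical than local disk.

```shell
#!/bin/sh
# Sketch: copy an HDFS directory to local disk before reinstalling Hadoop.
# SRC and DEST are hypothetical placeholders; adjust them to your data.
SRC="/user/${USER:-$(whoami)}"
DEST="$HOME/hdfs-backup-$(date +%Y%m%d)"

mkdir -p "$DEST"
if command -v hdfs >/dev/null 2>&1; then
  # "hdfs dfs -get" copies files from HDFS to the local filesystem
  hdfs dfs -get "$SRC" "$DEST"
else
  echo "hdfs not found on PATH; run this where the Hadoop client is installed" >&2
fi
```

Check that the copied files are complete before removing or reformatting the old Hadoop installation.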