Java Spark disable Hadoop discovery


So the final "trick" I've used is a mix of sandev's and Vipul's answers.

Create a 'fake' winutils in your project root:

    mkdir <java_project_root>/bin
    touch <java_project_root>/bin/winutils.exe

Then, in your Spark configuration, provide the 'fake' HADOOP_HOME:

    import java.io.File;
    import org.apache.spark.SparkConf;

    public SparkConf sparkConfiguration() {
        SparkConf cfg = new SparkConf();
        // Point hadoop.home.dir at the project root, which holds the stub bin/winutils.exe
        File hadoopStubHomeDir = new File(".");
        System.setProperty("hadoop.home.dir", hadoopStubHomeDir.getAbsolutePath());
        cfg.setAppName("ScalaPython")
                .setMaster("local")
                .set("spark.executor.instances", "2");
        return cfg;
    }

Still, this is only a 'trick' to get around Hadoop discovery; it doesn't turn it off.
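If you'd rather not run the mkdir/touch commands by hand, the same stub can probably be created from Java at startup. This is only a rough sketch of the trick above: the class name HadoopStub and the method install are made up for illustration, and it assumes an empty winutils.exe is enough for your use case, as it was for me.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the class and method names are illustrative only.
final class HadoopStub {

    // Creates <root>/bin/winutils.exe as an empty file (if it is missing)
    // and points the hadoop.home.dir system property at <root>.
    static Path install(Path root) throws IOException {
        Path bin = Files.createDirectories(root.resolve("bin"));
        Path winutils = bin.resolve("winutils.exe");
        if (Files.notExists(winutils)) {
            Files.createFile(winutils);
        }
        System.setProperty("hadoop.home.dir", root.toAbsolutePath().toString());
        return winutils;
    }
}
```

Call something like HadoopStub.install(Paths.get(".")) before you build the SparkConf, so the property is already set when Hadoop classes get loaded.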


Spark just needs winutils. Create a folder containing it, for example C:\hadoop\bin\winutils.exe, define the environment variable HADOOP_HOME = C:\hadoop, and append C:\hadoop\bin to the PATH variable. Then you can use Spark functionality.
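To check that the environment variable is really visible to the JVM (an IDE started before you changed it may not see it), something like the following could help. It's a minimal sketch: WinutilsCheck and hasWinutils are names I made up, and it only looks for bin\winutils.exe under the given directory, nothing more.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic; the class and method names are illustrative only.
final class WinutilsCheck {

    // Returns true when <home>/bin/winutils.exe exists as a regular file.
    static boolean hasWinutils(String home) {
        if (home == null || home.isEmpty()) {
            return false;
        }
        return Files.isRegularFile(Paths.get(home, "bin", "winutils.exe"));
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        System.out.println("winutils.exe found: " + hasWinutils(home));
    }
}
```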


It's not that Spark wants Hadoop to be installed; it just wants that particular file.

First, you have to run the code with spark-submit. Are you doing that? Please stick to that as a first approach, since it yields the fewest library-related issues. After you've done that, you can add this to your pom file to be able to run it directly from the IDE. I use IntelliJ, but it should work in Eclipse as well:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.5</version>
    </dependency>

Second, if it still doesn't work:

  1. Download the winutils file from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe.

  2. Create a new directory named bin inside some_other_directory, and put the downloaded winutils.exe in it.

  3. In your code, add this line before creating the Context:

    System.setProperty("hadoop.home.dir", "full path to some_other_directory");
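If you sometimes pass the location on the command line instead (for example with -Dhadoop.home.dir=...), a small guard around the line from step 3 keeps the hard-coded path from overriding it. Again, just a sketch under that assumption; HadoopHome and ensure are hypothetical names, not part of Spark or Hadoop.

```java
import java.io.File;

// Hypothetical helper; the class and method names are illustrative only.
final class HadoopHome {

    // Sets hadoop.home.dir to the absolute form of fallbackDir, but only
    // when the property is not already set. Returns the value in effect.
    static String ensure(String fallbackDir) {
        String current = System.getProperty("hadoop.home.dir");
        if (current == null || current.isEmpty()) {
            current = new File(fallbackDir).getAbsolutePath();
            System.setProperty("hadoop.home.dir", current);
        }
        return current;
    }
}
```

Call HadoopHome.ensure("full path to some_other_directory") as the first statement in main, before the Context is created.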

Pro tip: switch to using Scala. Not that it's necessary, but that's where Spark feels most at home, and it wouldn't take you more than a day or two to get the basic programs running just right.