Overriding default hadoop jars in class path


Assuming you're using Hadoop 0.20.203, this is handled in TaskRunner.java as follows:

  • The property you're looking for, mapreduce.user.classpath.first, is defined on line 94
  • Line 214 is where the call is made to build the list of classpaths, which delegates to a method called getClassPaths(..)
  • getClassPaths() is defined on line 524, and you should be able to see that the configuration property decides whether your job and distributed cache libraries or the Hadoop libraries go on the classpath first
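
The ordering decision described in the bullets above can be modelled roughly as follows. This is a simplified sketch, not the actual TaskRunner code: the class name, the buildClasspath helper and the jar names are all made up for illustration, and it assumes the flag defaults to false (Hadoop libraries first).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the ordering decision; NOT the real TaskRunner source.
class ClasspathOrderSketch {

    // Hypothetical helper: returns classpath entries in search order,
    // depending on the value of the "user classpath first" flag.
    static List<String> buildClasspath(boolean userClasspathFirst,
                                       List<String> userEntries,
                                       List<String> hadoopEntries) {
        List<String> classpath = new ArrayList<>();
        if (userClasspathFirst) {
            // Job jar and distributed cache libraries win on conflicts.
            classpath.addAll(userEntries);
            classpath.addAll(hadoopEntries);
        } else {
            // Assumed default: the Hadoop libraries win on conflicts.
            classpath.addAll(hadoopEntries);
            classpath.addAll(userEntries);
        }
        return classpath;
    }

    public static void main(String[] args) {
        // Illustrative jar names only.
        List<String> user = Arrays.asList("job.jar", "lib/my-dep.jar");
        List<String> hadoop = Arrays.asList("hadoop-core.jar", "lib/dep.jar");

        // prints [hadoop-core.jar, lib/dep.jar, job.jar, lib/my-dep.jar]
        System.out.println(buildClasspath(false, user, hadoop));
        // prints [job.jar, lib/my-dep.jar, hadoop-core.jar, lib/dep.jar]
        System.out.println(buildClasspath(true, user, hadoop));
    }
}
```

Since the JVM takes the first match it finds on the classpath, whichever group comes first supplies the class when both groups contain it.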

For other versions of Hadoop, it's best to check the TaskRunner.java class to confirm the name of the config property. After all, this is a "semi-hidden config":

    static final String MAPREDUCE_USER_CLASSPATH_FIRST = "mapreduce.user.classpath.first"; // a semi-hidden config


As of more recent Hadoop versions (2.2+), you should set:

    conf.setBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);


These settings only work for classes from external jars that are referenced in your mapper or reducer tasks. If you use them elsewhere, for example in a customized InputFormat, the class will fail to load. One way to make this work everywhere (in MR2) is to export the following environment variable before submitting your job:

    export HADOOP_USER_CLASSPATH_FIRST=true
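
To check which jar actually won, you can print where a class was loaded from. The following is a small diagnostic sketch using only the standard JDK; the WhichJar class and locationOf helper are names invented here, and in a real job you would pass the conflicting library class instead of the ones shown.

```java
import java.security.CodeSource;

// Diagnostic sketch: report the jar or directory a class was loaded from.
class WhichJar {

    // Hypothetical helper. Returns the code source location as a string,
    // or a marker when the JVM reports none (typical for core JDK classes).
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Location of this class itself (a directory or jar URL).
        System.out.println(locationOf(WhichJar.class));
        // Core JDK classes usually have no code source.
        System.out.println(locationOf(String.class));
    }
}
```

Calling locationOf from both the mapper and the InputFormat would show whether the two code paths resolve the class from the same jar.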