NoSuchMethodError Sets.newConcurrentHashSet() while running jar using hadoop


You are running into a version conflict. The problem is as follows:

  • Both Hadoop's own libraries and Cassandra use Google Guava.
  • Your Hadoop version ships an older Guava (11.xx), while your Cassandra is up to date and uses Guava 16.0. Enterprise-scale Hadoop setups rarely update their environment with every new release.
  • Cassandra's config loader calls Sets.newConcurrentHashSet(), which does not exist in that older Guava version.
  • By default, the jars used by Hadoop are loaded before any third-party jars. So even though the correct Guava version is present in your "with dependencies" jar, the older Guava jar is loaded from the Hadoop classpath and distributed to your mappers/reducers.
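
To confirm this diagnosis on your own setup, you can check which jar the Guava Sets class is actually loaded from and whether it has the missing method. The following is a minimal sketch using only standard Java reflection; the class name GuavaProbe and its helper methods are hypothetical names introduced for illustration, not part of Hadoop, Cassandra, or Guava.

```java
import java.security.CodeSource;

public class GuavaProbe {

    /** Returns the jar or directory a class was loaded from, or a marker for JDK classes. */
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        if (src == null || src.getLocation() == null) {
            return "(JDK / bootstrap class loader)";
        }
        return src.getLocation().toString();
    }

    /** Returns true if the named class is loadable and has a public no-arg method with this name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "com.google.common.collect.Sets";
        try {
            Class<?> sets = Class.forName(name);
            // Shows which Guava jar won on the current classpath.
            System.out.println("Sets loaded from: " + locationOf(sets));
            System.out.println("has newConcurrentHashSet(): "
                    + hasNoArgMethod(name, "newConcurrentHashSet"));
        } catch (ClassNotFoundException e) {
            System.out.println("Guava is not on the classpath");
        }
    }
}
```

If the reported location points into the Hadoop installation rather than your own jar, the older Guava is winning. Note that this only reflects the classpath of the JVM it runs in; to see what the mappers/reducers get, the same two calls would have to be logged from inside a task.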

Solution:

  • Set the configuration parameter "mapreduce.job.user.classpath.first" to true in the run method of your job:

    job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");
  • In your bin/hadoop, add the following statement, which tells Hadoop to load user-defined libraries first:

    export HADOOP_USER_CLASSPATH_FIRST=true
  • Make sure the newer version of the library (here, Guava) appears in your Hadoop classpath before the older one.