hadoop No FileSystem for scheme: file


This is a typical case of the maven-assembly plugin breaking things.

Why this happened to us

Different JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare. (This is the Service Provider Interface mechanism, implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems.)
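For illustration, the descriptor shipped in hadoop-hdfs looks roughly like this (abridged; the exact entries vary by Hadoop version):

```
# META-INF/services/org.apache.hadoop.fs.FileSystem (from hadoop-hdfs, abridged)
org.apache.hadoop.hdfs.DistributedFileSystem
```

ServiceLoader reads every copy of this file found on the classpath and registers each listed class as an available FileSystem implementation.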

When we use maven-assembly-plugin, it merges all our JARs into one, and the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-commons overwrote the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.
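A minimal sketch of the difference between the two merge behaviors (the descriptor contents below are an illustrative subset, not the complete real files):

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceMergeDemo {
    // Entries each JAR declares in META-INF/services/org.apache.hadoop.fs.FileSystem
    // (illustrative subset, not the complete real files)
    static final List<String> FROM_HADOOP_COMMONS =
            List.of("org.apache.hadoop.fs.LocalFileSystem");
    static final List<String> FROM_HADOOP_HDFS =
            List.of("org.apache.hadoop.hdfs.DistributedFileSystem");

    // Default assembly behavior: the last descriptor added wins, the rest are discarded
    static List<String> overwrite() {
        return FROM_HADOOP_COMMONS;
    }

    // A proper merge (what the shade plugin's ServicesResourceTransformer does):
    // concatenate the entries from every descriptor
    static List<String> concatenate() {
        List<String> merged = new ArrayList<>(FROM_HADOOP_HDFS);
        merged.addAll(FROM_HADOOP_COMMONS);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println("overwritten: " + overwrite());
        System.out.println("merged:      " + concatenate());
    }
}
```

With the overwrite behavior, DistributedFileSystem is simply missing from the surviving descriptor, which is exactly why hdfs: URIs fail to resolve.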

How we fixed it

After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:

    hadoopConfig.set("fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
    );
    hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName()
    );

Update: the correct fix

It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.


For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded jar by adding the ServicesResourceTransformer to the plugin config:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>

This will merge all the org.apache.hadoop.fs.FileSystem service declarations into one file.


For the record, this is still happening in hadoop 2.4.0. So frustrating...

I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs

I added the following to my core-site.xml and it worked:

<property>
   <name>fs.file.impl</name>
   <value>org.apache.hadoop.fs.LocalFileSystem</value>
   <description>The FileSystem for file: uris.</description>
</property>
<property>
   <name>fs.hdfs.impl</name>
   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
   <description>The FileSystem for hdfs: uris.</description>
</property>