Write to HDFS running in Docker from another Docker container running Spark


The URI hdfs:///user/root/input/test is missing the authority (hostname) and port. To write to HDFS running in another container you need to fully qualify the URI, make sure the two containers are on the same network, and make sure the HDFS container exposes the namenode and datanode ports.

For example, suppose you set the hostname of the HDFS container to hdfs.container. Then you can write to that HDFS instance using the URI hdfs://hdfs.container:8020/user/root/input/test (assuming the namenode is listening on 8020). You will also need to make sure that the path you are writing to has the correct permissions.
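As a rough sketch of what the Spark side then looks like (hdfs.container and port 8020 are just the example values from above, and the DataFrame is a placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("write-to-hdfs").getOrCreate()

    # Placeholder data; the point is the fully qualified URI below
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # scheme + authority (HDFS container hostname) + port + path
    df.write.mode("overwrite").parquet("hdfs://hdfs.container:8020/user/root/input/test")

Alternatively, you can set spark.hadoop.fs.defaultFS to hdfs://hdfs.container:8020 in your Spark configuration, in which case a URI with no authority, like hdfs:///user/root/input/test, will resolve against that namenode.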

So to do what you want:

  • Make sure your HDFS container has the namenode and datanode ports exposed. You can do this with an EXPOSE directive in the Dockerfile (the container you linked does not have these) or with the --expose argument when invoking docker run. The default ports are 8020 and 50010 (for the namenode and datanode respectively).
  • Start the containers on the same network and give the HDFS container a fixed name using the --name argument. Note that resolving another container by name only works on a user-defined network (created with docker network create), not on the default bridge network you get from a plain docker run, so create a network and pass --network to both containers.
  • Now modify your URI to include the proper authority (the value you passed to --name) and the port as described above, and it should work. A sketch of the whole setup follows after this list.
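Putting that together, the container setup might look something like this (hdfs-image and spark-image are placeholders for whatever images you are actually running, and hadoop-net is just an arbitrary network name):

    # user-defined network so the containers can resolve each other by name
    docker network create hadoop-net

    # HDFS container: fixed name, namenode and datanode ports exposed
    docker run -d --name hdfs.container --network hadoop-net \
      --expose 8020 --expose 50010 \
      hdfs-image

    # Spark container on the same network; inside it, write to
    # hdfs://hdfs.container:8020/user/root/input/test
    docker run -it --network hadoop-net spark-image

If something outside Docker (for example, a process on the host machine) also needs to reach HDFS, publish the ports with -p instead of just exposing them.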