Write to HDFS running in Docker from another Docker container running Spark
The URI `hdfs:///user/root/input/test` is missing the authority (hostname) section and port. To write to HDFS running in another container you need to fully specify the URI, make sure the two containers are on the same network, and make sure the HDFS container has the NameNode and DataNode ports exposed.

For example, you might have set the hostname for the HDFS container to `hdfs.container`. Then you can write to that HDFS instance using the URI `hdfs://hdfs.container:8020/user/root/input/test` (assuming the NameNode is listening on port 8020). You will also need to make sure that the path you are writing to has the correct permissions.
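Before running the Spark job, you can sanity-check the fully qualified URI from inside the Spark container with the Hadoop CLI. This is a sketch: it assumes a Hadoop client is on the `PATH` and uses the example hostname `hdfs.container` and port 8020 from above.

```shell
# List the target directory using the fully qualified URI; this verifies
# network reachability, the NameNode port, and read permissions in one step.
hdfs dfs -ls hdfs://hdfs.container:8020/user/root/input

# Write a small test file to confirm write permissions on the path.
echo "probe" > /tmp/probe.txt
hdfs dfs -put /tmp/probe.txt hdfs://hdfs.container:8020/user/root/input/
```

If the `-put` succeeds, the same URI should work from Spark.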
So to do what you want:
- Make sure your HDFS container has the NameNode and DataNode ports exposed. You can do this using an `EXPOSE` directive in the Dockerfile (the container you linked does not have these) or using the `--expose` argument when invoking `docker run`. The default ports are 8020 and 50010 (for the NameNode and DataNode respectively).
- Start the containers on the same network. Note that container-name DNS resolution does not work on the default bridge network, so create a user-defined network with `docker network create` and pass it to both containers via `--network`. Start the HDFS container with a specific name using the `--name` argument.
- Now modify your URI to include the proper authority (this will be the value of the `--name` argument you passed) and port as described above, and it should work.
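Putting the steps together, a minimal sketch. The image names `my-hdfs` and `my-spark` and the network name `hadoop` are placeholders, not from the original post; swap in the images you are actually using.

```shell
# Create a user-defined network so the containers can resolve
# each other by container name.
docker network create hadoop

# Start the HDFS container with a fixed name and the default
# NameNode (8020) and DataNode (50010) ports exposed.
docker run -d --name hdfs.container --network hadoop \
  --expose 8020 --expose 50010 my-hdfs

# Start the Spark container on the same network; inside it, write to
# hdfs://hdfs.container:8020/user/root/input/test
docker run -it --network hadoop my-spark
```

With both containers on the `hadoop` network, the authority in the URI is simply the `--name` you gave the HDFS container.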