How to persist HDFS data in docker container


Inspect the dfs.datanode.data.dir property in hdfs-site.xml to find out where the DataNode stores its data on the container filesystem:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///root/hdfs/datanode</value>
    <description>DataNode directory</description>
</property>

If this file or property is missing, the default location is file:///tmp/hadoop-${user.name}/dfs/data (that is, dfs/data under hadoop.tmp.dir).

With Docker, keep in mind that processes run as the root user by default, so that default resolves to /tmp/hadoop-root/dfs/data.
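As a rough illustration of the lookup described above, the following Python sketch reads the text of an hdfs-site.xml and returns the configured DataNode directories, falling back to the default when the property is absent. The function name `datanode_data_dirs` and the `SAMPLE` string are made up for this example, and the sketch is simplified: it does not expand `${...}` variables or handle storage-type prefixes in the value.

```python
import xml.etree.ElementTree as ET

# Default when the property is unset: dfs/data under hadoop.tmp.dir,
# which itself defaults to /tmp/hadoop-${user.name}.
DEFAULT_DATA_DIR = "file:///tmp/hadoop-{user}/dfs/data"


def datanode_data_dirs(hdfs_site_xml, user="root"):
    """Return the container paths named by dfs.datanode.data.dir.

    hdfs_site_xml is the text of hdfs-site.xml; user is the account the
    DataNode runs as (root by default in most containers).
    """
    root = ET.fromstring(hdfs_site_xml)
    value = None
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.datanode.data.dir":
            value = prop.findtext("value", "").strip()
    if not value:
        value = DEFAULT_DATA_DIR.format(user=user)

    # The property may hold a comma-separated list of directories.
    paths = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry.startswith("file://"):
            entry = entry[len("file://"):]
        paths.append(entry)
    return paths


# Hypothetical sample mirroring the property shown earlier.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///root/hdfs/datanode</value>
  </property>
</configuration>"""

print(datanode_data_dirs(SAMPLE))              # ['/root/hdfs/datanode']
print(datanode_data_dirs("<configuration/>"))  # ['/tmp/hadoop-root/dfs/data']
```

The returned paths are the ones that would need to live on a volume for the DataNode data to outlive the container.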

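You will also need to persist the NameNode files; their location is set by dfs.namenode.name.dir in the same XML file. Mount both directories as Docker volumes so the data survives when the container is removed.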
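A minimal sketch of how both directories might be mapped to the host when starting the container. Everything specific here is an assumption for illustration: the host directory `/srv/hdfs`, the image name `my-hadoop-image`, and the NameNode path `/root/hdfs/namenode` (use whatever dfs.namenode.name.dir says in your own hdfs-site.xml). The helper `volume_flags` is hypothetical; only the `docker run -v host:container` syntax comes from Docker itself.

```python
def volume_flags(host_root, container_dirs):
    """Build `-v host:container` flags, one per directory to persist."""
    flags = []
    for name, container_path in container_dirs.items():
        flags += ["-v", f"{host_root}/{name}:{container_path}"]
    return flags


container_dirs = {
    "namenode": "/root/hdfs/namenode",  # assumed value of dfs.namenode.name.dir
    "datanode": "/root/hdfs/datanode",  # dfs.datanode.data.dir from hdfs-site.xml
}

# Hypothetical host directory and image name; adjust to your setup.
cmd = ["docker", "run", "-d",
       *volume_flags("/srv/hdfs", container_dirs),
       "my-hadoop-image"]

print(" ".join(cmd))
# docker run -d -v /srv/hdfs/namenode:/root/hdfs/namenode -v /srv/hdfs/datanode:/root/hdfs/datanode my-hadoop-image
```

Named volumes would work the same way; bind mounts are used here only because they make the persisted files easy to find on the host.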

Which "path" inside the container corresponds to the HDFS path "/user/root/input/NewFile.txt"?

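There is no one-to-one mapping. The container path holds the blocks that make up the HDFS file (stored as blk_* files), not the whole file under its HDFS name. The NameNode keeps track of which blocks belong to which HDFS path.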
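To see those blocks on disk, a small sketch like the one below can walk the DataNode directory and list the block files. It assumes the usual layout, where block files are named `blk_<id>` and sit next to `blk_<id>_<genstamp>.meta` checksum files somewhere beneath the data directory; the function name `list_block_files` is made up for this example, and the exact subdirectory structure may differ between Hadoop versions.

```python
import os


def list_block_files(data_dir):
    """Return the paths of HDFS block files found under data_dir."""
    blocks = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            # Keep blk_<id> files, skip their .meta checksum companions.
            if name.startswith("blk_") and not name.endswith(".meta"):
                blocks.append(os.path.join(dirpath, name))
    return sorted(blocks)


if __name__ == "__main__":
    # Path taken from the dfs.datanode.data.dir value shown earlier.
    for path in list_block_files("/root/hdfs/datanode"):
        print(path)
```

To find out which block IDs belong to a given HDFS file, `hdfs fsck /user/root/input/NewFile.txt -files -blocks -locations` prints them, and they can then be matched against the file names listed here.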