Why do we need to format HDFS after every time we restart machine?
By changing `dfs.datanode.data.dir` away from `/tmp`, you did make the data (the blocks) survive a reboot — most Linux systems clear `/tmp` on restart. However, there is more to HDFS than just blocks. You need to make sure all the relevant directories point away from `/tmp`, most notably `dfs.namenode.name.dir`. (I can't tell which other directories you have to change, as it depends on your config, but moving the namenode dir is mandatory, and may also be sufficient.)
I would also recommend using a more recent Hadoop distribution. By the way, in Hadoop 1.1 the namenode dir setting is `dfs.name.dir`.
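For Hadoop 1.x, the override might look like the sketch below in `hdfs-site.xml` — the paths are illustrative placeholders, not a recommendation; any persistent location outside `/tmp` works:

```xml
<!-- hdfs-site.xml, Hadoop 1.x property names; paths are examples only -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/hadoop/dfs/data</value>
</property>
```

Note that after changing the namenode dir you will need to format it one last time (or copy the old metadata over) before HDFS will start.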
For those using Hadoop 2.0 or later, the configuration property names are different. As this answer points out, go to the `/etc/hadoop` directory of your Hadoop installation.
Open the file `hdfs-site.xml`. This user configuration overrides the default Hadoop configuration that the Java classloader loads beforehand.
Add the `dfs.namenode.name.dir` property and set a new namenode dir (the default is `file://${hadoop.tmp.dir}/dfs/name`).
Do the same for the `dfs.datanode.data.dir` property (the default is `file://${hadoop.tmp.dir}/dfs/data`).
For example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/data</value>
</property>
Another property where a tmp dir appears is `dfs.namenode.checkpoint.dir`. Its default value is `file://${hadoop.tmp.dir}/dfs/namesecondary`.
If you want, you can easily also add this property:
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/namesecondary</value>
</property>
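To sanity-check your config after making these edits, you can scan `hdfs-site.xml` for directory properties that still resolve into `/tmp`. This is a small sketch, not an official tool; the file path and the list of properties to check are assumptions you may need to adjust:

```python
# Sketch: flag HDFS directory properties whose values would be wiped
# on reboot because they live under /tmp (or fall back to the
# hadoop.tmp.dir default). Property list is an assumption for illustration.
import xml.etree.ElementTree as ET

DIR_PROPS = {
    "dfs.namenode.name.dir",
    "dfs.datanode.data.dir",
    "dfs.namenode.checkpoint.dir",
}

def find_tmp_dirs(hdfs_site_xml):
    """Return {property_name: value} for dir properties pointing into /tmp."""
    root = ET.parse(hdfs_site_xml).getroot()
    risky = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if name in DIR_PROPS and ("/tmp" in value or "hadoop.tmp.dir" in value):
            risky[name] = value
    return risky
```

If this returns an empty dict for all three properties, a reboot should no longer force you to reformat HDFS.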