What should hadoop.tmp.dir be?
It's confusing, but hadoop.tmp.dir is used as the base for temporary directories both locally and in HDFS. The documentation isn't great, but mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files.
If you don't want these tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir does not depend on ${hadoop.tmp.dir}.
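As a rough sketch of that override, a mapred-site.xml along these lines would decouple the two. The path /hadoop/mapred/system is a made-up placeholder, not a recommended value; pick whatever HDFS path suits your cluster.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override the default of ${hadoop.tmp.dir}/mapred/system.
       The value below is a hypothetical HDFS path used only for illustration. -->
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
</configuration>
```

With this in place, changing hadoop.tmp.dir should no longer move the Map/Reduce system directory.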
Let me add a bit more to kkrugler's answer:
There are three HDFS properties whose default values contain hadoop.tmp.dir:
dfs.name.dir: directory where the NameNode stores its metadata; default value ${hadoop.tmp.dir}/dfs/name.
dfs.data.dir: directory where HDFS data blocks are stored; default value ${hadoop.tmp.dir}/dfs/data.
fs.checkpoint.dir: directory where the secondary NameNode stores its checkpoints; default value ${hadoop.tmp.dir}/dfs/namesecondary.
This is why you saw /mnt/hadoop-tmp/hadoop-${user.name} in your HDFS after formatting the namenode.
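If the goal is to keep NameNode metadata and data blocks out of the temporary directory, the dfs.name.dir and dfs.data.dir defaults can be overridden explicitly. A minimal sketch for hdfs-site.xml, assuming the older property names used in this thread; the /data/... paths are hypothetical placeholders, not values taken from the question:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default is ${hadoop.tmp.dir}/dfs/name.
       Hypothetical local path for NameNode metadata. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/dfs/name</value>
  </property>
  <!-- Default is ${hadoop.tmp.dir}/dfs/data.
       Hypothetical local path for HDFS data blocks. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/dfs/data</value>
  </property>
</configuration>
```

Note that a NameNode pointed at a new, empty dfs.name.dir has no existing metadata there, so this is best decided before the first format rather than on a cluster that already holds data.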
I had a look around for information on this one. The only thing I could come up with was this passage from the Amazon Elastic MapReduce Dev Guide:
In hadoop-site.xml, we set hadoop.tmp.dir to /mnt/var/lib/hadoop/tmp. /mnt is where we mount the “extra” EC2 volumes, which can contain a lot more data than the default volume. (The exact amount depends on instance type.) Hadoop's RunJar.java (the module that unpacks the input JARs) interprets hadoop.tmp.dir as a Hadoop file system path rather than a local path, so it writes to the path in HDFS instead of a local path. HDFS is mounted under /mnt (specifically /mnt/var/lib/hadoop/dfs/). So, you can write lots of data to it.