
Hadoop Namenode Metadata - fsimage and edit logs

The answer is: by looking at the information in the edit logs. If the information is not available in the edit logs ... This question holds for the use case where we write a new file to HDFS. While the namenode is running, if you remove the fsimage file and then try to read a file from HDFS, the read still succeeds.

Removing the fsimage file from a running namenode will not cause issues with read/write operations. When the namenode is restarted, however, there will be errors stating that the image file was not found.

Let me try to give some more explanation to help you out.

Hadoop looks for the fsimage file only at startup. If it is not there, the namenode does not come up and the log asks you to format the namenode.
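The startup check described above can be pictured with a small sketch. This is not Hadoop code: `latest_fsimage_txid` is a hypothetical helper, and it assumes the storage directory holds image files named `fsimage_<txid>` (plus `.md5` companions and `edits_*` files), which is the layout commonly seen under a namenode's `current` directory.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming convention: fsimage_<zero-padded transaction id>
FSIMAGE_NAME = re.compile(r"^fsimage_(\d+)$")

def latest_fsimage_txid(current_dir):
    """Return the highest txid among fsimage files, or None if there is no image."""
    txids = [int(m.group(1))
             for p in Path(current_dir).iterdir()
             if (m := FSIMAGE_NAME.match(p.name))]
    return max(txids) if txids else None

with tempfile.TemporaryDirectory() as d:
    # Empty directory: nothing to load, a real namenode would refuse to start.
    print(latest_fsimage_txid(d))
    for name in ("fsimage_0000000000000000000",
                 "fsimage_0000000000000000169",
                 "fsimage_0000000000000000169.md5",
                 "edits_inprogress_0000000000000000170"):
        (Path(d) / name).touch()
    # With images present, the newest one is the starting point for startup.
    print(latest_fsimage_txid(d))
```

The sketch only inspects file names; it says nothing about whether an image is valid or complete.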

The hadoop namenode -format command creates the fsimage file (if edit logs are present). After namenode startup, file metadata is fetched from the edit logs (and, if the information is not found in the edit logs, it is searched for in the fsimage file). So the fsimage just works as a checkpoint where the information was last saved. This is also one of the reasons the secondary namenode keeps syncing from the edit logs (after 1 hour / 1 million transactions), so that on startup not much needs to be replayed from the last checkpoint.
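The "1 hour / 1 million transactions" rule can be sketched as a simple either-or check. This is a toy model, not Hadoop source: `should_checkpoint` and its parameters are hypothetical names, and the two defaults are taken from the figures above (they correspond to the `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns` settings).

```python
# Toy model of the checkpoint trigger (not Hadoop source code).
CHECKPOINT_PERIOD_SECONDS = 3600   # 1 hour
CHECKPOINT_TXNS = 1_000_000        # 1 million transactions

def should_checkpoint(seconds_since_last, txns_since_last,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either threshold has been reached."""
    return seconds_since_last >= period or txns_since_last >= txns

print(should_checkpoint(120, 50))          # False: neither threshold reached
print(should_checkpoint(4000, 50))         # True: more than an hour has passed
print(should_checkpoint(120, 2_000_000))   # True: too many uncheckpointed edits
```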

If you turn safe mode on (command: hdfs dfsadmin -safemode enter) and then run saveNamespace (command: hdfs dfsadmin -saveNamespace), the namenode logs the messages shown below.

2014-07-05 15:03:13,195 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /data/hadoop-namenode-data-temp/current/fsimage.ckpt_0000000000000000169 using no compression
2014-07-05 15:03:13,205 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file /data/hadoop-namenode-data-temp/current/fsimage.ckpt_0000000000000000169 of size 288 bytes saved in 0 seconds.
2014-07-05 15:03:13,213 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 0
2014-07-05 15:03:13,237 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 170
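One way to read this log: the image file name carries the ID of the last transaction it includes (169), and the edit log then starts a fresh segment at the next ID (170). The following sketch pulls both numbers out of two of the log lines. The helper names `checkpoint_txid` and `next_segment_txid` are made up for illustration.

```python
import re

# Two lines taken from the log output above.
LOG = """\
2014-07-05 15:03:13,195 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /data/hadoop-namenode-data-temp/current/fsimage.ckpt_0000000000000000169 using no compression
2014-07-05 15:03:13,237 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 170
"""

def checkpoint_txid(log_text):
    """Transaction ID embedded in the checkpoint image file name, or None."""
    m = re.search(r"fsimage\.ckpt_(\d+)", log_text)
    return int(m.group(1)) if m else None

def next_segment_txid(log_text):
    """Transaction ID at which the new edit log segment starts, or None."""
    m = re.search(r"Starting log segment at (\d+)", log_text)
    return int(m.group(1)) if m else None

print(checkpoint_txid(LOG), next_segment_txid(LOG))
```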

The entire file system namespace, including the mapping of blocks to files and the file system properties, is stored in a file called the FsImage. Remember that the mapping of blocks to files is part of the FsImage. It is stored both in memory and on disk. Along with the FsImage, Hadoop also keeps the block-to-datanode mapping in memory; it is built from the block reports that datanodes send when the namenode is (re)started and periodically afterwards.

So when you move a file to a different location, the change is tracked in the edit log on disk, and when a datanode sends a block report, the namenode gets an up-to-date view of where blocks are located on the cluster. That is why you will not be able to see the data at the old path, since the block report has updated the mapping of blocks to datanodes. But remember that this update has happened only in memory.

After a certain amount of time, either at a checkpoint or when the namenode is restarted, the edit logs on disk, which already contain the updates you made (in your case, the file move), are merged with the old FsImage on disk to create a new FsImage. This updated FsImage is then loaded into memory, and the same process repeats.
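The merge of the edit log into the old FsImage can be illustrated with a toy replay. Everything here is invented for illustration: the dict-of-paths "image", the tuple-shaped edits, and the `apply_edit` / `checkpoint` helpers. The real FsImage and edit log are binary on-disk formats.

```python
# Toy checkpoint: replay the edit log on top of the old image to get a new one.

def apply_edit(namespace, edit):
    """Apply a single logged operation to a path -> block-list mapping."""
    op = edit[0]
    if op == "create":
        _, path, blocks = edit
        namespace[path] = list(blocks)
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")

def checkpoint(old_fsimage, edit_log):
    """Return a new image; the old image is left untouched."""
    new_fsimage = {path: list(blocks) for path, blocks in old_fsimage.items()}
    for edit in edit_log:
        apply_edit(new_fsimage, edit)
    return new_fsimage

old_fsimage = {"/data/a.txt": ["blk_1", "blk_2"]}
edit_log = [
    ("rename", "/data/a.txt", "/archive/a.txt"),  # the file move
    ("create", "/data/b.txt", ["blk_3"]),
]
new_fsimage = checkpoint(old_fsimage, edit_log)
print(new_fsimage)
```

Note that the rename changes only the path; the blocks stay attached to the file, which is why a move needs no data copy in this model.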

I'm kind of late to this question, but I think it's worth a clearer response.

If I understood you correctly, you want to know: if metadata is stored in the edit log, why does listing the old path after deleting a file complain that it does not exist? And how does the namenode know that the file or directory has been deleted without reading the edit log?

This is covered in Chapter 11 of Hadoop: The Definitive Guide:

When a filesystem client performs a write operation (such as creating or moving a file), the transaction is first recorded in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests.

With that in mind, the answer is simple: after updating the edit log, the namenode updates its in-memory representation. So when a read request arrives, it already knows that the file or directory has been deleted and reports that it does not exist.
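A minimal sketch of that ordering, assuming a toy `ToyNameNode` class (hypothetical, not a Hadoop API): every write is appended to the edit log first, then applied to the in-memory namespace, and reads consult memory only.

```python
class ToyNameNode:
    """Toy model: log the transaction first, then update memory."""

    def __init__(self):
        self.edit_log = []      # stands in for the on-disk edit log
        self.namespace = set()  # in-memory metadata used to serve reads

    def _log_then_apply(self, op, path):
        self.edit_log.append((op, path))  # 1. record the transaction
        if op == "create":                # 2. update the in-memory view
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)

    def create(self, path):
        self._log_then_apply("create", path)

    def delete(self, path):
        self._log_then_apply("delete", path)

    def exists(self, path):
        # Reads are answered from memory; the edit log is never scanned here.
        return path in self.namespace

nn = ToyNameNode()
nn.create("/user/demo/file.txt")
nn.delete("/user/demo/file.txt")
print(nn.exists("/user/demo/file.txt"))  # False
print(nn.edit_log)
```

The deleted path is reported as missing immediately, even though both operations are still sitting in the edit log and no checkpoint has happened.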