
name node Vs secondary name node


The namenode stores the HDFS filesystem information in a file named fsimage. Updates to the file system (add/remove blocks) do not rewrite the fsimage file; instead they are appended to a log file (the edit log), so the I/O is fast, append-only streaming rather than random file writes. When restarting, the namenode reads the fsimage and then applies all the changes from the log file to bring the filesystem state up to date in memory. This process takes time, and the longer the log, the longer it takes.
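
The startup sequence above can be sketched as a toy model in Python. This is not Hadoop code: the set of paths standing in for the fsimage, the `(op, path)` tuple format for edits, and the function names `log_edit` and `load_namespace` are all hypothetical, chosen only to show why a write is a cheap append and why a restart has to replay the whole log.

```python
# Toy model of fsimage + edit log. Hypothetical names and formats,
# not Hadoop's actual on-disk layout or API.

def log_edit(edit_log, op, path):
    """Record a namespace change by appending to the log (fsimage is untouched)."""
    edit_log.append((op, path))

def load_namespace(fsimage, edit_log):
    """Rebuild the in-memory namespace: load the snapshot, then replay edits in order."""
    namespace = set(fsimage)  # start from the last saved image
    for op, path in edit_log:
        if op == "add":
            namespace.add(path)
        elif op == "remove":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace

fsimage = {"/a", "/b"}              # state at the last checkpoint
edit_log = []
log_edit(edit_log, "add", "/c")     # append only
log_edit(edit_log, "remove", "/a")  # append only

restarted = load_namespace(fsimage, edit_log)
print(sorted(restarted))  # ['/b', '/c']
```

The replay loop runs once per logged edit, which is why a long edit log means a slow restart.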

The secondarynamenode's job is not to be a secondary to the namenode, but only to periodically read the filesystem changes log and apply those changes to the fsimage file, thus bringing it up to date. This lets the namenode start up faster next time, because there are fewer logged changes left to replay.
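
A minimal sketch of that periodic merge, again as a toy Python model rather than real Hadoop code. The names `apply_edits` and `checkpoint`, and the representation of the fsimage as a set of paths, are assumptions made for illustration only.

```python
# Toy checkpoint: fold the edit log into the image so that less is left to replay.
# Hypothetical names and formats, not Hadoop's actual implementation.

def apply_edits(image, edits):
    """Return a new image with the edits applied in order."""
    merged = set(image)
    for op, path in edits:
        if op == "add":
            merged.add(path)
        elif op == "remove":
            merged.discard(path)
        else:
            raise ValueError(f"unknown op: {op}")
    return merged

def checkpoint(fsimage, edit_log):
    """Merge the log into the image; return the new image and an empty log."""
    return apply_edits(fsimage, edit_log), []

fsimage = {"/a"}
edit_log = [("add", "/b"), ("remove", "/a"), ("add", "/c")]

new_fsimage, new_edit_log = checkpoint(fsimage, edit_log)

# The filesystem state is the same before and after the checkpoint;
# only the amount of work needed at startup changes.
same_state = apply_edits(fsimage, edit_log) == apply_edits(new_fsimage, new_edit_log)
print(sorted(new_fsimage), len(new_edit_log), same_state)  # ['/b', '/c'] 0 True
```

Note that nothing in this merge serves client requests, which is why a checkpointing node by itself gives no high availability.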

Unfortunately, the secondarynamenode service is not a standby namenode, despite its name. Specifically, it does not offer HA for the namenode. This is well illustrated here.

See Understanding NameNode Startup Operations in HDFS.

Note that more recent distributions (currently Hadoop 2.6) introduce namenode High Availability using NFS (shared storage) and/or namenode High Availability using the Quorum Journal Manager.


Things have changed over the years, especially with Hadoop 2.x. The Namenode is now highly available, with a failover feature.

The Secondary Namenode is now optional, and a Standby Namenode is used for the failover process.

The Standby NameNode stays up to date with all the file system changes the Active NameNode makes.

HDFS High Availability is possible with two options, NFS and the Quorum Journal Manager, but the Quorum Journal Manager is the preferred option.

Have a look at the Apache documentation.

From slide 8 of: http://www.slideshare.net/cloudera/hdfs-futures-world2012-widescreen

When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace.

In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
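
The majority write and the catch-up before promotion can be illustrated with a small Python sketch. It is a deliberately simplified model, not the Quorum Journal Manager protocol: the `JournalNode` class, `log_to_quorum`, `catch_up`, and the integer transaction ids are hypothetical, and real QJM details such as recovery of partially written edits and fencing of the old Active node are left out.

```python
# Toy model of quorum-based edit logging. Hypothetical names; this is
# not the real Quorum Journal Manager protocol.

class JournalNode:
    def __init__(self, up=True):
        self.up = up
        self.edits = []  # list of (txid, edit)

    def write(self, txid, edit):
        """Store the edit if this node is reachable; report success."""
        if not self.up:
            return False
        self.edits.append((txid, edit))
        return True

def log_to_quorum(journal_nodes, txid, edit):
    """Active node: the edit counts as durable only if a majority of JNs ack it."""
    acks = sum(jn.write(txid, edit) for jn in journal_nodes)
    if acks <= len(journal_nodes) // 2:
        raise RuntimeError("edit not acknowledged by a majority of JournalNodes")
    return acks

def catch_up(journal_nodes, last_applied_txid):
    """Standby node: collect every edit newer than the last one it applied."""
    pending = {}
    for jn in journal_nodes:
        if jn.up:
            for txid, edit in jn.edits:
                if txid > last_applied_txid:
                    pending[txid] = edit
    return [pending[txid] for txid in sorted(pending)]

# Three JournalNodes, one of them down: writes still reach a majority (2 of 3).
jns = [JournalNode(), JournalNode(), JournalNode(up=False)]
log_to_quorum(jns, 1, ("add", "/a"))
log_to_quorum(jns, 2, ("add", "/b"))

# The Standby has applied txid 1 so far; before promoting itself it reads the rest.
standby_edits = catch_up(jns, last_applied_txid=1)
print(standby_edits)  # [('add', '/b')]
```

With three JournalNodes the model tolerates one failure; with two of the three down, `log_to_quorum` raises instead of pretending the edit is durable.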


Have a look at the failover process in this related SE question:

How does Hadoop Namenode failover process works?

Regarding your queries on the CAP theorem for Hadoop:

  1. It can be strongly consistent.
  2. HDFS is almost always highly available, unless you are unlucky (if all three replicas of a block are down, you won't get the data).
  3. It supports data partitioning.


The Name Node is the primary node, and all the metadata is stored in its fsimage and editlog files. When the name node goes down, the secondary node does not take over for it. The secondary node only reads copies of the fsimage and editlog files and never writes to the name node's own files. It merges those copies in its own checkpoint (temp) folder, and the merged image is then copied back to the name node, which replaces its fsimage and continues with a fresh editlog.