NameNode vs Secondary NameNode

hadoop hdfs hadoop2 high-availability

The namenode stores the HDFS filesystem information in a file named fsimage. Updates to the file system (adding or removing blocks) do not rewrite the fsimage file; instead they are appended to a log file, so the I/O is fast, append-only streaming rather than random file writes. When restarting, the namenode reads the fsimage and then applies all the changes from the log file to bring the filesystem state up to date in memory. This process takes time, and the longer the log has grown, the longer it takes.
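The startup replay described above can be sketched as a toy model. This is not Hadoop code: the real fsimage and log files are binary and far richer, and the names `replay`, `fsimage`, and `edit_log` here are made up for illustration.

```python
# Toy model only: a dict stands in for fsimage, a list of tuples for the log.

def replay(fsimage, edit_log):
    """Return the namespace obtained by applying logged edits to a checkpoint."""
    namespace = dict(fsimage)          # start from the last checkpoint
    for op, path, blocks in edit_log:  # apply edits in the order they were logged
        if op == "add":
            namespace[path] = blocks
        elif op == "remove":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace

fsimage = {"/a.txt": ["blk_1"]}
edit_log = [
    ("add", "/b.txt", ["blk_2", "blk_3"]),
    ("remove", "/a.txt", None),
]

namespace = replay(fsimage, edit_log)
print(namespace)
```

The cost of `replay` grows with the length of the log, which is why a long-running namenode with an unmerged log is slow to restart.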

The secondarynamenode's job is not to act as a backup for the namenode, but only to periodically read the filesystem change log and apply the changes to the fsimage file, bringing it up to date. This allows the namenode to start up faster next time.
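A rough sketch of what such a checkpoint achieves, again as a toy model rather than Hadoop's actual implementation (the function names and the dict/list layout are assumptions for illustration): the logged edits are folded into a new image, and the log that has to be replayed at the next start becomes empty.

```python
# Toy model only: names and data layout are hypothetical.

def apply_edits(image, edits):
    """Apply logged edits, in order, to a copy of the image."""
    merged = dict(image)
    for op, path in edits:
        if op == "add":
            merged[path] = True
        elif op == "remove":
            merged.pop(path, None)
    return merged

def checkpoint(image, edits):
    """Fold all logged edits into a new image and return it with an empty log."""
    return apply_edits(image, edits), []

image = {"/a": True}
edits = [("add", "/b"), ("remove", "/a"), ("add", "/c")]

new_image, new_edits = checkpoint(image, edits)

# The state after a restart is the same, but there is nothing left to replay.
assert apply_edits(new_image, new_edits) == apply_edits(image, edits)
print(len(edits), "edits before checkpoint,", len(new_edits), "after")
```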

Unfortunately, the secondarynamenode service is not a standby namenode, despite its name. Specifically, it does not offer HA for the namenode. This is well illustrated here.

See Understanding NameNode Startup Operations in HDFS.

Note that more recent distributions (current Hadoop 2.6) introduce namenode High Availability using NFS (shared storage) and/or namenode High Availability using the Quorum Journal Manager.

Things have changed over the years, especially with Hadoop 2.x. The Namenode is now highly available, with a failover feature.

The Secondary Namenode is now optional, and a Standby Namenode is used for the failover process.

The Standby NameNode stays up to date with all the file system changes the Active NameNode makes.

HDFS high availability is possible with two options, NFS and the Quorum Journal Manager, but the Quorum Journal Manager is the preferred option.

Have a look at the Apache documentation.

From slide 8 of: http://www.slideshare.net/cloudera/hdfs-futures-world2012-widescreen

When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace.

In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
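To make the majority-write idea concrete, here is a heavily simplified sketch. It is not the real Quorum Journal Manager protocol (which also deals with epochs, fencing, and log recovery); the `JournalNode` class and the `log_edit` and `catch_up` helpers are hypothetical names used only for illustration.

```python
class JournalNode:
    """Toy journal node: stores edits by transaction id and can be marked down."""

    def __init__(self):
        self.edits = {}      # txid -> edit
        self.up = True

    def write(self, txid, edit):
        if not self.up:
            return False     # an unreachable node never acknowledges
        self.edits[txid] = edit
        return True

def log_edit(journals, txid, edit):
    """Active side: succeed only if a majority of journal nodes acknowledge."""
    acks = sum(jn.write(txid, edit) for jn in journals)
    if acks <= len(journals) // 2:
        raise RuntimeError("no quorum, edit not durable")

def catch_up(journals, namespace, last_txid):
    """Standby side: apply every edit newer than last_txid from reachable nodes."""
    available = {}
    for jn in journals:
        if jn.up:
            available.update(jn.edits)
    for txid in sorted(t for t in available if t > last_txid):
        op, path = available[txid]
        if op == "add":
            namespace.add(path)
        else:
            namespace.discard(path)
        last_txid = txid
    return last_txid

journals = [JournalNode() for _ in range(3)]
log_edit(journals, 1, ("add", "/a"))
journals[2].up = False                    # one JN fails; 2 of 3 is still a majority
log_edit(journals, 2, ("add", "/b"))
log_edit(journals, 3, ("remove", "/a"))

standby = set()
last = catch_up(journals, standby, 0)     # done before promotion to Active
print(sorted(standby), last)
```

With three journal nodes, losing one does not block writes, and the Standby can still read every acknowledged edit before taking over.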

Have a look at the failover process in this related SE question:

How does Hadoop Namenode failover process works?

Regarding your question about the CAP theorem for Hadoop:

It can be strongly consistent
HDFS is almost always highly available, unless you are unlucky (if all three replicas of a block are down, you cannot read that data)
It supports data partitioning

The Name Node is the primary node; all of the filesystem metadata is stored in its fsimage and editlog files. When the Name Node goes down, the Secondary Name Node does not take over: it only holds a checkpointed copy of the fsimage with the merged edits, and it cannot serve client requests or accept writes. When the Name Node comes back online, it reloads the fsimage and replays the editlog; the Secondary's checkpoint can be used to recover the metadata if the Name Node's own copy has been lost.
