CAP with distributed System


In its classic single-NameNode architecture, HDFS has a single central decision point, the NameNode. As such it can only fall on the CP side, since taking down the NameNode takes down the entire HDFS system (no availability). Hadoop does not try to hide this:

The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy.

Since the decision of where to place data and where it can be read from is always handled by the NameNode, which maintains a consistent view in memory, HDFS is always consistent (C). It is also partition tolerant (P) in that it can handle losing data nodes, subject to the replication factor and the data topology strategy.
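The effect of the replication factor on tolerating lost data nodes can be illustrated with a small toy model. This is only a sketch and not HDFS code: the function readable_blocks, the block IDs and the node names are all hypothetical, and it assumes a block stays readable as long as at least one node holding a replica is still up.

```python
def readable_blocks(block_locations, failed_nodes):
    """Return, per block, whether at least one replica is on a live node.

    block_locations: dict mapping a block ID to the list of nodes that
    hold a replica of it (hypothetical names, not real HDFS metadata).
    failed_nodes: iterable of node names that are currently unreachable.
    """
    failed = set(failed_nodes)
    return {
        block: any(node not in failed for node in nodes)
        for block, nodes in block_locations.items()
    }


# Two blocks, each stored with replication factor 3 (assumed for illustration).
locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}

after_two_failures = readable_blocks(locations, ["dn1", "dn2"])
after_three_failures = readable_blocks(locations, ["dn1", "dn2", "dn3"])
```

With a replication factor of 3, losing any two nodes leaves every block readable in this model. A block only becomes unreadable when all three nodes holding its replicas are lost, which is why replica placement matters as much as the replication factor itself.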

Is there any system that can provide all three CAP properties together?

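Yes, such systems are often mentioned in marketing and other non-technical publications. In practice, a distributed system that has to tolerate network partitions must give up either consistency or availability for as long as a partition lasts.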

When does a user choose availability over consistency, and vice versa?

This is a business use case decision. When availability is more important, users choose AP. When consistency is more important, they choose CP. In general, when money changes hands, consistency takes precedence. Almost every other case favors availability.

Is there any database out there that allows the user to switch between CP and AP as needed?

Systems that allow you to modify both the write and the read quorums can be tuned to be either CP or AP, depending on your needs.
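The usual rule of thumb behind quorum tuning is that, with N replicas, a read quorum R and a write quorum W, every read overlaps the latest acknowledged write whenever R + W > N. The following is a minimal sketch of that rule only. The function names and the "CP-leaning" / "AP-leaning" labels are assumptions made for illustration, not the API of any particular database, and real systems add further caveats that this ignores.

```python
def quorum_mode(n: int, r: int, w: int) -> str:
    """Classify a quorum configuration by the R + W > N overlap rule.

    n: number of replicas per item
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("require 1 <= r <= n and 1 <= w <= n")
    # If R + W > N, any read quorum shares at least one replica with any
    # write quorum, so a read always reaches a replica with the latest
    # acknowledged write. Otherwise stale reads are possible.
    return "CP-leaning" if r + w > n else "AP-leaning"


def tolerated_failures(n: int, r: int, w: int) -> tuple:
    """Return (replicas that may be down for reads, ... for writes)."""
    return (n - r, n - w)


strict = quorum_mode(3, 2, 2)         # overlapping quorums
relaxed = quorum_mode(3, 1, 1)        # any single replica will do
slack = tolerated_failures(3, 2, 2)   # one replica may be down for either operation
```

The trade-off shows up directly in the numbers: raising R and W toward N makes the quorums overlap, but fewer replicas may be unreachable before operations start failing. Lowering them keeps the system answering during failures at the cost of possibly stale reads.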