How does Hadoop's HDFS High Availability feature affect the CAP Theorem?



HDFS does not provide Availability in the case of multiple correlated failures (for instance, three failed DataNodes that together hold all three replicas of the same HDFS block).
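To see why "unlucky" is the right word, here is a back-of-the-envelope sketch (the cluster size and failure count are illustrative assumptions, not from the answer): with the default replication factor of 3, a given block becomes unreadable only if the failed nodes include all three of its replica hosts. This ignores rack-aware placement and re-replication, both of which make real clusters safer than this estimate.

```python
from math import comb

def p_block_lost(num_datanodes: int, replication: int, failures: int) -> float:
    """Probability that `failures` simultaneous, uniformly random DataNode
    failures include every replica of one given block.

    Simplifying assumptions: replicas are placed uniformly at random,
    no rack awareness, and no re-replication has had time to run.
    """
    if failures < replication:
        return 0.0
    # Failure sets of size `failures` that contain the block's `replication`
    # specific hosts, divided by all possible failure sets of that size.
    return comb(num_datanodes - replication, failures - replication) / comb(num_datanodes, failures)

# In a hypothetical 100-node cluster with replication 3, losing exactly
# 3 nodes destroys a particular block only if they are precisely its
# 3 replica hosts: a 1-in-C(100, 3) chance per block.
print(p_block_lost(100, 3, 3))
```

The point is that correlated failures (a rack losing power, a bad switch) break the uniform-randomness assumption and make this probability much worse, which is exactly the scenario the quoted passage below describes.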

From CAP Confusion: Problems with partition tolerance

Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in real networks, however, where partitions and failures will occur, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.


HDFS High Availability makes HDFS more available, but not completely so. If a network partition leaves the client unable to communicate with either NameNode (active or standby), the cluster is effectively unavailable.
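For context, an HA deployment gives the client two NameNodes to try, which is what the failover in the answer refers to. A minimal sketch of the relevant hdfs-site.xml settings, assuming a nameservice named mycluster and hosts machine1/machine2 (all illustrative names):

```xml
<configuration>
  <!-- Logical name for the HA nameservice; clients address this, not a host. -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes backing the nameservice. -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>machine1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>machine2.example.com:8020</value>
  </property>
  <!-- Client-side proxy that fails over between nn1 and nn2. -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

The failover proxy only helps while the client can reach at least one NameNode; if a partition cuts it off from both, HA cannot restore availability, which is the limitation stated above.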