
Minimum system requirements for running a Hadoop Cluster with High Availability


For Hadoop HA, you need at least two separate machines that can run the active NameNode and the standby NameNode. So in theory you can have a Hadoop HA cluster with as few as two machines, but that is not very useful in practice; a minimal configuration sketch is shown below.
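For illustration only, here is a minimal hdfs-site.xml sketch of the two-NameNode layout. The nameservice id mycluster, the NameNode ids nn1/nn2, and the example.com hostnames are placeholder assumptions, and a full HA setup additionally needs shared-edits (JournalNode) and failover settings that are omitted here:

```xml
<configuration>
  <!-- One logical nameservice fronting two physical NameNodes. -->
  <!-- "mycluster", "nn1"/"nn2", and the hostnames are placeholders. -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```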

To answer your other questions:

1. Yes, you can run the DataNode service on the machine that runs the NameNode service. This is a common scenario in a PoC cluster, where the cluster is small (roughly 3-7 nodes). NOTE: As a best practice, you should use dedicated machines for master services like the NameNode in production.

2. Yes, you can run YARN services on the machine that runs the DataNode, the NameNode, or both. In fact, on a single-node cluster all services run on one machine. All of these services (NameNode, DataNode, and the YARN daemons) are Java processes, so each runs in its own JVM. You can host these processes on the same node or on different nodes as you wish; see the sketch after this list.
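As a concrete (hypothetical) sketch of such colocation, pointing yarn.resourcemanager.hostname in yarn-site.xml at the NameNode's host runs the ResourceManager there; the hostname below is a placeholder:

```xml
<configuration>
  <!-- Run the YARN ResourceManager on the same host as the NameNode. -->
  <!-- "master.example.com" is a placeholder hostname. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.example.com</value>
  </property>
</configuration>
```

Each daemon still starts as its own JVM on that host; you can verify this with the JDK's jps tool, which lists each running daemon as a separate Java process.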

The NameNode mostly needs RAM, and how much depends on your cluster's data size and the number of blocks you have (or expect to have) in the cluster. Generally, your queries (CPU- or I/O-intensive) do not affect the NameNode's system requirements.
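As a rough, commonly cited rule of thumb (an approximation, not from the original answer): the NameNode keeps metadata for every file, directory, and block in heap at roughly 150 bytes per object, which works out to about 1 GB of heap per million objects. So a cluster expected to hold 10 million blocks would need on the order of 10 GB of NameNode heap, plus headroom.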

For more service details, refer to:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html