Total JVM runs on Hadoop Cluster? JVM life cycle in Hadoop


Think of a JVM as an abstract computing machine on which a Java-based service runs. To answer your questions:

1) For the sake of simplicity, let's assume there is just one storage node and one processing node.

Hadoop 1.0

There were five services in total: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. Each service runs in its own JVM, so the NameNode, SecondaryNameNode, DataNode, and JobTracker account for four JVMs.

A TaskTracker is a service in the cluster that accepts tasks - Map, Reduce and Shuffle operations - from a JobTracker. The TaskTracker spawns separate JVM processes to do the actual work.

Assume there is only one task slot available on the TaskTracker for the actual work, i.e. running a mapper or a reducer. The TaskTracker then accounts for two JVMs: its own plus one child JVM for that slot.

Therefore, the total number of JVMs = NameNode(1) + SecondaryNameNode(1) + DataNode(1) + JobTracker(1) + TaskTracker(2) = 6
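The tally above can be sketched as a short script; the service names are just labels for illustration, not Hadoop API identifiers:

```python
# Per-service JVM tally for a minimal Hadoop 1.0 cluster, as described above.
jvms = {
    "NameNode": 1,
    "SecondaryNameNode": 1,
    "DataNode": 1,
    "JobTracker": 1,
    "TaskTracker": 2,  # the TaskTracker JVM plus one child JVM for its single task slot
}
print(sum(jvms.values()))  # 6
```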

Hadoop 2.0

Total services: NameNode, SecondaryNameNode, ResourceManager (comprising the ApplicationsManager and Scheduler), NodeManager (running the ApplicationMaster and Container), and DataNode.

1 JVM for each service, hence:

NameNode(1) + SecondaryNameNode(1) + ResourceManager(1) + ApplicationsManager(1) + Scheduler(1) + NodeManager(1) + ApplicationMaster(1) + Container(1) + DataNode(1) = 9

> Processing is performed in a Container (a JVM), while the NodeManager (a JVM) oversees the operations. Each YARN application requires its own ApplicationMaster (a JVM).
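Since each YARN application brings its own ApplicationMaster and at least one Container, the tally can be sketched as a function of cluster size and running applications. This is an illustrative model of the counting described above, not a Hadoop API:

```python
# Minimum JVM tally for a Hadoop 2.0 cluster, parameterized so you can
# see how the count grows with nodes and applications.
def min_jvms(storage_nodes=1, processing_nodes=1, yarn_apps=1):
    # NameNode, SecondaryNameNode, ResourceManager, ApplicationsManager, Scheduler
    cluster_wide = 5
    per_storage = 1       # one DataNode JVM per storage node
    per_processing = 1    # one NodeManager JVM per processing node
    per_app = 2           # ApplicationMaster + at least one Container per YARN application
    return (cluster_wide
            + storage_nodes * per_storage
            + processing_nodes * per_processing
            + yarn_apps * per_app)

print(min_jvms())  # 9 for one storage node, one processing node, one application
```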

2) Point one describes the minimum number of JVMs. We can't state a maximum with certainty, since you can keep adding storage and processing nodes to your cluster, and the number of JVMs will grow accordingly.

3) If you have more resources in your cluster, you can run more JVMs, and that way you can have more storage (DataNode) and processing (NodeManager and Container) services running. Yes, you can control JVM reuse by configuring the property mapred.job.reuse.jvm.num.tasks.
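In Hadoop 1.x this property is set in mapred-site.xml; the value 5 below is illustrative (the default is 1, and -1 means a JVM may be reused for an unlimited number of tasks of the same job):

```xml
<!-- mapred-site.xml (Hadoop 1.x); the value below is illustrative -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- run up to 5 tasks of the same job in one JVM; -1 means no limit -->
  <value>5</value>
</property>
```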

4) Since all the services of the Hadoop framework run on JVMs, JVMs are essential. You don't create a JVM yourself; the operating system does that for you. You just need to start the JVM process.