Hadoop client and cluster separation


First of all, this link has detailed information on how the client communicates with the NameNode:

http://www.informit.com/articles/article.aspx?p=2460260&seqNum=2

To my understanding, your professor wants a separate node acting as a client, from which you can run Hadoop jobs, but that node should not be part of the Hadoop cluster.

Consider a scenario where you have to submit a Hadoop job from a client machine, and the client machine is not part of the existing Hadoop cluster. The job is expected to be executed on the Hadoop cluster.

The NameNode and DataNodes form the Hadoop cluster, and the client submits jobs to the NameNode. To achieve this, the client should have the same copy of the Hadoop distribution and configuration that is present on the NameNode. Only then will the client know on which node the JobTracker is running, and the IP address of the NameNode for accessing HDFS data.

Go through the configuration on the NameNode:

core-site.xml will have this property:

<property>
    <name>fs.default.name</name>
    <value>192.168.0.1:9000</value>
</property>

mapred-site.xml will have this property:

<property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.1:8021</value>
</property>
These two important properties must be copied to the client machine's Hadoop configuration. You also need to set one additional property in the mapred-site.xml file, to overcome a PrivilegedActionException.

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
</property>
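
One simple way to get the same configuration onto the client is to copy these files straight from the NameNode. This is only a sketch; the hostname "namenode-host" and the configuration directory (a typical Hadoop 1.x layout) are assumptions, so adjust them to your installation:

# Run on the client machine; "namenode-host" and the conf paths are placeholders
scp namenode-host:/usr/local/hadoop/conf/core-site.xml   $HADOOP_HOME/conf/
scp namenode-host:/usr/local/hadoop/conf/mapred-site.xml $HADOOP_HOME/conf/
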
You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the NameNode and DataNodes.
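
For example, the entries could look like this (the hostnames are placeholders, and the IPs should match your cluster, as in the sample configuration above):

# /etc/hosts on the client machine
192.168.0.1   namenode-host
192.168.0.2   datanode1-host
192.168.0.3   datanode2-host
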

Now you can submit a job from the client machine with the hadoop jar command, and the job will be executed on the Hadoop cluster. Note that you shouldn't start any Hadoop services on the client machine.
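
For example, a submission could look like this (the jar name, main class, and HDFS paths are placeholders, not part of the original setup):

# Run on the client machine; the job runs on the cluster
hadoop jar my-job.jar com.example.WordCount /user/input /user/output
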


Users shouldn't be able to disrupt the functionality of the cluster. That's the point. Imagine a whole bunch of data scientists who launch their jobs from one of the cluster's masters. If someone launches a memory-intensive operation, the master processes running on the same machine could end up with no memory and crash. That would leave the whole cluster in a failed state.

If you separate the client node from the master/slave nodes, users could still crash the client, but the cluster would stay up.