Hadoop client and cluster separation


First of all, this link has detailed information on how the client communicates with the NameNode:

http://www.informit.com/articles/article.aspx?p=2460260&seqNum=2

To my understanding, your professor wants a separate node acting as a client, from which you can run Hadoop jobs, but that node should not be part of the Hadoop cluster.

Consider a scenario where you have to submit a Hadoop job from a client machine, and the client machine is not part of the existing Hadoop cluster. The job is expected to be executed on the Hadoop cluster.

The NameNode and DataNodes form the Hadoop cluster, and the client submits jobs to the NameNode. To achieve this, the client should have the same copy of the Hadoop distribution and configuration that is present on the NameNode. Only then will the client know on which node the JobTracker is running, and the IP address of the NameNode for accessing HDFS data.

Go through the configuration on the NameNode:

core-site.xml will have this property:

<property>
    <name>fs.default.name</name>
    <value>192.168.0.1:9000</value>
</property>

mapred-site.xml will have this property:

<property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.1:8021</value>
</property>
These two important properties must be copied to the client machine's Hadoop configuration. You also need to set one additional property in the mapred-site.xml file, to overcome a PrivilegedActionException.

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
</property>
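
One simple way to get the same configuration onto the client is to copy these files straight from the NameNode. This is only a sketch; the hostname "namenode-host" and the configuration directory (a typical Hadoop 1.x layout) are assumptions, so adjust them to your installation:

# Run on the client machine; "namenode-host" and the conf paths are placeholders
scp namenode-host:/usr/local/hadoop/conf/core-site.xml   $HADOOP_HOME/conf/
scp namenode-host:/usr/local/hadoop/conf/mapred-site.xml $HADOOP_HOME/conf/
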
You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the NameNode and DataNodes.
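
For example, the entries could look like this (the hostnames are placeholders, and the IPs should match your cluster, as in the sample configuration above):

# /etc/hosts on the client machine
192.168.0.1   namenode-host
192.168.0.2   datanode1-host
192.168.0.3   datanode2-host
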

Now you can submit a job from the client machine with the hadoop jar command, and the job will be executed on the Hadoop cluster. Note that you shouldn't start any Hadoop services on the client machine.
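
For example, a submission could look like this (the jar name, main class, and HDFS paths are placeholders, not part of the original setup):

# Run on the client machine; the job runs on the cluster
hadoop jar my-job.jar com.example.WordCount /user/input /user/output
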


Users shouldn't be able to disrupt the functionality of the cluster. That's the point. Imagine a whole bunch of data scientists who launch their jobs from one of the cluster's masters. If someone launches a memory-intensive operation, the master processes running on the same machine could end up with no memory and crash. That would leave the whole cluster in a failed state.

If you separate the client node from the master/slave nodes, users could still crash the client, but the cluster would stay up.