YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register
I finally got this solved. Posting detailed steps for future reference. (only for a test environment)
Hadoop (2.7.1) Multi-Node cluster configuration
- Make sure that you have a reliable network without host isolation. Static IP assignment is preferable, or at least use an extremely long DHCP lease. Additionally, all nodes (Namenode/master and Datanodes/slaves) should have a common user account with the same password; if they don't, create such an account on all nodes. Having the same username and password on every node makes things a bit less complicated.
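For example, a minimal sketch for creating such an account (the username hduser is only an illustration; run it on every node and choose the same password each time):
sudo useradd -m -s /bin/bash hduser   # create the common user with a home directory
sudo passwd hduser                    # set the same password on every node
sudo usermod -aG sudo hduser          # optional: sudo rights (the group is "wheel" on RedHat-like distros)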
- [on all machines] First configure all nodes for a single-node cluster. You can use my script that I have posted over here.
execute these commands in a new terminal
[on all machines] ↴
stop-dfs.sh;stop-yarn.sh;jps
rm -rf /tmp/hadoop-$USER
[on Namenode/master only] ↴
rm -rf ~/hadoop_store/hdfs/datanode
[on Datanodes/slaves only] ↴
rm -rf ~/hadoop_store/hdfs/namenode
[on all machines] Add the IP addresses and corresponding hostnames of all nodes in the cluster.
sudo nano /etc/hosts
hosts
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxy slave1
xxx.xxx.xxx.xxz slave2
# Additionally you may need to remove lines like "xxx.xxx.xxx.xxx localhost", "xxx.xxx.xxx.xxy localhost", "xxx.xxx.xxx.xxz localhost" etc if they exist.
# However it's okay to keep lines like "127.0.0.1 localhost" and others.
[on all machines] Configure iptables
Allow the default or custom ports that you plan to use for the various Hadoop daemons through the firewall (a sketch follows this block)
OR
much easier, disable iptables
on RedHat like distros (Fedora, CentOS)
sudo systemctl disable firewalld
sudo systemctl stop firewalld
on Debian like distros (Ubuntu)
sudo ufw disable
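If you would rather keep the firewall on, here is a hedged sketch of opening the usual Hadoop 2.7.x default ports with firewalld (adjust to whatever ports your configuration actually uses; note that some NodeManager and MapReduce ports are chosen at random by default, which is exactly why simply disabling the firewall is easier on a test cluster):
# RedHat-like distros with firewalld; on Ubuntu use "sudo ufw allow <port>/tcp" instead
sudo firewall-cmd --permanent --add-port=9000/tcp                                              # NameNode RPC (fs.defaultFS)
sudo firewall-cmd --permanent --add-port=50070/tcp --add-port=50090/tcp                        # NameNode / SecondaryNameNode web UIs
sudo firewall-cmd --permanent --add-port=50010/tcp --add-port=50020/tcp --add-port=50075/tcp   # DataNode
sudo firewall-cmd --permanent --add-port=8030-8033/tcp --add-port=8088/tcp                     # ResourceManager
sudo firewall-cmd --permanent --add-port=8040/tcp --add-port=8042/tcp --add-port=13562/tcp     # NodeManager + MapReduce shuffle
sudo firewall-cmd --reload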
[on Namenode/master only] Gain ssh access from the Namenode (master) to all Datanodes (slaves).
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave2
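If ssh-copy-id complains that there is no key to copy (the single-node setup may already have generated one), a minimal sketch assuming the default RSA key path:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # passwordless key pair for the common user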
confirm things by running ping slave1, ssh slave1, ping slave2, ssh slave2 etc. You should get a proper response. (Remember to exit each of your ssh sessions by typing exit or closing the terminal. To be on the safer side I also made sure that all nodes were able to access each other and not just the Namenode/master.)
[on all machines] edit core-site.xml file
nano /usr/local/hadoop/etc/hadoop/core-site.xml
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
        <description>NameNode URI</description>
    </property>
</configuration>
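To sanity-check that the value is being picked up, you can query it back (hdfs getconf is part of the standard 2.7.x tooling):
hdfs getconf -confKey fs.defaultFS   # should print hdfs://master:9000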
[on all machines] edit yarn-site.xml file
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
[on all machines] modify slaves file, remove the text "localhost" and add slave hostnames
nano /usr/local/hadoop/etc/hadoop/slaves
slaves
slave1
slave2
(I guess having this only on the Namenode/master would also work, but I did it on all machines anyway. Also note that in this configuration the master behaves only as a resource manager and not as a worker; this is how I intend it to be.)
- [on all machines] modify hdfs-site.xml file to change the value of the property dfs.replication to something > 1 (at least equal to the number of slaves in the cluster; here I have two slaves so I set it to 2). A hedged sketch of the resulting file follows the optional list below.
- [on Namenode/master only] (re)format the HDFS through namenode
hdfs namenode -format
- [optional]
  - remove the dfs.datanode.data.dir property from the master's hdfs-site.xml file.
  - remove the dfs.namenode.name.dir property from all slaves' hdfs-site.xml files.
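As promised above, a hedged sketch of what the master's hdfs-site.xml might end up looking like after these changes (the path is only an illustration mirroring the ~/hadoop_store layout used earlier; keep whatever dfs.*.dir values your single-node setup already has). On the slaves, the same file would keep dfs.datanode.data.dir instead of dfs.namenode.name.dir:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <!-- example path only; use the namenode directory your setup already has -->
        <value>file:/home/your_user/hadoop_store/hdfs/namenode</value>
    </property>
</configuration>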
TESTING (execute only on Namenode/master)
start-dfs.sh;start-yarn.sh
echo "hello world hello Hello" > ~/Downloads/test.txt
hadoop fs -mkdir /input
hadoop fs -put ~/Downloads/test.txt /input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
wait for a few seconds and the mapper and reducer should begin.
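If the job still hangs in the ACCEPTED state, a few hedged sanity checks (all standard 2.7.x commands) to confirm that the workers actually registered:
yarn node -list                        # should list slave1 and slave2 as RUNNING NodeManagers
hdfs dfsadmin -report                  # should report two live DataNodes
hadoop fs -cat /output/part-r-00000    # prints the word counts once the job finishes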
I ran into the same problem when I ran
"hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /calculateCount/ /output"
The command just hung there.
I tracked the job and found "there are 15 missing blocks, and they are all corrupted".
Then I did the following:
1) ran "hdfs fsck /"
2) ran "hdfs fsck / -delete"
3) added "-A INPUT -p tcp -j ACCEPT" to /etc/sysconfig/iptables on the two datanodes
4) ran "stop-all.sh" and "start-all.sh"
After that everything went well.
I think the firewall is the key point.
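For reference, a hedged sketch of what step 3 above might look like when applied live instead of by editing the file (RedHat-like systems with the classic iptables service assumed; this accepts ALL inbound TCP, which is only sensible on an isolated test network):
sudo iptables -I INPUT -p tcp -j ACCEPT   # -I inserts the rule before any trailing REJECT rule
sudo service iptables save                # persists the rule to /etc/sysconfig/iptables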