Error launching job using mrjob on Hadoop


This is a known problem with Hadoop 2.x and mrjob. Make the following changes, format your NameNode, and restart HDFS and YARN; the job should then launch.
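If you don't have a job handy to test with, here is a minimal word-count sketch you can use once the configuration below is in place (the file name word_count.py and the class name MRWordCount are placeholders, not anything mrjob requires):

# word_count.py -- minimal mrjob job for smoke-testing the cluster.
from mrjob.job import MRJob

class MRWordCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated token.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum the counts emitted for each word across all mappers.
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordCount.run()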

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
        <description>Physical memory, in MB, to be made available to running containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for MapReduce to run.</description>
    </property>
</configuration>

Then run:

hdfs namenode -format
start-dfs.sh
start-yarn.sh
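Once the daemons are back up, launch your job with the hadoop runner, e.g. python word_count.py -r hadoop hdfs:///path/to/input.txt (the input path is just an example; substitute your own file). As a sketch, assuming the word_count.py above and a recent mrjob (0.6+, where parse_output() and cat_output() are available), you can also drive it from Python:

# run_job.py -- launch the word-count job on Hadoop and print results.
from word_count import MRWordCount

# The HDFS input path below is hypothetical; substitute your own file.
job = MRWordCount(args=['-r', 'hadoop', 'hdfs:///path/to/input.txt'])
with job.make_runner() as runner:
    runner.run()
    for word, count in job.parse_output(runner.cat_output()):
        print(word, count)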

Cheers,

Thusjanthan Kubendranathan