How to set the VCORES in hadoop mapreduce/yarn?

Short Answer

It most probably doesn't matter if you are just running Hadoop out of the box on a single-node cluster or even a small personal distributed cluster. You only need to worry about memory.

Long Answer

vCores are used on larger clusters to limit CPU for different users or applications. If you are using YARN just for yourself, there is no real reason to limit your container CPU. That is why vCores are not even taken into consideration by default in Hadoop!

Try setting your available NodeManager vCores to 1. It doesn't matter! Your number of containers will still be 2 or 4, or whatever this works out to:

yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb

If you really do want the number of containers to take vCores into consideration and be limited by:

yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores

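then you need to use a different resource calculator. Go to your capacity-scheduler.xml config and change the value of yarn.scheduler.capacity.resource-calculator from DefaultResourceCalculator to DominantResourceCalculator (full class name: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator).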
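To make the two formulas concrete, here is a minimal sketch of the arithmetic in plain Java. ContainerMath and its methods are hypothetical helpers, not part of Hadoop, and the node and task sizes are made-up example values. It also ignores the rounding of requests to the scheduler's minimum allocation and the resources taken by the Application Master, so treat the numbers as rough estimates.

```java
// Hypothetical helper (not a Hadoop class): estimates how many equally
// sized containers fit on one NodeManager.
final class ContainerMath {

    // Memory-only accounting, which is what DefaultResourceCalculator does:
    // yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb
    static int byMemory(int nodeMemoryMb, int containerMemoryMb) {
        return nodeMemoryMb / containerMemoryMb;
    }

    // Memory and vCores both count, as with DominantResourceCalculator:
    // whichever resource runs out first sets the limit.
    static int byMemoryAndVcores(int nodeMemoryMb, int containerMemoryMb,
                                 int nodeVcores, int containerVcores) {
        int memoryLimit = nodeMemoryMb / containerMemoryMb;
        int vcoreLimit = nodeVcores / containerVcores;
        return Math.min(memoryLimit, vcoreLimit);
    }

    public static void main(String[] args) {
        int nodeMemoryMb = 8192; // example yarn.nodemanager.resource.memory-mb
        int nodeVcores = 1;      // example yarn.nodemanager.resource.cpu-vcores
        int mapMemoryMb = 2048;  // example mapreduce.map.memory.mb
        int mapVcores = 1;       // example mapreduce.map.cpu.vcores

        // Memory only: the single vCore is ignored, prints 4.
        System.out.println(byMemory(nodeMemoryMb, mapMemoryMb));
        // Memory and vCores: the single vCore is the bottleneck, prints 1.
        System.out.println(byMemoryAndVcores(nodeMemoryMb, mapMemoryMb, nodeVcores, mapVcores));
    }
}
```

With these example values, setting the NodeManager vCores to 1 only changes the result once vCores are part of the calculation.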

Do you also want vCores to really limit the CPU usage on each node, in addition to using them for container allocation? Then you need to change even more configuration and use the LinuxContainerExecutor instead of the DefaultContainerExecutor, because it can manage the Linux cgroups that are used to limit CPU resources. Follow this page if you want more info on this.

hadoop mapreduce hadoop-yarn hcatalog

yarn.nodemanager.resource.cpu-vcores - Number of CPU cores that can be allocated for containers.

mapreduce.map.cpu.vcores - The number of virtual CPU cores allocated for each map task of a job.

mapreduce.reduce.cpu.vcores - The number of virtual CPU cores allocated for each reduce task of a job.


I accidentally came across this question and I eventually managed to find the answers that I needed, so I will try to provide a complete answer.

Entities and their relations

For each Hadoop application/job, you have an Application Master that communicates with the ResourceManager about available resources on the cluster. The ResourceManager receives information about the available resources on each node from that node's NodeManager. The resources are handed out as Containers (memory and CPU). For more information see this.

Resource declaration on the cluster

Each NodeManager provides information about its available resources. The relevant settings are yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores in $HADOOP_CONF_DIR/yarn-site.xml. They declare the memory and CPUs that can be allocated to Containers.

Ask for resources

For your jobs you can configure which resources each map/reduce task needs. This can be done as follows (shown here for the map tasks):

conf.set("mapreduce.map.cpu.vcores", "4");
conf.set("mapreduce.map.memory.mb", "2048");

This will ask for 4 virtual cores and 2048MB of memory for each map task.

You can also configure the resources that are necessary for the Application Master the same way with the properties yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.resource.cpu-vcores.
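If you would rather not hard-code these values in the driver, the same properties can usually be passed on the command line as -D options, assuming your driver goes through ToolRunner/GenericOptionsParser (that is an assumption about your job, not something every driver does). The sketch below is plain Java with no Hadoop dependency. JobResourceArgs is a hypothetical helper, and the Application Master values are only example numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Hadoop class): renders resource properties
// as -Dkey=value arguments for a job launched from the command line.
final class JobResourceArgs {

    // Joins the properties in insertion order, separated by single spaces.
    static String toGenericOptions(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Per-map-task request, same values as in the answer above.
        props.put("mapreduce.map.cpu.vcores", "4");
        props.put("mapreduce.map.memory.mb", "2048");
        // Application Master request (example values only).
        props.put("yarn.app.mapreduce.am.resource.mb", "1024");
        props.put("yarn.app.mapreduce.am.resource.cpu-vcores", "1");

        System.out.println(toGenericOptions(props));
    }
}
```

The printed string goes between the main class name and the job's own arguments when you launch the job.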

Cluster-wide default values for those properties can be set in $HADOOP_CONF_DIR/mapred-site.xml; the built-in defaults that ship with Hadoop are listed in mapred-default.xml.

For more options and default values, I recommend taking a look at this and this.
