What is a container in YARN?


It represents a set of resources (such as memory) on a single node in a given cluster.
A container is

  • supervised by the NodeManager
  • scheduled by the ResourceManager

One MapReduce (MR) task runs in one such container.


A single node can host multiple containers (or one very large container).

Every node in the system is considered to be composed of multiple containers of minimum size of memory (say 512MB or 1 GB). The ApplicationMaster can request any container as a multiple of the minimum memory size.
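
To make the "multiple of the minimum memory size" idea concrete, here is a minimal Java sketch. It assumes a request is rounded up to the next multiple of the minimum container size; the class and method names are hypothetical and are not part of the YARN API.

```java
class ContainerSizing {
    // Hypothetical helper (not a YARN API): round a requested amount of
    // memory in MB up to the next multiple of the minimum container size.
    static int roundUpToMultiple(int requestedMb, int minimumMb) {
        if (requestedMb <= 0 || minimumMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division, then scale back to MB.
        int multiples = (requestedMb + minimumMb - 1) / minimumMb;
        return multiples * minimumMb;
    }

    public static void main(String[] args) {
        int minimumMb = 512; // assumed minimum container size
        System.out.println(roundUpToMultiple(1500, minimumMb)); // prints 1536
        System.out.println(roundUpToMultiple(1024, minimumMb)); // prints 1024
    }
}
```

With a 512 MB minimum, a 1500 MB request would occupy three minimum-size units (1536 MB) under this assumption.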

Source, see section ResourceManager/Resource Model.


In Hadoop 2.x, a container is the place where a unit of work runs. For instance, each MapReduce task (not the entire job) runs in one container.

An application/job will run on one or more containers.

A set of system resources is allocated to each container; currently CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
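
As a rough illustration of how several containers share one node, the following Java sketch counts how many identical containers fit when both RAM and CPU cores are limited. It is a simplified model with hypothetical names, not YARN code; a real scheduler may weigh the two resources differently.

```java
class NodeCapacity {
    // Hypothetical model (not a YARN API): the number of identical containers
    // that fit on one node is limited by whichever resource runs out first.
    static int containersPerNode(int nodeMemoryMb, int nodeCores,
                                 int containerMemoryMb, int containerCores) {
        int byMemory = nodeMemoryMb / containerMemoryMb; // limit imposed by RAM
        int byCores = nodeCores / containerCores;        // limit imposed by CPU cores
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // Assumed node: 8192 MB of RAM and 4 cores available to containers.
        System.out.println(containersPerNode(8192, 4, 1024, 1)); // prints 4 (CPU-bound)
        System.out.println(containersPerNode(8192, 4, 4096, 1)); // prints 2 (memory-bound)
    }
}
```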

In Hadoop 1.x, a slot is allocated by the JobTracker to run each MapReduce task. The TaskTracker then spawns a separate JVM for each task (unless JVM reuse is enabled).