What is a container in YARN?
It represents an allocation of resources (such as memory) on a single node in a given cluster.
A container is:
- supervised by the NodeManager
- scheduled by the ResourceManager
Each MR task runs in one such container.
There can be multiple containers on a single node (or a single very big one).
Every node in the system is considered to be composed of multiple containers of a minimum memory size (say 512 MB or 1 GB). The ApplicationMaster can request a container of any size that is a multiple of this minimum memory size.
Source: see the section ResourceManager/Resource Model.
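To make the "multiple of the minimum memory size" rule concrete, here is a minimal sketch of rounding a memory request up to the next multiple of the minimum allocation. The function name is hypothetical; its two limits stand in for the `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb` settings. As a simplification, this sketch rejects requests above the maximum instead of reproducing any particular scheduler's error handling.

```python
def normalize_memory_request(requested_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a container memory request up to a multiple of the minimum allocation.

    Hypothetical helper: min_alloc_mb / max_alloc_mb stand in for the
    yarn.scheduler.minimum-allocation-mb / maximum-allocation-mb settings.
    """
    if min_alloc_mb <= 0 or max_alloc_mb < min_alloc_mb:
        raise ValueError("invalid scheduler limits")
    if requested_mb > max_alloc_mb:
        # Simplification: reject oversized requests outright.
        raise ValueError(f"request of {requested_mb} MB exceeds maximum {max_alloc_mb} MB")
    # Ceiling division: the smallest multiple of min_alloc_mb >= requested_mb
    # (at least one multiple, so tiny requests still get the minimum).
    multiples = max(1, -(-requested_mb // min_alloc_mb))
    # Never hand out more than the maximum, even if it is not itself a multiple.
    return min(multiples * min_alloc_mb, max_alloc_mb)


print(normalize_memory_request(1500, 1024, 8192))  # 2048
print(normalize_memory_request(300, 512, 8192))    # 512
```

A 1500 MB request on a cluster with a 1 GB minimum therefore occupies a 2 GB container; the unused remainder is the cost of the coarse-grained resource model.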
In Hadoop 2.x, a container is the place where a unit of work runs. For instance, each MapReduce task (not the entire job) runs in one container.
An application/job runs in one or more containers.
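As a rough illustration of "one task per container", the following sketch counts the containers a single MapReduce job would use. It assumes one container for the MapReduce ApplicationMaster plus one per map or reduce task, and deliberately ignores uber mode, retried task attempts and speculative execution, all of which change the real number. The function name is hypothetical.

```python
def containers_for_mr_job(num_maps: int, num_reduces: int) -> int:
    """Containers used by one MapReduce job under simplifying assumptions.

    Assumes one container for the MapReduce ApplicationMaster plus one per
    map or reduce task. Ignores uber mode, retried attempts and speculative
    execution, each of which changes the count. The containers need not
    all exist at the same time.
    """
    if num_maps < 0 or num_reduces < 0:
        raise ValueError("task counts must be non-negative")
    application_master = 1
    return application_master + num_maps + num_reduces


print(containers_for_mr_job(num_maps=10, num_reduces=2))  # 13
```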
A set of system resources is allocated to each container; currently CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
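Since each container reserves both memory and CPU, the number of identical containers a node can host is limited by whichever resource runs out first. The sketch below assumes a node whose capacity would normally come from `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`; the `Resource` class and helper are hypothetical, and some scheduler configurations account for memory only, in which case the CPU term would not apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector: memory in MB and number of virtual cores."""
    memory_mb: int
    vcores: int


def max_containers_per_node(node: Resource, container: Resource) -> int:
    """How many identical containers fit on one node if both memory and CPU count."""
    if container.memory_mb <= 0 or container.vcores <= 0:
        raise ValueError("container must request positive memory and vcores")
    fit_by_memory = node.memory_mb // container.memory_mb
    fit_by_cpu = node.vcores // container.vcores
    # Whichever resource runs out first limits the number of containers.
    return min(fit_by_memory, fit_by_cpu)


node = Resource(memory_mb=65536, vcores=16)
print(max_containers_per_node(node, Resource(4096, 1)))  # 16 (memory and CPU both allow 16)
print(max_containers_per_node(node, Resource(2048, 1)))  # 16 (CPU-bound; memory alone would allow 32)
print(max_containers_per_node(node, Resource(8192, 1)))  # 8 (memory-bound)
```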
In Hadoop 1.x, the JobTracker allocates a slot to run each MapReduce task. The TaskTracker then spawns a separate JVM for each task (unless JVM reuse is enabled).