
YARN

YARN was added in Hadoop 2.0. It stands for Yet Another Resource Negotiator. In Hadoop 1 there was only MapReduce, which handled both data processing and cluster resource management. In Hadoop 2.0 this work was split between MapReduce and YARN: MapReduce remains responsible for processing the data, while YARN is responsible for allocating cluster resources to the jobs and scheduling them.

YARN can be described as a resource allocator and resource manager. In order to run jobs on the various nodes of a cluster, YARN keeps track of which nodes are free and how much memory and CPU each one has available, and allocates resources to the jobs accordingly.
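The idea of checking nodes for free memory and CPU before placing work can be illustrated with a small model. This is only a sketch under simplifying assumptions: the `Node` class and `pick_node` function are hypothetical names invented for illustration, and the first-fit rule used here is not how YARN's real schedulers decide.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of one machine in the cluster.
    name: str
    free_memory_mb: int
    free_vcores: int


def pick_node(nodes, memory_mb, vcores):
    """Return the first node that can fit the request, or None.

    First-fit is an assumption made for this sketch only.
    """
    for node in nodes:
        if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
            # Reserve the resources so later requests see the reduced capacity.
            node.free_memory_mb -= memory_mb
            node.free_vcores -= vcores
            return node
    return None


nodes = [Node("node-1", 1024, 1), Node("node-2", 8192, 4)]
chosen = pick_node(nodes, memory_mb=2048, vcores=2)
print(chosen.name, chosen.free_memory_mb, chosen.free_vcores)  # node-2 6144 2
```

Here node-1 is skipped because it has too little memory and CPU, so the request lands on node-2.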

What is the core of YARN?

YARN is made of two core components :

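Resource Manager : The Resource Manager is the master daemon. It runs on a master node of the cluster (on small clusters this is often the same machine as the Name Node) and is responsible for tracking and allocating the CPU and memory of all the worker machines.

Node Manager : The Node Manager is the worker daemon and runs on every worker node, typically alongside the Data Node. It manages only the resources and containers of the node it runs on, and reports their status to the Resource Manager.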
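The division of labour between the two components can be sketched as follows. This is a toy model, not the YARN API: the class and method names (`heartbeat`, `report_to`, `total_free_memory_mb`) are assumptions chosen for illustration. Each Node Manager knows only its own node, while the Resource Manager builds a cluster-wide view from the reports it receives.

```python
class ResourceManager:
    """Toy master: keeps the latest report from every node."""

    def __init__(self):
        self.cluster_view = {}  # node name -> (free memory in MB, free vcores)

    def heartbeat(self, node_name, free_memory_mb, free_vcores):
        # Overwrite the previous report for this node with the newest one.
        self.cluster_view[node_name] = (free_memory_mb, free_vcores)

    def total_free_memory_mb(self):
        return sum(mem for mem, _ in self.cluster_view.values())


class NodeManager:
    """Toy worker: knows only about the node it runs on."""

    def __init__(self, name, free_memory_mb, free_vcores):
        self.name = name
        self.free_memory_mb = free_memory_mb
        self.free_vcores = free_vcores

    def report_to(self, rm):
        rm.heartbeat(self.name, self.free_memory_mb, self.free_vcores)


rm = ResourceManager()
for nm in (NodeManager("node-1", 4096, 2), NodeManager("node-2", 8192, 4)):
    nm.report_to(rm)
print(rm.total_free_memory_mb())  # 12288
```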

How does YARN manage jobs?

When a job is submitted, it goes to the Resource Manager. The Resource Manager uses the status reports it receives from the Node Managers to see which nodes have free capacity for executing the job, and then assigns the work to those Node Managers. Each Node Manager runs its share of the job in a container.

Container : Since many jobs can run on one node at the same time, the node's resources, mainly memory and CPU, are divided into logical chunks so that the jobs can run independently of each other. Each such logical chunk is a container.
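As a rough illustration of how a node's capacity limits the number of containers, the following sketch divides a node's memory and CPU by the size of one container. The function name `containers_that_fit` and the example sizes are hypothetical, and the sketch assumes every container has the same size, which a real cluster does not require.

```python
def containers_that_fit(node_memory_mb, node_vcores,
                        container_memory_mb, container_vcores):
    """Number of equally sized containers one node can host.

    The scarcer of the two resources (memory or CPU) sets the limit.
    """
    by_memory = node_memory_mb // container_memory_mb
    by_cpu = node_vcores // container_vcores
    return min(by_memory, by_cpu)


# A node with 8 GB of memory and 4 virtual cores:
print(containers_that_fit(8192, 4, 2048, 1))  # 4
print(containers_that_fit(8192, 4, 1024, 2))  # 2
```

In the second call memory alone would allow eight containers, but CPU allows only two, so two is the answer.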

After a container is created on a Node Manager, the Resource Manager starts an Application Master in that container. There is one Application Master per submitted application. It requests further containers from the Resource Manager and coordinates the processing of the data in them.
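The flow described above (submit a job, start an Application Master in a first container, then obtain containers for the actual work) can be sketched as a small simulation. Everything here is an assumption made for illustration: the class names, the `submit` and `allocate` methods, the fixed 512 MB for the Application Master, and the first-fit placement are not taken from YARN itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_memory_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 0

    def allocate(self, memory_mb, purpose):
        """Grant one container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                container = (f"container_{self.next_id}", node.name, purpose)
                self.next_id += 1
                node.containers.append(container)
                return container
        return None

    def submit(self, app_name, num_tasks, task_memory_mb):
        # Step 1: the first container hosts the Application Master
        # (512 MB is an arbitrary value chosen for this sketch).
        am = self.allocate(512, f"{app_name}:ApplicationMaster")
        if am is None:
            return []
        # Step 2: on behalf of the Application Master, request one
        # container per task; requests that do not fit are skipped.
        granted = [am]
        for i in range(num_tasks):
            container = self.allocate(task_memory_mb, f"{app_name}:task-{i}")
            if container is not None:
                granted.append(container)
        return granted


rm = ResourceManager([Node("node-1", 2048), Node("node-2", 4096)])
for container in rm.submit("wordcount", num_tasks=3, task_memory_mb=1024):
    print(container)
```

In this run the Application Master and the first task fit on node-1, and the remaining two tasks spill over to node-2 once node-1 has too little memory left.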
