MAP REDUCE EXAMPLE

Let us take the word count example, where we will write a MapReduce job that counts the words in a file, i.e. how many times each particular word is repeated in the file.

Let's consider the text file below :


story.txt :

In a huge pond, there lived many fish.
They were arrogant and never listened to anyone.
In this pond, there also lived a kind-hearted crocodile.
He advised the fish, not to be arrogant and overconfident.
But the fish never listened to him.
One afternoon, the crocodile was resting
when two fishermen stopped there to drink water.

For simpler understanding, we have taken a text file of small size. Now, suppose this file is broken into chunks that are distributed across the Nodes A, B, C and D in the cluster.



Node A - Map1

In a huge pond, there lived many fish.
They were arrogant and never listened to anyone.

Node B - Map2

In this pond, there also lived a kind-hearted crocodile.
He advised the fish, not to be arrogant and overconfident.


Node C - Map3

But the fish never listened to him.
One afternoon, the crocodile was resting

Node D - Map4

when two fishermen stopped there to drink water.

In the above scenario the file is distributed across Nodes A, B, C and D, and in each Node a Map job runs, i.e. Map1, Map2, Map3 and Map4.

Since the above example is for word count, each Map is responsible for counting the words of its own Node. And as we have seen, the output of a Map is in key-value pairs, where the key is the actual word and the value is the count.

For Node A, the Map1 output in key-value pairs is going to be :


{In, 1}
{a, 1}
{huge, 1}
...
...

Similarly for Node B, the Map2 output is going to be :


{In, 1}
{this, 1}
{pond, 1}
{there, 1}
...
...

The same logic applies to Node C and Node D.
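
As a minimal sketch, here is what each Map could look like in Java, using the standard Hadoop MapReduce API (the class name WordCountMapper is our own illustrative choice, and for simplicity it does not strip punctuation from words) :

WordCountMapper.java :

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits a {word, 1} pair for every word in each line of its chunk.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The input key is the byte offset of the line; the value is the line itself.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE); // e.g. {In, 1}, {a, 1}, {huge, 1} ...
        }
    }
}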

So, all the Maps have produced their output in key-value pairs, and this output is sent to the reducer.



Now, the reducer takes the keys from all the Mapper outputs and combines their values :


{In, 2}
{a, 1}
{huge, 1}
{pond, 2}
{there, 3}
...
...
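
Again as a minimal sketch, a matching reducer in Java could look like this, using the standard Hadoop MapReduce API (the class name WordCountReducer is illustrative) :

WordCountReducer.java :

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives each word together with all the 1s emitted for it and adds them up.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get(); // add up the 1s from all the Maps
        }
        result.set(sum);
        context.write(key, result); // e.g. {In, 2}
    }
}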

To get a clear picture, let us match this reducer output with the mapper outputs.

For Node A, the Map1 output was :

{In, 1}

And for Node B, the Map2 output was also :

{In, 1}

Now, the reducer combines both the pairs under the same key and adds the values. So, you get :

{In, 2}

Similarly, all the other keys are combined and you get the desired word count in key-value pairs.
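
Finally, to wire the mapper and reducer together and submit the job, a driver class is needed. The sketch below assumes the same standard Hadoop API (the class name WordCountDriver and the input/output paths are illustrative) :

WordCountDriver.java :

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Configures the word count job and submits it to the cluster.
public class WordCountDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. the directory holding story.txt
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, such a job would typically be run with the hadoop command, for example :

hadoop jar wordcount.jar WordCountDriver /input /output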