Learnerslesson
MAP REDUCE

The motto of Hadoop is to break data into parts and store them on different machines, so that the hard disk of a single system doesn't have to take the entire load. The same applies to the processing of data: each system that holds a piece of the data is responsible for processing it.


Example :

Say there is a file of 10 GB and there are five nodes A, B, C, D and E in a cluster. The 10 GB file has to be broken into chunks and distributed equally among the five nodes. Node A will then be responsible for processing the data stored on Node A. Similarly, Node B will be responsible for processing Node B's data, and so on.

In other words, all the nodes work together to store and process the distributed data.
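The arithmetic behind the split above can be sketched in a few lines. This assumes the HDFS default block size of 128 MB and perfectly even placement; real HDFS placement also involves replication, so the figures are illustrative only:

```java
// Rough sketch: how the 10 GB file from the example might be split
// into blocks and spread across the five nodes A..E.
// Assumption: the HDFS default block size of 128 MB, no replication.
public class BlockDistribution {
    static final long FILE_MB = 10 * 1024; // the 10 GB file, in MB
    static final long BLOCK_MB = 128;      // assumed HDFS default block size
    static final int NODES = 5;            // Nodes A, B, C, D and E

    static long blocks = FILE_MB / BLOCK_MB;    // 10240 / 128 = 80 blocks
    static long blocksPerNode = blocks / NODES; // 80 / 5 = 16 blocks per node

    public static void main(String[] args) {
        System.out.println(blocks + " blocks, " + blocksPerNode + " per node");
    }
}
```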

Here, MapReduce is used for the processing of the data.

MapReduce processing happens in two phases: the Map and the Reduce.


Map

Map tasks run on multiple nodes, and each one processes the chunk of data available on its particular node.


Example :

Say a chunk/block of data resides on Node A. You write your business logic in the Map to process that chunk (on Node A). The Map processes the chunk and keeps the result ready. The same happens on every other node: the Map on each node completes its task and keeps its result ready.
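As a concrete illustration, here is a plain-Java sketch of the "business logic" a Map might run for a word count job: it reads the chunk of text on its node and emits a (word, 1) pair for every word it sees. This is a simulation of the idea, not Hadoop's actual Mapper API, and the sample chunk is made up:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Sketch of one Map task for a word count job: turn this node's
// chunk of text into a list of (word, 1) key-value pairs.
public class WordCountMap {
    static List<Entry<String, Integer>> map(String chunk) {
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1)); // emit (word, 1)
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A made-up chunk of data that happens to reside on Node A
        System.out.println(map("big data big cluster"));
        // prints the pairs: [big=1, data=1, big=1, cluster=1]
    }
}
```

Each node runs this same logic on its own chunk, independently of the others.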


Reduce

The Reduce job comes into the picture after all the Maps are done with their work. The Reduce job runs on a single node only. Its job is to collect all the Map outputs, combine them and produce the final result.
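Continuing the word count illustration, the Reduce step can be sketched like this: it takes the (word, 1) pairs produced by every Map, groups them by word and sums the counts. Again this is a plain-Java simulation, not Hadoop's Reducer API, and the input pairs are made up:

```java
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

// Sketch of the single Reduce task: merge the (word, 1) pairs
// collected from all the Maps into one final count per word.
public class WordCountReduce {
    static Map<String, Integer> reduce(List<Entry<String, Integer>> mapOutputs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Entry<String, Integer> pair : mapOutputs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // (word, 1) pairs as they might arrive from two Map tasks
        List<Entry<String, Integer>> fromMaps = List.of(
            Map.entry("big", 1), Map.entry("data", 1),   // from Node A's Map
            Map.entry("big", 1), Map.entry("cluster", 1) // from Node B's Map
        );
        System.out.println(reduce(fromMaps)); // {big=2, cluster=1, data=1}
    }
}
```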


How does MapReduce work?

Multiple Maps run, one on each node. Each Map collects the data on its node and converts it into key-value pairs. Then the Reducer comes into the picture: it collects the output from all the Maps and joins them to produce the desired output.
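The whole flow can be tied together in one small sketch: each "node" maps its own chunk into key-value pairs, then a single Reduce merges every Map's output into the final result. This is a self-contained simulation with made-up chunks, not real Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// End-to-end sketch of the MapReduce flow described above,
// using word count as the job.
public class MiniMapReduce {
    // Map phase: turn one node's chunk into (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.split("\\s+")) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Reduce phase: combine every Map's output into final counts
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> all) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : all) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    static Map<String, Integer> run(List<String> chunks) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String chunk : chunks) {
            mapped.addAll(map(chunk)); // one Map per node, each on its own chunk
        }
        return reduce(mapped);         // a single Reduce joins all Map outputs
    }

    public static void main(String[] args) {
        // Two made-up chunks, as if they lived on Nodes A and B
        System.out.println(run(List.of("big data", "big cluster data")));
        // prints the final counts: {big=2, cluster=1, data=2}
    }
}
```

In a real cluster the Map calls run in parallel on different machines and the framework ships their outputs to the Reducer; the loop here just stands in for that distribution.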