Example of MapReduce
To better understand how MapReduce works, let us take an example where we have a text file called example.txt whose contents are:
Deer, Bear, River, Car, Car, River, Deer, Car, Bear
Now, we can perform a word count on example.txt using MapReduce. That is, we will find the unique words and the number of occurrences of each one.
- Divide the input into three splits, as shown in the diagram. This distributes the work among all the map nodes.
- Tokenize the words in each of the mappers and give a hardcoded value (1) to each token, that is, to each word.
- A list of key-value pairs is created, where the key is the individual word and the value is 1. So, for the split (Deer, Bear, River) we have: Deer, 1; Bear, 1; River, 1.
- Sorting and shuffling then happen, so that all the tuples with the same key are sent to the same reducer.
- After the sorting and shuffling phase, each reducer has a unique key and a list of values corresponding to that key. For example: Bear, [1,1]; Car, [1,1,1]...
- Each reducer counts the values present in its list and gives the final output. For example: Bear, 2.
- All the output key-value pairs are collected and written to the output file.
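
The steps above can be sketched in plain Python. This is only a single-process simulation for illustration, not Hadoop code: the function names (split_input, map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical, and the split into three equal chunks of three words is an assumption chosen to match the (Deer, Bear, River) split in the example.

```python
from collections import defaultdict

# Contents of example.txt, as given in the text above.
TEXT = "Deer, Bear, River, Car, Car, River, Deer, Car, Bear"


def split_input(text, num_splits=3):
    """Divide the input words into roughly equal splits (assumed: by word count)."""
    words = [w.strip() for w in text.split(",")]
    size = -(-len(words) // num_splits)  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every token in the split."""
    return [(word, 1) for word in split]


def shuffle_and_sort(mapped):
    """Group the values of all mappers by key, then order the keys."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: add up the list of 1s for a single key."""
    return key, sum(values)


def word_count(text, num_splits=3):
    """Run split, map, shuffle/sort and reduce in sequence."""
    splits = split_input(text, num_splits)
    mapped = [map_phase(s) for s in splits]
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Stands in for writing the collected pairs to the output file.
    for word, count in word_count(TEXT).items():
        print(word, count)
```

In a real cluster the mappers and reducers run in parallel on different nodes and the framework performs the shuffle. Here each phase is an ordinary function call, so the intermediate data (for example, Car mapped to [1, 1, 1] after shuffle_and_sort) can be inspected directly.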
