Hadoop - Interview Questions
Why can't we perform "aggregation" (addition) in the mapper? Why do we need the "reducer" for this?
There are several reasons, so we will go through them one by one.
 
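We cannot perform “aggregation” (addition) in the mapper because sorting and grouping by key do not occur in the “mapper” function. Hadoop sorts and groups the intermediate key-value pairs only after the map function has emitted them, in the shuffle and sort step that feeds the reducer, and without that grouping the values for a key cannot be aggregated.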

“Aggregation” also needs the output of all the mapper functions for a given key. That output cannot be collected in the map phase, because each mapper processes only its own input split and may be running on a different machine, wherever its data block is stored.

Lastly, aggregating data in the mapper would require communication between all the mapper functions, which may be running on different machines. This would consume a lot of network bandwidth and could create a network bottleneck.
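
The points above can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: the names mapper, shuffle_and_sort and reducer, and the two sample splits, are made up for the example. It shows that each mapper, seeing only its own split, can produce at most a partial total, while the reducer gets every value for a key once the outputs are sorted and grouped.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(split):
    """Emit (word, 1) for every word in ONE input split (hypothetical helper)."""
    for line in split:
        for word in line.split():
            yield (word, 1)


def shuffle_and_sort(mapper_outputs):
    """Merge the pairs from every mapper, sort them by key, and group the values."""
    all_pairs = sorted(
        (pair for output in mapper_outputs for pair in output),
        key=itemgetter(0),
    )
    for key, group in groupby(all_pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Aggregate all values that were grouped under one key."""
    return key, sum(values)


# Two splits, standing in for data blocks stored on two different machines.
splits = [
    ["hadoop spark hadoop"],
    ["spark hadoop"],
]

# Each mapper runs independently on its own split.
mapper_outputs = [list(mapper(split)) for split in splits]

# Adding up inside each mapper gives only per-split (partial) totals.
partial_totals = []
for output in mapper_outputs:
    totals = defaultdict(int)
    for key, value in output:
        totals[key] += value
    partial_totals.append(dict(totals))

# After shuffle and sort, the reducer sees every value for a key.
final_totals = dict(
    reducer(key, values) for key, values in shuffle_and_sort(mapper_outputs)
)

print(partial_totals)
print(final_totals)
```

No single entry in partial_totals holds the full count for a word. The complete total appears only in final_totals, after the outputs of both mappers have been brought together by key.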