Google News
logo
Hadoop - Interview Questions
What is distributed cache? What are its benefits?
Distributed cache in Hadoop is a service by MapReduce framework to cache files when needed.
 
Once a file is cached for a specific job, Hadoop will make it available on each DataNode both in the system and in memory, where map and reduce tasks are executed. Later, you can easily access and read the cache file and populate any collection (like an array, hashmap) in your code.
 
Benefits of using distributed cache are as follows :
 
* It distributes simple, read-only text/data files and/or complex types such as jars, archives, and others. These archives are then un-archived at the slave node.

* Distributed cache tracks the modification timestamps of cache files, which notify that the files should not be modified until a job is executed.
Advertisement