Spark - Interview Questions
How can you minimize data transfers when working with Spark?
Minimizing data transfers and avoiding shuffles helps Spark programs run quickly and reliably. Data transfers can be minimized in Apache Spark in the following ways:
 
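Using broadcast variables: A broadcast variable makes a join between a small RDD and a large RDD more efficient, because a read-only copy of the small dataset is sent to each executor once instead of the large dataset being shuffled.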

Using accumulators: Accumulators let tasks update a shared variable in parallel while the job executes.

The most common way is to avoid, where possible, ByKey operations, repartition, and any other operations that trigger shuffles.
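
The effect of these techniques can be sketched without a cluster. The following is a minimal plain-Python model, not Spark code: partitions are ordinary lists, and the data and the function names (broadcast_join, shuffle_then_sum, combine_then_sum) are hypothetical, chosen only for illustration. In Spark itself the roughly corresponding pieces would be SparkContext.broadcast for the join, and groupByKey versus reduceByKey for the two aggregation styles.

```python
from collections import defaultdict

# Hypothetical data: a "large" dataset split into two partitions,
# plus a small lookup table that fits in memory on every worker.
orders_partitions = [
    [("u1", 10), ("u2", 5), ("u1", 7), ("u1", 2)],
    [("u2", 3), ("u3", 8), ("u2", 1), ("u3", 4)],
]
countries = {"u1": "DE", "u2": "FR", "u3": "US"}


def broadcast_join(partitions, lookup):
    """Join each partition against a local copy of the small table.

    Models what a broadcast variable enables: records of the large
    dataset stay in their partitions; only the small table is copied.
    """
    return [
        [(user, amount, lookup[user]) for user, amount in part if user in lookup]
        for part in partitions
    ]


def shuffle_then_sum(partitions):
    """Move every record first, then sum per key (groupByKey-like)."""
    moved = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for user, amount in moved:
        totals[user] += amount
    return dict(totals), len(moved)


def combine_then_sum(partitions):
    """Sum per key inside each partition, then merge (reduceByKey-like)."""
    partials = []
    for part in partitions:
        local = defaultdict(int)
        for user, amount in part:
            local[user] += amount
        # Only one record per key per partition has to be moved.
        partials.extend(local.items())
    totals = defaultdict(int)
    for user, amount in partials:
        totals[user] += amount
    return dict(totals), len(partials)


joined = broadcast_join(orders_partitions, countries)
naive_totals, naive_moved = shuffle_then_sum(orders_partitions)
combined_totals, combined_moved = combine_then_sum(orders_partitions)

print(joined[0])
print(naive_totals == combined_totals, naive_moved, combined_moved)
```

In this toy run both aggregation styles produce the same totals, but the pre-combined version moves 4 records instead of 8. The same reasoning is why, when a ByKey operation cannot be avoided, one that combines values within each partition before the shuffle generally transfers less data.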