Spark - Interview Questions
What is the difference between repartition and coalesce?
Repartition | Coalesce
repartition can increase or decrease the number of data partitions. | By default, coalesce can only reduce the number of data partitions.
Repartition creates new data partitions and performs a full shuffle, distributing the data evenly across them. | Coalesce merges already existing partitions, which minimizes the amount of data shuffled but can leave the resulting partitions unevenly sized.
Repartition internally calls coalesce with the shuffle parameter set to true, which makes it slower than coalesce. | Coalesce is usually faster than repartition. However, if it leaves unequal-sized data partitions, later processing on them might be slightly slower.
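The difference in data movement can be illustrated with a small pure-Python model. This is only a sketch of the bookkeeping, not Spark's implementation: the functions `repartition` and `coalesce` below are hypothetical stand-ins that operate on plain lists, and the assumption that coalesce merges neighbouring partitions is a simplification (Spark decides the grouping itself).

```python
from typing import List

Partition = List[int]


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every row may move, dealt round-robin into n new partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rows = [row for part in partitions for row in part]
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy no-shuffle merge: whole existing partitions are combined; the count never grows."""
    if n < 1:
        raise ValueError("n must be at least 1")
    n = min(n, len(partitions))  # cannot increase the number of partitions
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumption: neighbouring partitions are merged together.
        new_parts[i * n // len(partitions)].extend(part)
    return new_parts


# Four unevenly sized partitions holding ten rows in total.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

print([len(p) for p in repartition(parts, 2)])  # [5, 5]  evenly sized
print([len(p) for p in coalesce(parts, 2)])     # [7, 3]  unevenly sized
print(len(coalesce(parts, 8)))                  # 4       count is not increased
print(len(repartition(parts, 8)))               # 8       count can be increased
```

In a real PySpark job the corresponding calls are `df.repartition(n)` and `df.coalesce(n)`, and `df.rdd.getNumPartitions()` reports the resulting partition count.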