| repartition | coalesce |
| --- | --- |
| Can increase or decrease the number of data partitions. | Can only reduce the number of data partitions, unless a shuffle is requested. |
| Creates new data partitions and performs a full shuffle, so the data ends up roughly evenly distributed. | Merges already existing partitions to keep the amount of shuffled data low, so the resulting partitions can be uneven in size. |
| Internally calls coalesce with the shuffle parameter enabled, which makes it slower than a plain coalesce. | Generally faster than repartition because it avoids the full shuffle. However, if it leaves unequal-sized partitions, later stages may run somewhat slower. |
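
The trade-off above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark's implementation: partitions are modeled as lists, and the helper functions `repartition` and `coalesce` below are hypothetical stand-ins written for this example. The toy `repartition` deals every record round-robin (a full shuffle), while the toy `coalesce` only concatenates whole neighboring partitions and never splits one. Spark's real coalesce also takes data locality into account, which this sketch ignores.

```python
from typing import List

Partition = List[int]


def repartition(parts: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every record may move; records are dealt round-robin."""
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in parts:
        for record in part:
            out[i % n].append(record)
            i += 1
    return out


def coalesce(parts: List[Partition], n: int) -> List[Partition]:
    """Toy no-shuffle merge: whole partitions are concatenated, never split."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(parts):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in parts]
    out: List[Partition] = [[] for _ in range(n)]
    for idx, part in enumerate(parts):
        # Map neighboring input partitions to the same output partition.
        out[idx * n // len(parts)].extend(part)
    return out


# Four skewed input partitions holding 90, 4, 3 and 3 records.
skewed = [
    list(range(0, 90)),
    list(range(90, 94)),
    list(range(94, 97)),
    list(range(97, 100)),
]

repart_sizes = [len(p) for p in repartition(skewed, 2)]
coal_sizes = [len(p) for p in coalesce(skewed, 2)]

print(repart_sizes)                  # [50, 50]  evenly distributed
print(coal_sizes)                    # [94, 6]   skew is preserved
print(len(repartition(skewed, 8)))   # 8  repartition can increase the count
print(len(coalesce(skewed, 8)))      # 4  coalesce cannot
```

In Spark itself, the corresponding calls are `df.repartition(n)` and `df.coalesce(n)`, and `df.rdd.getNumPartitions()` reports the resulting partition count.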