Coalesce vs. spark.sql.shuffle.partitions
Coalesce vs repartition vs spark.sql.shuffle.partitions
This is not a frequently asked question, but there is a chance you will get it in an interview, so here is my blog post explaining the differences.
There is another post on this blog where I discussed the differences between coalesce and repartition.
You can go through that post to learn how the two differ. But there is also a difference between (coalesce & repartition) and spark.sql.shuffle.partitions.
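The coalesce operation reduces the number of partitions of a DataFrame to the value supplied to it, without a full shuffle. It cannot increase the partition count. When we save the data, each non-empty partition is written as one file, so the number of output files will normally be equal to that value as well.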
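Whereas spark.sql.shuffle.partitions sets the number of partitions Spark SQL uses when it shuffles data for joins and aggregations (the default is 200). So the number of tasks launched in the stage that follows a shuffle will be equal to the number set for this property, unless adaptive query execution merges small shuffle partitions at runtime.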