Repartition vs spark.sql.shuffle.partitions

Similar questions:

Coalesce vs. spark.sql.shuffle.partitions

Coalesce vs repartition vs spark.sql.shuffle.partitions

 

This is not a frequently asked question, but there is a chance it will come up in your interviews. So here is my post explaining the differences.

 

There is another post on this blog where I discussed the differences between coalesce and repartition.

You can go through that post to learn how those two differ. But there is also a difference between coalesce/repartition on one side and spark.sql.shuffle.partitions on the other.

 

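The coalesce operation makes sure that the number of partitions, and hence the number of output files written when we save the data, matches the value supplied to it. Keep in mind that coalesce can only reduce the number of partitions; if you ask for more partitions than the data currently has, the count stays as it is, and you need repartition to increase it.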

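spark.sql.shuffle.partitions, on the other hand, is a configuration property rather than an operation. It sets the number of partitions, and therefore the number of tasks, that Spark uses whenever it shuffles data for a join or an aggregation in DataFrame or SQL code. Its default value is 200.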

 
