What is Whole-Stage CodeGen in Spark?

Whole-Stage CodeGen, also known as whole-stage Java code generation, is a physical query optimization phase in Spark SQL that fuses multiple physical operators into a single Java function.
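To make "fusing operators into a single function" concrete, here is a toy Python sketch of the idea. It is an analogy, not Spark's implementation. Each hypothetical operator contributes a fragment of source code. The fragments are spliced into one function body, and that body is compiled at runtime with `exec`. Spark does something analogous with Java source that it compiles with Janino. The names `FilterSpec`, `ProjectSpec` and `generate_pipeline` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple, Union


@dataclass
class FilterSpec:
    """Hypothetical operator: keep rows for which `predicate` (an expression over `x`) is true."""
    predicate: str


@dataclass
class ProjectSpec:
    """Hypothetical operator: replace `x` with the value of `expression`."""
    expression: str


OperatorSpec = Union[FilterSpec, ProjectSpec]


def generate_pipeline(
    operators: List[OperatorSpec],
) -> Tuple[Callable[[Iterable[int]], int], str]:
    """Generate and compile ONE function that runs every operator inside a single loop
    and sums the surviving values (a stand-in for an aggregate ending the stage)."""
    lines = [
        "def fused(rows):",
        "    total = 0",
        "    for x in rows:",
    ]
    indent = "        "
    for op in operators:
        if isinstance(op, FilterSpec):
            # A filter becomes an early `continue` instead of a separate operator call.
            lines.append(f"{indent}if not ({op.predicate}):")
            lines.append(f"{indent}    continue")
        else:
            # A projection becomes a plain local-variable assignment.
            lines.append(f"{indent}x = {op.expression}")
    lines.append(f"{indent}total += x")
    lines.append("    return total")
    source = "\n".join(lines)

    # Compile the generated source at runtime and pull the function out of its namespace.
    namespace: Dict[str, object] = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["fused"], source


fused, source = generate_pipeline([FilterSpec("x % 2 == 0"), ProjectSpec("x * 3")])
print(source)            # the single generated function
print(fused(range(10)))  # 60
```

The filter and projection no longer exist as separate objects at run time. They are just lines inside one loop, which is the property whole-stage code generation aims for.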

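Whole-stage Java code generation improves execution performance by compiling a tree of physical operators into one optimized function. This removes the per-row virtual function calls that the traditional iterator (Volcano) model makes at every operator boundary. It also lets the JVM keep intermediate values in CPU registers instead of writing them out to memory.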
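The difference between the iterator model and a fused function can be illustrated in plain Python. This toy model is an analogy, not Spark code. In the iterator version, each operator is an object whose `next()` pulls one row from its child, so every row pays a method call at every operator boundary. The fused version does the same work in one loop and keeps intermediate values in local variables. The class names and the `calls` counter are hypothetical instrumentation added for illustration.

```python
from typing import Iterable, Iterator, Optional


class Operator:
    """Volcano-style operator: returns one row per next() call, or None when exhausted."""
    calls = 0  # class-wide counter of next() invocations (illustrative instrumentation)

    def next(self) -> Optional[int]:
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows: Iterable[int]) -> None:
        self.it: Iterator[int] = iter(rows)

    def next(self) -> Optional[int]:
        Operator.calls += 1
        return next(self.it, None)


class Filter(Operator):
    def __init__(self, child: Operator) -> None:
        self.child = child

    def next(self) -> Optional[int]:
        Operator.calls += 1
        # Keep pulling from the child until an even value (or the end) is reached.
        while (row := self.child.next()) is not None:
            if row % 2 == 0:
                return row
        return None


class Project(Operator):
    def __init__(self, child: Operator) -> None:
        self.child = child

    def next(self) -> Optional[int]:
        Operator.calls += 1
        row = self.child.next()
        return None if row is None else row * 3


def volcano_sum(rows: Iterable[int]) -> int:
    """The aggregate at the root pulls rows one at a time through the operator tree."""
    root = Project(Filter(Scan(rows)))
    total = 0
    while (row := root.next()) is not None:
        total += row
    return total


def fused_sum(rows: Iterable[int]) -> int:
    """The same query as one loop: no per-row operator calls, intermediates stay in locals."""
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 3
    return total


Operator.calls = 0
rows = list(range(10))
print(volcano_sum(rows), Operator.calls)  # 60 23
print(fused_sum(rows))                    # 60
```

Both versions compute the same result. However, the iterator version makes 23 `next()` calls for only 10 input rows, while the fused loop makes none. That per-row overhead is what whole-stage code generation removes.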

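Whole-Stage CodeGen is enabled by default in Spark 2.0 and later. It is controlled by the configuration property spark.sql.codegen.wholeStage (default true). Setting it to false disables the fusion, so each physical operator is executed separately. In the output of explain(), operators that were fused into a generated function are marked with an asterisk (*).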

Similar code generation techniques, which compile a whole query pipeline into tight generated code, are also used by some modern massively parallel processing (MPP) databases to improve execution performance.
