Whole-Stage CodeGen, also known as Whole-Stage Java Code Generation, is a physical query optimization phase in Spark SQL that fuses multiple physical operators into a single Java function.
Whole-Stage Java code generation improves execution performance by collapsing a query tree into a single optimized function that eliminates virtual function calls between operators and keeps intermediate data in CPU registers.
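To make the idea concrete, here is a minimal sketch in plain Java (no Spark dependency) that contrasts an iterator-per-operator pipeline with a single fused loop for a query equivalent to "sum of x * 2 where x > 10". The class and method names (WholeStageSketch, sumVolcano, sumFused) are hypothetical and chosen only for illustration. This is not the code Spark generates, only a hand-written analogy of the same transformation.

```java
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

public class WholeStageSketch {

    // Iterator-style operator: every row costs one virtual call per operator.
    interface Operator {
        Long next(); // returns null once the input is exhausted
    }

    static final class Scan implements Operator {
        private final long[] rows;
        private int pos = 0;

        Scan(long[] rows) { this.rows = rows; }

        @Override
        public Long next() {
            if (pos < rows.length) {
                return rows[pos++]; // boxes the value into a Long
            }
            return null;
        }
    }

    static final class Filter implements Operator {
        private final Operator child;
        private final LongPredicate predicate;

        Filter(Operator child, LongPredicate predicate) {
            this.child = child;
            this.predicate = predicate;
        }

        @Override
        public Long next() {
            Long row;
            // Pull rows from the child until one passes the predicate.
            while ((row = child.next()) != null) {
                if (predicate.test(row)) {
                    return row;
                }
            }
            return null;
        }
    }

    static final class Project implements Operator {
        private final Operator child;
        private final LongUnaryOperator fn;

        Project(Operator child, LongUnaryOperator fn) {
            this.child = child;
            this.fn = fn;
        }

        @Override
        public Long next() {
            Long row = child.next();
            if (row == null) {
                return null;
            }
            return fn.applyAsLong(row);
        }
    }

    // Operator-at-a-time execution: Scan -> Filter -> Project -> sum.
    public static long sumVolcano(long[] rows) {
        Operator plan = new Project(new Filter(new Scan(rows), x -> x > 10), x -> x * 2);
        long sum = 0;
        Long row;
        while ((row = plan.next()) != null) {
            sum += row;
        }
        return sum;
    }

    // The same query fused into one function: a single loop, no operator
    // objects, and intermediate values held in local primitives.
    public static long sumFused(long[] rows) {
        long sum = 0;
        for (long x : rows) {
            if (x > 10) {
                sum += x * 2;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] rows = {5, 11, 20, 3, 15};
        System.out.println(sumVolcano(rows)); // 92
        System.out.println(sumFused(rows));   // 92
    }
}
```

Both methods compute the same result. The fused version is the shape of code that whole-stage generation aims for: the operator boundaries disappear and the JIT compiler can treat the whole stage as one tight loop.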
Whole-Stage CodeGen is enabled by default in Spark 2.x and can be turned on or off through the property spark.sql.codegen.wholeStage.
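The property can be toggled at runtime, as in the following sketch using Spark's Java API. It assumes Spark SQL 2.x or later is on the classpath and that the job runs locally. The class name, application name, and sample query are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class WholeStageToggle {

    // Builds a small query and prints its physical plan under the current setting.
    private static void explainQuery(SparkSession spark) {
        Dataset<Row> query = spark.range(1000)
                .filter("id > 100")
                .selectExpr("sum(id)");
        query.explain();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("whole-stage-toggle") // hypothetical application name
                .master("local[*]")            // assumption: local execution
                .getOrCreate();

        // Default behaviour in Spark 2.x: whole-stage code generation is on.
        spark.conf().set("spark.sql.codegen.wholeStage", "true");
        explainQuery(spark);

        // Turn it off to compare the physical plans.
        spark.conf().set("spark.sql.codegen.wholeStage", "false");
        explainQuery(spark);

        spark.stop();
    }
}
```

In the printed physical plan, operators that take part in whole-stage code generation are marked with a leading asterisk. With the property set to false, those markers should not appear.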
Whole-Stage CodeGen is also used by some modern massively parallel processing (MPP) databases to improve execution performance.