Towards dynamic SQL compilation in Apache Spark

2020 
Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on code generation to optimize the execution performance of SQL queries over a variety of data sources. Despite its already efficient runtime, Spark's code generation suffers from significant runtime overheads related to data de-serialization during query execution. This performance penalty can be substantial, especially when applications operate on human-readable data formats such as CSV or JSON.
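As a minimal sketch of where this overhead arises, the following Scala snippet runs a SQL query over a CSV source with a local Spark session; the file name people.csv and the schema are hypothetical. Catalyst's whole-stage code generation compiles the filter and aggregate, but each input row must still be parsed from CSV text into Spark's internal row format at runtime.

```scala
import org.apache.spark.sql.SparkSession

object CsvQueryExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a cluster deployment would use a different master URL.
    val spark = SparkSession.builder()
      .appName("csv-query-example")
      .master("local[*]")
      .getOrCreate()

    // Reading a text-based format: every row is de-serialized from CSV strings
    // into Spark's internal binary row representation during query execution.
    val people = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("people.csv") // hypothetical input file

    people.createOrReplaceTempView("people")

    // Whole-stage code generation emits Java code for the filter and aggregate,
    // but the per-row CSV parsing cost remains on the critical path.
    val result = spark.sql(
      "SELECT country, COUNT(*) AS n FROM people WHERE age > 30 GROUP BY country")

    result.explain() // the physical plan shows the WholeStageCodegen stages
    result.show()

    spark.stop()
  }
}
```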