Niijima: sound and automated computation consolidation for efficient multilingual data-parallel pipelines
2019
Multilingual data-parallel pipelines, such as Microsoft's Scope and Apache Spark, are widely used in real-world analytical tasks. While the involvement of multiple languages (often including both managed and native languages) provides much convenience in data manipulation and transformation, it comes at a performance cost --- managed languages need a managed runtime, incurring much overhead. In addition, each switch from a managed to a native runtime (and vice versa) requires marshalling or unmarshalling of an ocean of data objects, taking a large fraction of the execution time. This paper presents Niijima, an optimizing compiler for Microsoft's Scope/Cosmos, which can consolidate C#-based user-defined operators (UDOs) across SQL statements, thereby reducing the number of dataflow vertices that require the managed runtime, and thus the amount of C# computations and the data marshalling cost. We demonstrate that Niijima has reduced job latency by an average of 24% and up to 3.3x, on a series of production jobs.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
45
References
1
Citations
NaN
KQI