An Evaluation of Serverless Data Processing Frameworks
2020
Serverless computing is a promising cloud execution model that significantly simplifies cloud users' operational concerns by offering features such as auto-scaling and a pay-as-you-go cost model. Consequently, serverless systems promise to be an excellent fit for ad-hoc data processing. Unsurprisingly, numerous serverless systems and frameworks for data processing have recently emerged from research and industry. However, systems researchers, decision-makers, and data analysts are unaware of how these serverless systems compare to each other. In this paper, we identify existing serverless frameworks for data processing. We present a qualitative assessment of different system architectures and an experiment-driven quantitative comparison covering performance, cost, and usability using the TPC-H benchmark. Our results show that the three publicly available serverless data processing frameworks outperform a comparably sized Apache Spark cluster in terms of performance and cost for ad-hoc queries on cold data.