An Evaluation of Serverless Data Processing Frameworks

2020 
Serverless computing is a promising cloud execution model that significantly simplifies cloud users' operational concerns by offering features such as auto-scaling and a pay-as-you-go cost model. Consequently, serverless systems promise to be an excellent fit for ad-hoc data processing. Unsurprisingly, numerous serverless systems and frameworks for data processing have recently emerged from research and industry. However, systems researchers, decision-makers, and data analysts are unaware of how these serverless systems compare to each other. In this paper, we identify existing serverless frameworks for data processing. We present a qualitative assessment of different system architectures and an experiment-driven quantitative comparison of performance, cost, and usability using the TPC-H benchmark. Our results show that the three publicly available serverless data processing frameworks outperform a comparably sized Apache Spark cluster in terms of performance and cost for ad-hoc queries on cold data.