SplitServe: Efficiently Splitting Apache Spark Jobs Across FaaS and IaaS

2020 
Due to their lower startup latencies and finer-grain pricing than virtual machines (VMs), Amazon Lambdas and other cloud functions (CFs) have been identified as ideal candidates for handling unexpected spikes in simple, stateless workloads. However, it is not immediately clear if CFs would be similarly effective in autoscaling complex workloads involving significant state transfer across distributed application components. We have found that, through careful design, currently available CFs can indeed be useful even for complex workloads. To demonstrate this, we design and implement SplitServe, an enhancement of Apache Spark. If not enough executors on existing VMs are available for a newly arriving latency-sensitive job, SplitServe is able to use CFs to quickly bridge this shortfall in VMs, so avoiding the startup latencies of newly requested VMs. If desirable in terms of performance or cost, when newly requested VMs, or executors on existing VMs, do become available, SplitServe is able to move ongoing work from CFs to them. Our experimental evaluation of SplitServe using four different workloads (either on a mixture of VM-based executors and CFs or just CFs) shows that it improves execution time by up to (a) 55% for workloads with small to modest amount of shuffling, and (b) 31% in workloads with large amounts of shuffling, when compared to only VM-based autoscaling.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    1
    Citations
    NaN
    KQI
    []