Para: Harvesting CPU time fragments in Big Data Analytics

2021 
Modern data analytics frameworks typically run tasks on statically reserved resources (e.g., CPU and memory). These reservations tend to be over-provisioned to guarantee Quality of Service (QoS), which leaves a large amount of resource time fragments. As a result, data analytics clusters are severely under-utilized. Workload co-location on shared resources has been studied extensively, but existing approaches are unaware of the sizes of resource time fragments, which makes it hard for them to improve resource utilization and guarantee QoS at the same time. In this paper, we propose Para, an event-driven scheduling mechanism that harvests CPU time fragments in co-located big data analytics workloads. Para introduces three techniques: 1) identifying the Idle CPU Time Window (ICTW) of each CPU core by capturing task-switch events; 2) a runtime communication mechanism between each task execution of a workload and the underlying resource management system; and 3) a pull-based scheduler that schedules a workload to run in the ICTW of another workload. We implement Para on Apache Mesos and Spark. Experimental results show that Para improves CPU utilization by 44% and 30% on average relative to the original Mesos and to an enhanced Mesos under Spark's dynamic mode (MSDM), respectively. Moreover, Para increases the average task throughput of Mesos and MSDM by 4.8x and 1.7x, respectively, while guaranteeing the execution time of the primary applications.
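The abstract names two mechanisms without detailing them: deriving per-core idle windows from task-switch events, and letting an idle core pull a co-located task into such a window. The Python sketch below illustrates only that general idea under stated assumptions; it is not Para's implementation. The names `CoreICTWTracker`, `SecondaryTask` and `pull_task`, the use of the mean of past windows as the window-length predictor, and the assumption that secondary-task runtimes are known in advance are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SecondaryTask:
    # Hypothetical task from a co-located (secondary) workload.
    name: str
    est_runtime: float  # assumed known in advance, in seconds


class CoreICTWTracker:
    """Hypothetical per-core tracker of Idle CPU Time Windows (ICTWs)."""

    def __init__(self, core_id: int) -> None:
        self.core_id = core_id
        self.idle_since: Optional[float] = None
        self.windows: List[Tuple[float, float]] = []

    def on_task_switch(self, t: float, busy: bool) -> None:
        # busy=False: the primary task switched out, so an idle window opens.
        # busy=True: a primary task switched in, so the open window closes.
        if not busy and self.idle_since is None:
            self.idle_since = t
        elif busy and self.idle_since is not None:
            self.windows.append((self.idle_since, t))
            self.idle_since = None

    def predicted_window(self) -> float:
        # Assumption: predict the next window as the mean of past windows.
        if not self.windows:
            return 0.0
        return sum(end - start for start, end in self.windows) / len(self.windows)


def pull_task(queue: List[SecondaryTask], window: float) -> Optional[SecondaryTask]:
    # Pull-based: the idle core asks for the first queued task that fits.
    for i, task in enumerate(queue):
        if task.est_runtime <= window:
            return queue.pop(i)
    return None


# Usage: two idle windows of 0.5 s and 0.75 s are observed on core 0.
tracker = CoreICTWTracker(core_id=0)
for t, busy in [(0.0, True), (2.0, False), (2.5, True), (4.0, False), (4.75, True)]:
    tracker.on_task_switch(t, busy)

queue = [SecondaryTask("a", 1.0), SecondaryTask("b", 0.5), SecondaryTask("c", 0.25)]
picked = pull_task(queue, tracker.predicted_window())  # predicted window is 0.625 s
```

Letting the idle core pull work, rather than having a central scheduler push it, keeps the decision next to the component that observes the window; the first-fit selection here is simply the most basic placeholder policy.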