A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows

2021 
Task scheduling is a key component for the efficient execution of data-intensive applications in distributed environments, in which many machines must be coordinated to reduce execution time and bandwidth consumption. This paper presents ADAGE, a data-aware scheduler designed to efficiently execute data-intensive workflows on large-scale computers. The proposed scheduler is based on three key features: $i$) critical path analysis, for discovering the critical tasks of a workflow and reducing data transfers between nodes; $ii$) work giving, a new dynamic planning strategy for migrating tasks from overloaded to unloaded nodes; and $iii$) task replication, which executes task replicas on different nodes to improve both execution time and fault tolerance. Experiments performed on a distributed computing environment composed of up to 1,024 processing nodes show that ADAGE achieves better performance than existing scheduling systems, with an average reduction in execution time of up to 66%.
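To make the first feature concrete, the following is a minimal sketch of generic critical path analysis on a workflow DAG: it finds the chain of dependent tasks with the largest total duration. It is not ADAGE's implementation; the function name `critical_path`, the task names, and the durations are hypothetical, and data transfer costs are ignored for simplicity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest duration-weighted path in a DAG.

    durations: task -> estimated run time (hypothetical units)
    deps:      task -> set of tasks that must finish before it starts
    """
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}  # earliest possible finish time of each task
    prev = {}    # predecessor on the longest path ending at each task

    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        latest = max(graph[task], key=lambda p: finish[p], default=None)
        prev[task] = latest
        start = finish[latest] if latest is not None else 0
        finish[task] = start + durations[task]

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


# Illustrative workflow: "c" needs "a" and "b"; "d" needs "c".
durations = {"a": 3, "b": 2, "c": 4, "d": 1}
deps = {"c": {"a", "b"}, "d": {"c"}}
print(critical_path(durations, deps))  # (8, ['a', 'c', 'd'])
```

A scheduler could, for instance, keep the tasks on this path on the same node so that their intermediate data need not cross the network; whether ADAGE does exactly this is not stated in the abstract.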