Optimization of Cloud Workflow Scheduling Based on Balanced Clustering

2017 
Scientific workflow applications consist of many fine-grained computational tasks with dependencies, and their runtimes vary widely. Executing these fine-grained tasks in a cloud computing environment incurs significant scheduling overhead. Task clustering is a key technique for reducing scheduling overhead and optimizing workflow execution time. Unfortunately, task clustering often introduces both runtime imbalance and dependency imbalance. Existing task clustering strategies focus mainly on avoiding runtime imbalance and rarely address the data dependencies between tasks. When data dependencies are ignored, clustering introduces data locality that leads to a poor degree of parallelism during task execution. To address dependency imbalance, we propose the Dependency Balance Clustering Algorithm (DBCA), which defines the concept of dependency correlation to measure the similarity between tasks in terms of their data dependencies. Tasks with high dependency correlation are clustered together to avoid dependency imbalance. We conducted experiments on the WorkflowSim platform and compared our method with the existing task clustering method. The results show that DBCA significantly reduces the execution time of the whole workflow.
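The abstract does not give a formula for dependency correlation. As a rough illustration only, the sketch below assumes it is the Jaccard similarity of two tasks' parent (input) sets, and it greedily groups the tasks of one workflow level into equal-size clusters so that tasks sharing parents end up together. The names `dependency_correlation` and `cluster_level`, the Jaccard measure, and the greedy grouping strategy are hypothetical stand-ins, not DBCA as published.

```python
import math


def dependency_correlation(parents_a: set[str], parents_b: set[str]) -> float:
    """Hypothetical measure: Jaccard similarity of two tasks' parent sets."""
    union = parents_a | parents_b
    if not union:
        return 0.0
    return len(parents_a & parents_b) / len(union)


def cluster_level(tasks: dict[str, set[str]], num_clusters: int) -> list[list[str]]:
    """Greedily group one workflow level into num_clusters equal-size clusters.

    tasks maps a task id to the set of its parent task ids (an assumption
    about the input format). Each cluster starts from the first unassigned
    task and repeatedly absorbs the unassigned task with the highest average
    dependency correlation to the cluster's current members.
    """
    remaining = sorted(tasks)  # deterministic order
    size = math.ceil(len(remaining) / num_clusters)
    clusters = []
    while remaining:
        cluster = [remaining.pop(0)]  # seed task
        while remaining and len(cluster) < size:
            best = max(
                remaining,
                key=lambda t: sum(
                    dependency_correlation(tasks[t], tasks[m]) for m in cluster
                ) / len(cluster),
            )
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    # Hypothetical level: t1/t3 read outputs of a, b; t2/t4 read outputs of c, d.
    level = {"t1": {"a", "b"}, "t2": {"c", "d"}, "t3": {"a", "b"}, "t4": {"c", "d"}}
    print(cluster_level(level, 2))  # [['t1', 't3'], ['t2', 't4']]
```

In this toy level, grouping purely by listing order would give `[["t1", "t2"], ["t3", "t4"]]`, so every cluster would wait on all four parents; grouping by shared parents lets each cluster start as soon as its own two parents finish.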