A Reinforcement Learning Scheduling Strategy for Parallel Cloud-Based Workflows

2019 
Scientific experiments can be modeled as Workflows. Such Workflows are usually computing-and data-intensive, demanding the use of High-Performance Computing environments such as clusters, grids, and clouds. This latter offers the advantage of elasticity, which allows for increasing and/or decreasing the number of Virtual Machines (VMs) on demand. Workflows are typically managed using Scientific Workflow Management Systems (SWfMS). Many existing SWfMSs offer support for cloud-based execution. Each SWfMS has its own scheduler that follows a well-defined cost function. However, such cost functions must consider the characteristics of a dynamic environment, such as live migrations and/or performance fluctuations, which are far from trivial to model. This paper proposes a novel scheduling strategy, named ReASSIgN, based on Reinforcement Learning (RL). By relying on an RL technique, one may assume that there is an optimal (or sub-optimal) solution for the scheduling problem, and aims at learning the best scheduling based on previous executions in the absence of a mathematical model of the environment. For this, an extension of a well-known workflow simulator WorkflowSim is proposed to implement an RL strategy for scheduling workflows. Once the scheduling plan is generated, the workflow is executed in the cloud using SciCumulus SWfMS. We conducted a thorough evaluation of the proposed scheduling strategy using a real astronomy workflow.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    5
    Citations
    NaN
    KQI
    []