Scheduling big data workflows in the cloud under budget constraints

2016 
Big data is fast becoming a ubiquitous term in both academia and industry, and there is a strong need for new data-centric workflow tools and techniques to process and analyze large-scale, complex datasets that are growing exponentially. At the same time, the unbounded resource leasing capability envisioned for the cloud enables data scientists to wring actionable insights from the data in a time- and cost-efficient manner. In a data-centric workflow environment, the scheduling of data processing tasks onto appropriate resources is often driven by constraints provided by the users. Enforcing a constraint while executing a workflow in the cloud adds a new optimization challenge: how to meet the objective while satisfying the given constraint. In this paper, we propose a new Big dAta woRkflow schEduler uNder budgeT constraint, known as BARENTS, that supports high-performance workflow scheduling in a heterogeneous cloud computing environment with the single objective of minimizing the workflow makespan under a provided budget constraint. Our case study and experiments show the competitive advantages of our proposed scheduler. The proposed BARENTS scheduler is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.