A hierarchical multi-objective task scheduling approach for fast big data processing

Zahra Jalalian, Mohsen Sharifi

2021

As big data is produced and disseminated from more and more sources, data processing must become correspondingly faster. In distributed big data processing systems such as cloud computing, the task scheduler maps a large set of diverse tasks to a set of possibly heterogeneous computing nodes so as to raise resource efficiency and data locality and to reduce makespan. Scheduling strategies that try to achieve these goals in one pass perform worse than multi-pass strategies. To achieve higher performance, we propose MOTS, a hierarchical multi-objective task scheduling scheme. MOTS first clusters tasks using the K-means algorithm together with a load balancing equation to increase resource efficiency, and then optimizes the clusters with evolutionary algorithms to reduce makespan. The latter step uses the state of the physical machines and sends related consecutive tasks to the same physical machine to eliminate data transfer between them. We have simulated and tested our scheme in CloudSim. In our experiments, MOTS reduces makespan by approximately 10% and raises CPU efficiency by 4% compared to Mai’s reinforcement learning approach and Bugerya’s parallel implementation method. The cost of data transfer between consecutive tasks is also 10% lower than with Bugerya’s method. Given these results, and because the proposed scheme is inspired by the iHadoop method for parallel implementation, it is suitable for use in distributed big data processing systems. Information about previous executions of tasks and the current status of computing nodes strongly influences how efficiently tasks can be mapped to computing nodes. Predictions of the future resource needs of tasks and of the available capacities of computing nodes can complement this historical information and help find a closer-to-optimal mapping, resulting in faster data processing. This issue, along with an evaluation of the proposed scheme on real data, will be pursued in future work.
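The two-stage idea described above (cluster the tasks first, then map clusters to machines with makespan in mind) can be illustrated with a minimal sketch. This is not the MOTS implementation: the task features (length and input size), the function names `kmeans` and `assign_clusters`, and the node speeds are all hypothetical, and a simple greedy placement stands in for both the paper's load balancing equation and its evolutionary optimization step, neither of which is specified in the abstract.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def assign_clusters(tasks, labels, k, node_speeds):
    """Greedy stand-in for the optimization stage (assumption, not the paper's
    method): place clusters, heaviest first, on the node that would finish
    them earliest. Returns the placement and the resulting makespan."""
    work = [0.0] * k
    for (length, _size), lab in zip(tasks, labels):
        work[lab] += length
    finish = [0.0] * len(node_speeds)
    placement = {}
    for c in sorted(range(k), key=lambda c: -work[c]):
        node = min(range(len(node_speeds)),
                   key=lambda n: finish[n] + work[c] / node_speeds[n])
        finish[node] += work[c] / node_speeds[node]
        placement[c] = node
    return placement, max(finish)


# Hypothetical tasks: (length in million instructions, input size in MB).
tasks = [(100.0, 10.0), (110.0, 12.0), (900.0, 80.0),
         (950.0, 85.0), (500.0, 40.0), (520.0, 45.0)]
labels = kmeans(tasks, k=3)
placement, makespan = assign_clusters(tasks, labels, 3, node_speeds=[1.0, 2.0])
print(placement, makespan)
```

Keeping a whole cluster on one node loosely mirrors the abstract's point about sending related consecutive tasks to the same physical machine; a real scheduler would additionally account for machine state and data dependencies between tasks.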
