An improved optimal task selection strategy for Hadoop scheduling

2017 
MapReduce is a popular parallel programming model used to solve a wide range of Big Data applications in cloud computing environments. Hadoop is an open source implementation of MapReduce and is widely used. It provides an abstracted environment for running large-scale, data-intensive applications in a scalable and fault-tolerant manner. Several Hadoop scheduling algorithms have been proposed in the literature with various performance goals. In this paper, a new improved optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To improve the probability that a larger percentage of a job's tasks can be launched locally in the future, the scheme chooses among the local tasks available for a node by considering three factors: the number of replicas of the task's input (fewest preferred), the individual load of the disks attached to the node, and the expected time to wait for the next local node (longest preferred). The proposed method was evaluated through extensive experiments, and it was observed to improve performance significantly. The experiments show an improvement of around 25% in terms of locality and fairness.
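The selection rule described above can be illustrated with a minimal sketch. The abstract does not state how the three factors are combined, so the lexicographic ordering used here (fewest replicas first, then lowest disk load, then longest expected wait) is an assumption, as are the names `LocalTask` and `select_task` and the field definitions; none of this is taken from the paper's implementation or from the Hadoop API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LocalTask:
    """Hypothetical description of a task whose input is local to the node."""
    task_id: str
    replica_count: int    # replicas of the task's input block in the cluster
    disk_load: float      # load on the local disk holding the block (assumed 0..1)
    expected_wait: float  # expected time until another local node is available


def select_task(candidates: Sequence[LocalTask]) -> Optional[LocalTask]:
    """Pick one local task for the requesting node.

    Assumed ordering (not specified in the abstract):
    fewest replicas, then lowest disk load, then longest expected wait.
    """
    if not candidates:
        return None
    return min(
        candidates,
        # Negate expected_wait so that a longer wait sorts first under min().
        key=lambda t: (t.replica_count, t.disk_load, -t.expected_wait),
    )


if __name__ == "__main__":
    tasks = [
        LocalTask("a", replica_count=3, disk_load=0.2, expected_wait=5.0),
        LocalTask("b", replica_count=1, disk_load=0.9, expected_wait=1.0),
        LocalTask("c", replica_count=1, disk_load=0.4, expected_wait=2.0),
        LocalTask("d", replica_count=1, disk_load=0.4, expected_wait=8.0),
    ]
    chosen = select_task(tasks)
    print(chosen.task_id if chosen else "no local task")
```

The intuition behind preferring tasks with few replicas and a long expected wait is that such tasks have the fewest other chances of running locally, so launching them now leaves the more flexible tasks for later nodes. A weighted score could be used instead of a lexicographic key; that choice is likewise not determined by the abstract.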