A self-adaptive scheduling algorithm for reduce start time

2015 
MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. When to start the reduce tasks is one of the key problems to advance the MapReduce performance. The existing implementations may result in a block of reduce tasks. When the output of map tasks become large, the performance of a MapReduce scheduling algorithm will be influenced seriously. Through analysis for the current MapReduce scheduling mechanism, this paper illustrates the reasons of system slot resources waste, which results in the reduce tasks waiting around, and proposes an optimal reduce scheduling policy called SARS (Self Adaptive Reduce Scheduling) for reduce tasks' start times in the Hadoop platform. It can decide the start time point of each reduce task dynamically according to each job context, including the task completion time and the size of map output. Through estimating job completion time, reduce completion time, and system average response time, the experimental results illustrate that, when comparing with other algorithms, the reduce completion time is decreased sharply. It is also proved that the average response time is decreased by 11% to 29%, when the SARS algorithm is applied to the traditional job scheduling algorithms FIFO, FairScheduler, and CapacityScheduler. This paper illustrates the reasons of the system slots waster for reduces tasks waiting around.The model can determine the start time of reduce tasks, dynamically according to job context.As an optimal scheduling algorithm, SARS can decrease the reduce completion time for jobs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    32
    References
    52
    Citations
    NaN
    KQI
    []