Improving Spark performance with MPTE in heterogeneous environments

2016 
Spark has become the first choice among distributed computing frameworks for big data processing. Its main strength is in-memory computation on large clusters, which suits iterative and interactive workloads. However, straggler machines can seriously degrade its performance. Spark's current remedy is speculative execution, which selects slow tasks and resubmits them, but it has two deficiencies. First, it uses the median task time directly to judge whether a task is abnormal, which can be misleading in practice. Second, backup tasks are added directly to the task queue without taking the presence of straggler machines into account. These deficiencies can further extend the execution time of a job. We therefore design an improved speculative strategy, Multiple Phases Time Estimation (MPTE), which greatly reduces the impact of straggler machines. In MPTE, we select slow tasks using the remaining time estimated from multiple phases, and we improve the task scheduler for backup task scheduling. Experimental results show that MPTE improves the accuracy of deciding whether to run a speculative copy of a task by about 20% compared to Spark's native scheduler.
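As a rough illustration of the two selection criteria contrasted in the abstract, the Python sketch below compares a median-based check with a remaining-time estimate built from per-phase progress. It is not the authors' implementation: the abstract does not define the phases, their weights, or the estimation formula, so the function names, the `(weight, progress, elapsed)` tuple layout, and the rate-based formula are all assumptions made for illustration. The `multiplier` default of 1.5 mirrors the long-standing default of Spark's `spark.speculation.multiplier` setting, which may differ between Spark versions.

```python
from statistics import median


def median_rule_is_slow(running_time, finished_durations, multiplier=1.5):
    """Median-based check (simplified): flag a running task once its elapsed
    time exceeds multiplier * median duration of already finished tasks."""
    if not finished_durations:
        return False
    return running_time > multiplier * median(finished_durations)


def estimate_remaining_time(phases):
    """Hypothetical phase-based estimate of a task's remaining time.

    phases: list of (weight, progress, elapsed_seconds) tuples, where
    weight is the assumed share of total work in that phase and
    progress is in [0, 1]. This layout is an assumption, not MPTE's.
    """
    done_work = sum(w * p for w, p, _ in phases)
    elapsed = sum(e for _, _, e in phases)
    if done_work <= 0 or elapsed <= 0:
        return float("inf")  # no progress observed yet
    overall_rate = done_work / elapsed

    remaining = 0.0
    for w, p, e in phases:
        left = w * (1.0 - p)
        if left == 0:
            continue  # phase already complete
        # Use the phase's own observed rate when it has started,
        # otherwise fall back to the task's overall rate.
        rate = (w * p / e) if (p > 0 and e > 0) else overall_rate
        remaining += left / rate
    return remaining


if __name__ == "__main__":
    print(median_rule_is_slow(20.0, [10.0, 12.0, 11.0]))
    print(estimate_remaining_time([(0.5, 1.0, 10.0), (0.5, 0.5, 10.0)]))
```

Under this sketch, a task that finished its first phase quickly but is slow in its second phase receives a large remaining-time estimate even when its total elapsed time is still below the median-based threshold, which is the kind of case where a single median comparison could be misleading.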