Statistical Learning-Based Prediction of Execution Time of Data-Intensive Program Under Hadoop2.0

2018 
This paper is mainly to predict the running time of data-intensive MapReduce program under Hadoop2.0 environment. Although MapReduce programs are diverse, they can be divided into data-intensive and computationally intensive, depending on the time complexity and the nature of the program. The prediction of computationally intensive programs has always been difficult, and Hadoop has exhibited certain database attributes that are basically data-intensive. Moreover, the relationship between data-intensive programs and the amount of data is more closely related and shows certain statistical characteristics. So the method of statistical learning is applied to predict the execution time. This paper first generates training data and test data according to requirements, and then selects the appropriate features through the analysis of the logs. The prediction was first performed using the KCCA algorithm. However, the deficiencies were found. Then based on the characteristics of the kernel function, a prediction method based on deep learning was proposed, and the result was significant.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    0
    Citations
    NaN
    KQI
    []