A Failure Prediction Model for Large Scale Cloud Applications using Deep Learning

2021 
Many cloud service providers face significant challenges in preventing hardware and software failures. Because cloud computing is large-scale and heterogeneous, cloud services continue to experience failures in their components. Most previous studies have focused on characterizing failed jobs and understanding their behavior, while only a few have addressed failure prediction, with an emphasis on increasing the accuracy of failure prediction models. This paper presents the development and implementation of a failure prediction model based on deep learning. The proposed model identifies tasks that are likely to fail before the failure occurs. Its key purpose is to improve the performance of cloud applications by reducing the number of failed jobs. To investigate failure behavior and apply failure prediction to large-scale environments, we used three traces: Google Cluster Trace, Mustang, and Trinity. We evaluated the proposed model with several evaluation metrics to verify that it predicts failures with high accuracy. The model is designed and implemented to achieve high prediction accuracy regardless of whether the trace is large or small. The evaluation results show that the proposed model achieves high precision, recall, and F1 score.
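For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1 score are computed for a binary "task failed / task finished" prediction. The function name, the label encoding (1 = failed), and the toy labels are illustrative assumptions, not data or code from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class.

    y_true and y_pred are equal-length sequences of labels.
    `positive` marks the class of interest (assumed here: 1 = failed task).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # failures caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed failures

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels, invented for illustration only (1 = failed, 0 = finished).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision measures how many predicted failures were real, recall measures how many real failures were caught, and F1 is their harmonic mean. Failed tasks are usually a minority in cluster traces, so these metrics are typically more informative than plain accuracy.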