System-level hardware failure prediction using deep learning.

2019 
Disk and memory faults are the leading causes of server breakdown. A proactive solution is to predict such hardware failure at the runtime and then isolate the hardware at risk and backup the data. However, the current model-based predictors are incapable of using the discrete time-series data, such as the values of device attributes, which conveys high-level information of the device behavior. In this paper, we propose a novel deep-learning based prediction scheme for system-level hardware failure prediction. We normalize the distribution of samples' attributes from different vendors to make use of diverse training sets. We propose a temporal Convolution Neural Network based model that is insensitive to the noise in the time dimension. Finally, we design a loss function to train the model with extremely imbalanced samples effectively. Experimental results from an open S.M.A.R.T data set and an industrial data set show the effectiveness of the proposed scheme.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    19
    Citations
    NaN
    KQI
    []