Feature Selection for Improving Failure Detection in Hard Disk Drives Using a Genetic Algorithm and Significance Scores

2020 
Hard disk drives (HDD) are used for data storage in personal computing platforms as well as commercial datacenters. An abrupt failure of these devices may result in an irreversible loss of critical data. Most HDD use self-monitoring, analysis, and reporting technology (SMART), and record different performance parameters to assess their own health. However, not all SMART attributes are effective at detecting a failing HDD. In this paper, a two-tier approach is presented to select the most effective precursors for a failing HDD. In the first tier, a genetic algorithm (GA) is used to select a subset of SMART attributes that lead to easily distinguishable and well clustered feature vectors in the selected subset. The GA finds the optimal feature subset by evaluating only combinations of SMART attributes, while ignoring their individual fitness. A second tier is proposed to filter the features selected using the GA by evaluating each feature independently, using a significance score that measures the statistical contribution of a feature towards disk failures. The resultant subset of selected SMART attributes is used to train a generative classifier, the naive Bayes classifier. The proposed method is tested on a SMART dataset from a commercial datacenter, and the results are compared with state-of-the-art methods, indicating that the proposed method has a better failure detection rate and a reasonable false alarm rate. It uses fewer SMART attributes, which reduces the required training time for the classifier and does not require tuning any parameters or thresholds.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    14
    References
    1
    Citations
    NaN
    KQI
    []