Context-Aware Learning for Anomaly Detection with Imbalanced Log Data

2020 
Logs record the runtime states and significant events of a software system, and they are widely used for anomaly detection. Logs produced by most real-world systems exhibit clear characteristics of imbalanced data, because the number of samples in different classes varies sharply. This imbalanced distribution biases the anomaly classifier toward the majority class, making it difficult for the classifier to learn to detect anomalies correctly. Most existing methods for log-based anomaly detection ignore this important problem, so they perform poorly on real-world systems. In this paper, we propose a context-aware method named AllContext for anomaly detection with imbalanced log data. AllContext transforms each log event into a vector that contains not only the semantic information of each word but also the semantics of the region in which each word is located. Such rich semantic information enables our method to understand imbalanced log data more thoroughly. We conduct extensive experiments on multi-class and binary imbalanced log datasets. The accuracy of the proposed AllContext is more than twice that of a state-of-the-art baseline. To evaluate the robustness of the proposed method, we assess AllContext on imbalanced unseen log data, where none of the samples in the test dataset appear in the training dataset, and the accuracy achieved by AllContext reaches 0.98. The experiments show that the proposed solution achieves accurate results on both imbalanced and unseen log data.
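To make the representation idea concrete, the following is a minimal Python sketch of turning a log event into a vector that combines per-word semantics with the semantics of each word's surrounding region. It is not the authors' implementation: the toy hashing-based embeddings, the window size, and the names `word_vec`, `region_vec`, and `event_vector` are illustrative assumptions standing in for whatever trained embeddings AllContext actually uses.

```python
# Illustrative sketch only: each word contributes its own embedding plus an
# embedding of the local region (window of neighbouring words) it appears in.
import numpy as np

EMB_DIM = 16
WINDOW = 2  # words on each side of a token form its "region" (assumed value)


def word_vec(token: str, dim: int = EMB_DIM) -> np.ndarray:
    """Toy deterministic embedding; a real pipeline would use trained vectors."""
    rng = np.random.default_rng(abs(hash(token)) % (2 ** 32))
    return rng.standard_normal(dim)


def region_vec(tokens: list[str], idx: int, dim: int = EMB_DIM) -> np.ndarray:
    """Average embedding of the words surrounding position idx (its region)."""
    lo, hi = max(0, idx - WINDOW), min(len(tokens), idx + WINDOW + 1)
    neighbours = [word_vec(t, dim)
                  for j, t in enumerate(tokens[lo:hi]) if lo + j != idx]
    return np.mean(neighbours, axis=0) if neighbours else np.zeros(dim)


def event_vector(log_event: str) -> np.ndarray:
    """Represent a log event by concatenating word-level and region-level
    semantics for each word, then averaging over all words in the event."""
    tokens = log_event.lower().split()
    per_word = [np.concatenate([word_vec(t), region_vec(tokens, i)])
                for i, t in enumerate(tokens)]
    return np.mean(per_word, axis=0)


if __name__ == "__main__":
    vec = event_vector("Failed to connect to node after timeout")
    print(vec.shape)  # (32,): word semantics + region semantics
```

Such an event vector would then feed a downstream classifier; the paper's contribution is the context-aware representation itself, which the sketch only approximates.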