CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Model Transfer

2021 
Anomaly detection is indispensable in modern IT infrastructure management. However, the dimension explosion problem of the monitoring data (large-scale machines, many key performance indicators, and frequent monitoring queries) causes a scalability issue to the existing algorithms. We propose a coarse-to-fine model transfer based framework CTF to achieve a scalable and accurate data-center-scale anomaly detection. CTF pre-trains a coarse-grained model, uses the model to extract and compress per-machine features to a distribution, clusters machines according to the distribution, and conducts model transfer to fine-tune per-cluster models for high accuracy. The framework takes advantage of clustering on the per-machine latent representation distribution, reusing the pre-trained model, and partial-layer model fine-tuning to boost the whole training efficiency. We also justify design choices such as the clustering algorithm and distance algorithm to achieve the best accuracy. We prototype CTF and experiment on production data to show its scalability and accuracy. We also release a labeling tool for multivariate time series and a labeled dataset to the research community.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    0
    Citations
    NaN
    KQI
    []