Near Real-Time Service Monitoring Using High-Dimensional Time Series

2015 
We demonstrate a near real-time service monitoring system for detecting and diagnosing issues from high-dimensional time series data. For detection, we have implemented a learning algorithm that constructs a hierarchy of detectors from data. It is scalable, does not require labelled examples of issues for learning, runs in near real-time, and identifies a subset of counter time series as being relevant for a detected issue. For diagnosis, we provide efficient algorithms as post-detection diagnosis aids to find further relevant counter time series at issue times, a SQL-like query language for writing flexible queries that apply these algorithms on the time series data, and a graphical user interface for visualizing the detection and diagnosis results. Our solution has been deployed in production as an end-to-end system for monitoring Microsoft's internal distributed data storage and computing platform consisting of tens of thousands of machines and currently analyses about 12000 counter time series.
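The abstract does not specify the detector hierarchy, so the following is only a minimal sketch of the general idea of label-free detection that also reports which counters are relevant at an issue time. It swaps in a simple robust z-score (median and median absolute deviation) per counter, which is not the paper's learning algorithm. The function names, the threshold of 6.0, and the counter names and values are all hypothetical.

```python
from statistics import median


def robust_scores(series):
    """Score each point as |x - median| / MAD (a stand-in detector, not the paper's)."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    scale = max(mad, 1e-9)  # floor the scale so a constant series does not divide by zero
    return [abs(x - med) / scale for x in series]


def detect_issues(counters, threshold=6.0):
    """Map each flagged time index to the counters deviating there, strongest first.

    No labelled issue examples are needed: each counter is compared only
    against its own history. The threshold is an assumed value.
    """
    hits_by_time = {}
    for name, series in counters.items():
        for t, score in enumerate(robust_scores(series)):
            if score > threshold:
                hits_by_time.setdefault(t, []).append((score, name))
    return {
        t: [name for _, name in sorted(hits, reverse=True)]
        for t, hits in sorted(hits_by_time.items())
    }


# Hypothetical counter time series; index 6 simulates an issue.
counters = {
    "disk_latency_ms": [10, 11, 10, 12, 11, 10, 95, 11],
    "request_errors": [0, 1, 0, 1, 0, 1, 40, 0],
    "cpu_percent": [50, 52, 51, 49, 50, 51, 50, 52],
}

issues = detect_issues(counters)
print(issues)  # {6: ['disk_latency_ms', 'request_errors']}
```

In this toy data the issue at index 6 is attributed to the two counters that spike, while the CPU counter stays below the threshold and is left out. A production system of the kind described would additionally need to run incrementally and scale to thousands of counters, which this sketch does not attempt.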