Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey.

2021 
The momentum gained by microservices and cloud-native software architecture pushed nowadays enterprise IT towards multi-service applications. The proliferation of services and service interactions within applications, often consisting of hundreds of interacting services, makes it harder to detect failures and to identify their possible root causes, which is on the other hand crucial to promptly recover and fix applications. Various techniques have been proposed to promptly detect failures based on their symptoms, viz., observing anomalous behaviour in one or more application services, as well as to analyse logs or monitored performance of such services to determine the possible root causes for observed anomalies. The objective of this survey is to provide a structured overview and a qualitative analysis of currently available techniques for anomaly detection and root cause analysis in modern multi-service applications. Some open challenges and research directions stemming out from the analysis are also discussed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    80
    References
    1
    Citations
    NaN
    KQI
    []