A Method for Modeling Data Anomalies in Practice

2021 
As technology has allowed us to collect large amounts of industrial data, it has become critical to analyze and understand the data collected, in particular to find data anomalies. Anomaly analysis allows a company to detect, analyze and understand anomalous or unusual data patterns. This activity matters because it can expose, for example, deviations in service that may indicate potential problems, or differing customer behavior that may reveal new business opportunities. Much previous work has focused on anomaly detection, in particular using machine learning. Such approaches allow clustering of data patterns by common attributes; although useful, the resulting clusters often do not correspond to the root causes of anomalies, meaning that more manual analysis is needed. In this paper we report on a design science study, conducted with two different teams in a partner company, that focuses on modeling and understanding the attributes and root causes of data anomalies. Through iteration with each team, we created general and anomaly-specific UML class diagrams and goal models to capture anomaly details. We use our experiences to create an example taxonomy, classifying anomalies by their root causes, and to create a general method for modeling and understanding data anomalies. This work paves the way for a better understanding of anomalies and their root causes, leading towards a training set that may be used for machine learning approaches.