Measuring the Behavioral Quality of Log Sampling

2019 
Process mining combines data mining with process analysis, for example to discover process models from event logs. Practice shows that event logs grow very fast. Consequently, they quickly become too large to analyze with current tools. Given the exploratory nature of many process mining algorithms, this is problematic, because algorithms are often run repeatedly to tune parameters and analyze their influence. One solution is to reduce the data by sampling the event log. Many sampling approaches exist, yet their quality is unknown. In this paper, we study the behavioral quality of event log sampling and introduce measures to quantify it. The approach has been implemented in the tool ProM. Experiments show that sampling very quickly introduces undersampled and oversampled behavior in the event log, which can be problematic for frequency-based algorithms.
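To make the notion of undersampled and oversampled behavior concrete, the following is a minimal sketch, not the measures introduced in the paper. It assumes that behavior is represented by directly-follows pairs of activities and that traces are sampled uniformly at random. The function names (`directly_follows`, `sample_log`, `sampling_ratios`) and the toy log are hypothetical and chosen only for illustration. For each pair, the sketch compares the count observed in the sample with the count expected from the full log scaled by the sample size.

```python
import random
from collections import Counter


def directly_follows(log):
    """Count directly-follows pairs (a, b) over all traces in the log."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def sample_log(log, ratio, seed=0):
    """Draw a uniform random sample of traces (assumed sampling strategy)."""
    rng = random.Random(seed)
    k = max(1, round(ratio * len(log)))
    return rng.sample(log, k)


def sampling_ratios(log, sample):
    """Observed / expected count per directly-follows pair.

    A value below 1 suggests the pair is undersampled, above 1 oversampled,
    and 0 means the behavior is missing from the sample entirely.
    """
    full = directly_follows(log)
    part = directly_follows(sample)
    scale = len(sample) / len(log)
    return {pair: part[pair] / (full[pair] * scale) for pair in full}


# Toy event log: each trace is a tuple of activity names (illustrative data).
log = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 3 + [("a", "d")] * 1

sample = sample_log(log, 0.5)
for pair, ratio in sorted(sampling_ratios(log, sample).items()):
    print(pair, round(ratio, 2))
```

Under these assumptions, a frequency-based algorithm that filters on relative pair frequencies would see the sample's distorted ratios rather than those of the full log, which is the kind of effect the abstract describes.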