Estimating Global Completeness of Event Logs: A Comparative Study

2018 
Event logs are the basis of process mining techniques and tools that extract process behavior information to support the understanding and optimization of business processes. Although it is widely recognized that the degree of completeness of an event log may largely determine the effectiveness of these techniques, how to estimate that completeness has not yet been fully addressed, mainly because ground-truth process models are usually unknown. To attack this problem, we take a closer look at several concepts and implicit assumptions in the log completeness estimation problem and characterize it as a special case of the species estimation problem in statistics. Although species estimation is still an open problem, a number of statistical models and techniques with approximate solutions are available. To investigate the relevance of these methods for event log completeness estimation, we designed and conducted a wide-ranging empirical study, with quantitative experiments on both real-world and synthesized event logs, to compare their performance. In addition, completeness estimates for several important and widely used real-world event logs are reported for the first time, together with best-practice experience learned through this research.
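To make the species-estimation framing concrete, the following is a minimal sketch that treats each distinct trace variant in a log as a "species" and applies the bias-corrected Chao1 richness estimator, a classical estimator from the species estimation literature. The abstract does not name the estimators that were compared, so the choice of Chao1, the function name `chao1_completeness`, and the toy log are all assumptions made for illustration only.

```python
from collections import Counter


def chao1_completeness(traces):
    """Estimate log completeness with the bias-corrected Chao1 estimator.

    Assumption (not from the source): a "species" is a distinct trace
    variant, i.e. a distinct sequence of activity names.
    Returns (observed_variants, estimated_variants, completeness).
    """
    # Count how often each distinct trace variant occurs in the log.
    variant_counts = Counter(tuple(trace) for trace in traces)
    s_obs = len(variant_counts)

    # f1 / f2: number of variants observed exactly once / exactly twice.
    f1 = sum(1 for c in variant_counts.values() if c == 1)
    f2 = sum(1 for c in variant_counts.values() if c == 2)

    # Bias-corrected Chao1: S_est = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)).
    # This form stays defined even when f2 == 0.
    s_est = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

    # Completeness is the share of the estimated variants already observed.
    completeness = s_obs / s_est if s_est > 0 else 1.0
    return s_obs, s_est, completeness


# Hypothetical toy log: 7 traces over 4 distinct variants.
toy_log = (
    [["a", "b", "c"]] * 3
    + [["a", "c", "b"]] * 2
    + [["a", "b"]]
    + [["a", "c"]]
)
observed, estimated, completeness = chao1_completeness(toy_log)
print(observed, estimated, round(completeness, 3))
```

Many variants seen only once relative to those seen twice suggest that further unseen variants exist, which lowers the estimated completeness. The same counting scheme could be applied to other units of behavior, such as directly-follows relations, if those are taken as the "species" instead.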