Mining Data Quality Rules for Data Migrations: A Case Study on Material Master Data

2021 
Master data sets are an important asset for organizations, and their quality must be high to ensure organizational success. At the same time, data migrations are complex projects that often result in impaired data sets of lower quality. In particular, data quality issues that involve multiple attributes are difficult to identify and can only be resolved with manual data quality checks. In this paper, we investigate a real-world migration of material master data. Our goal is to ensure data quality by mining the target data set for data quality rules. In a data migration, incoming data sets must comply with these rules to be migrated. To generate data quality rules, we used an SVM for rules at the schema level and Association Rule Learning for rules at the instance level. We found that both methods produce valuable rules and are suitable for ensuring quality in data migrations. As an ensemble, the two methods are adequate to manage common real-world data characteristics such as sparsity or mixed values.
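The instance-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so the sketch below mines only simple one-to-one association rules with support and confidence thresholds, and the attribute names (`material_type`, `base_unit`, `plant`), the sample records, and the threshold values are all hypothetical. The schema-level SVM step is not shown.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=0.3, min_confidence=0.9):
    """Mine one-to-one rules (attr_a=val_a) -> (attr_b=val_b) from a list of dicts."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        # Skip missing values so that sparse attributes do not generate rules.
        items = [(k, v) for k, v in rec.items() if v is not None]
        item_counts.update(items)
        # Ordered pairs of distinct attribute=value items within one record.
        pair_counts.update(permutations(items, 2))
    rules = []
    for (lhs, rhs), count in pair_counts.items():
        support = count / n                   # share of records containing both items
        confidence = count / item_counts[lhs]  # share of lhs records that also have rhs
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


def violations(record, rules):
    """Return rules whose antecedent matches the record but whose consequent does not."""
    broken = []
    for (l_attr, l_val), (r_attr, r_val), _, _ in rules:
        actual = record.get(r_attr)
        if record.get(l_attr) == l_val and actual is not None and actual != r_val:
            broken.append(((l_attr, l_val), (r_attr, r_val)))
    return broken


# Hypothetical target data set (already migrated, assumed to be of good quality).
target = [
    {"material_type": "RAW", "base_unit": "KG", "plant": "P1"},
    {"material_type": "RAW", "base_unit": "KG", "plant": "P2"},
    {"material_type": "RAW", "base_unit": "KG", "plant": None},
    {"material_type": "FIN", "base_unit": "PC", "plant": "P1"},
    {"material_type": "FIN", "base_unit": "PC", "plant": "P2"},
]
rules = mine_rules(target)

# Hypothetical incoming record that contradicts the mined rules.
incoming = {"material_type": "RAW", "base_unit": "PC", "plant": "P1"}
for lhs, rhs in violations(incoming, rules):
    print(f"violated: {lhs} -> {rhs}")
```

In this sketch, an incoming record would be held back from migration whenever `violations` returns a non-empty list. Real association rule learners also mine rules with multi-item antecedents, which this simplification omits.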