A Measure-Theoretic Foundation for Data Quality

2018 
This paper proposes a novel framework for data quality measurement based on a measure-theoretic treatment of the problem. Instead of starting from a specific setting in which quality must be assessed, the approach starts more formally from the concept of measurement. The basic assumption of the framework is that the highest possible quality can be described by a set of predicates. The quality of data is then measured by evaluating those predicates and combining their evaluations. This combination is based on a capacity function (i.e., a fuzzy measure) that models, for each combination of predicates, the capacity with respect to the quality of the data. It is shown that expressing quality on an ordinal scale entails a high degree of interpretation and a compact representation of the measurement function. Within this purely ordinal framework for measurement, it is shown that reasoning about quality beyond the ordinal level arises naturally from uncertainty about predicate evaluation. The paper discusses how the proposed framework is positioned with respect to other approaches, with particular attention to the aggregation of measurements. The practical usability of the framework is discussed for several well-known dimensions of data quality and demonstrated in a use-case study about clinical trials.
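To make the idea of combining predicate evaluations through a capacity function concrete, the following is a minimal sketch of one possible reading, not the paper's own formalization. It assumes Boolean predicates, a three-level ordinal scale, and a capacity stored as a lookup table from sets of satisfied predicates to quality levels. The predicate names, the `Level` scale, and the capacity values are all hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordinal quality scale (only the order matters)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Hypothetical predicates that together describe the highest possible quality.
PREDICATES = {
    "has_id": lambda r: r.get("id") is not None,
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

# Hypothetical capacity: each set of satisfied predicates maps to an ordinal
# level. A capacity must be monotone: satisfying more predicates never lowers
# the level.
CAPACITY = {
    frozenset(): Level.LOW,
    frozenset({"has_id"}): Level.MEDIUM,
    frozenset({"age_in_range"}): Level.LOW,
    frozenset({"has_id", "age_in_range"}): Level.HIGH,
}


def is_monotone(capacity):
    """Check that A being a subset of B implies capacity[A] <= capacity[B]."""
    return all(
        capacity[a] <= capacity[b]
        for a in capacity
        for b in capacity
        if a <= b  # subset test on frozensets
    )


def measure_quality(record):
    """Evaluate every predicate, then look up the capacity of the satisfied set."""
    satisfied = frozenset(name for name, pred in PREDICATES.items() if pred(record))
    return CAPACITY[satisfied]
```

Under these assumptions, a record such as `{"id": 7, "age": 34}` satisfies both predicates and is assigned the top level, while `{"id": 7, "age": 300}` satisfies only `has_id` and is assigned the intermediate level. The table form makes the measurement function explicit for every combination of predicates, which is one way to read the abstract's remark about a compact, interpretable ordinal representation.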