An incremental approach for data quality measurement with insufficient information

2018 
Abstract Recently, a fundamental study on measurement of data quality introduced an ordinal-scaled procedure of measurement. Besides the pure ordinal information about the level of quality, numerical information is induced when considering uncertainty involved during measurement. In the case where uncertainty is modelled as probability, this numerical information is ratio-scaled. An essential property of the mentioned approach is that the application of a measure on a large collection of data can be represented efficiently in the sense that (i) the representation has a low storage complexity and (ii) it can be updated incrementally when new data are observed. However, this property only holds when the evaluation of predicates is clear and does not deal with uncertainty. For some dimensions of quality, this assumption is far too strong and uncertainty comes into play almost naturally. In this paper, we investigate how the presence of uncertainty influences the efficiency of a measurement procedure. Hereby, we focus specifically on the case where uncertainty is caused by insufficient information and is thus modelled by means of possibility theory. It is shown that the amount of data that reaches a certain level of quality, can be summarized as a possibility distribution over the set of natural numbers. We investigate an approximation of this distribution that has a controllable loss of information, allows for incremental updates and exhibits a low space complexity.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    2
    Citations
    NaN
    KQI
    []