Discovering Relaxed Functional Dependencies based on Multi-attribute Dominance

2020 
With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (FDs). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through FDs, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed FDs (RFDs) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for RFDs that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring RFDs. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm.
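To make the notion of a relaxed dependency concrete, the following is a minimal sketch, not the Domino algorithm itself. It assumes that an RFD with similarity thresholds holds when every pair of tuples that is similar on all left-hand-side attributes is also similar on the right-hand-side attribute, and it assumes the usual Pareto reading of dominance between vectors of attribute distances. The names `distance`, `holds`, `dominates`, and the toy table are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def distance(a, b):
    """Assumed distance: absolute difference for numbers, 0/1 mismatch otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    return 0 if a == b else 1


def holds(rows, lhs, rhs, thresholds):
    """Check a relaxed dependency lhs -> rhs under per-attribute distance thresholds.

    The dependency is violated when two rows are within threshold on every
    lhs attribute but farther apart than the threshold on the rhs attribute.
    """
    for r1, r2 in combinations(rows, 2):
        similar_lhs = all(distance(r1[a], r2[a]) <= thresholds[a] for a in lhs)
        if similar_lhs and distance(r1[rhs], r2[rhs]) > thresholds[rhs]:
            return False
    return True


def dominates(p, q):
    """Assumed Pareto dominance on distance vectors: p <= q everywhere, p < q somewhere."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


# Hypothetical toy table, not taken from the paper's datasets.
rows = [
    {"height": 180, "weight": 80, "shoe": 44},
    {"height": 181, "weight": 82, "shoe": 44},
    {"height": 165, "weight": 60, "shoe": 39},
]

# Rows whose heights differ by at most 1 have identical shoe sizes.
shoe_ok = holds(rows, ["height"], "shoe", {"height": 1, "shoe": 0})

# The same height tolerance does not bound weight within 1.
weight_ok = holds(rows, ["height"], "weight", {"height": 1, "weight": 1})
```

In this sketch the thresholds are supplied by hand. The abstract states that Domino instead derives them during discovery, which is the part this example does not attempt to reproduce.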