A Machine Learning Approach to Foreign Key Discovery.

2009 
We study the problem of automatically discovering semantic associations between schema elements, namely foreign keys. This problem is important in all applications that need to integrate data sets structured in tables but lacking explicit foreign key constraints. If such constraints could be recovered automatically, querying and integrating such databases would become much easier. Clearly, one may find candidates for foreign key constraints in a given database instance by computing all inclusion dependencies (INDs) between attributes. However, this set usually contains many false positives due to spurious set inclusions. We present a machine learning approach to tackle this problem. We first compute all INDs of a given schema and let each be judged by a binary classification algorithm using a small set of features that can be derived efficiently using standard SQL. We demonstrate the feasibility of this approach using cross-validation with several state-of-the-art classification algorithms. With the J48 algorithm, our approach consistently reaches F-measures above 80%, and often close to 100%, as evaluated on six different data sets from three different domains.
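The first two steps described above (enumerate INDs, then derive features with SQL) can be illustrated with a minimal sketch. It is not the authors' implementation: the table names, the helper functions `demo_connection`, `unary_inds`, and `ind_features`, and the two features `coverage` and `ref_is_unique` are assumptions chosen for illustration, since the abstract does not list the actual feature set. The sketch uses Python's built-in `sqlite3` module and only considers unary INDs.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.execute("CREATE TABLE ratings (score INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 3)])
    conn.executemany("INSERT INTO ratings VALUES (?)", [(i,) for i in range(1, 6)])
    return conn


def unary_inds(conn, columns):
    """Return (dependent, referenced) column pairs where every non-NULL
    dependent value also occurs in the referenced column."""
    inds = []
    for dep_table, dep_col in columns:
        for ref_table, ref_col in columns:
            if (dep_table, dep_col) == (ref_table, ref_col):
                continue
            # Count distinct dependent values that are missing from the referenced column.
            query = (
                f"SELECT COUNT(*) FROM "
                f"(SELECT DISTINCT {dep_col} AS v FROM {dep_table} "
                f" WHERE {dep_col} IS NOT NULL) d "
                f"WHERE d.v NOT IN "
                f"(SELECT {ref_col} FROM {ref_table} WHERE {ref_col} IS NOT NULL)"
            )
            (missing,) = conn.execute(query).fetchone()
            if missing == 0:
                inds.append(((dep_table, dep_col), (ref_table, ref_col)))
    return inds


def ind_features(conn, dep, ref):
    """Compute two illustrative (assumed) features for one IND candidate."""
    (dep_table, dep_col), (ref_table, ref_col) = dep, ref
    (dep_distinct,) = conn.execute(
        f"SELECT COUNT(DISTINCT {dep_col}) FROM {dep_table}"
    ).fetchone()
    ref_distinct, ref_rows = conn.execute(
        f"SELECT COUNT(DISTINCT {ref_col}), COUNT({ref_col}) FROM {ref_table}"
    ).fetchone()
    return {
        # Share of referenced values that the dependent column actually uses.
        "coverage": dep_distinct / ref_distinct if ref_distinct else 0.0,
        # A foreign key normally points at a column with unique values.
        "ref_is_unique": ref_distinct == ref_rows,
    }


if __name__ == "__main__":
    conn = demo_connection()
    columns = [
        ("customers", "id"),
        ("orders", "order_id"),
        ("orders", "customer_id"),
        ("ratings", "score"),
    ]
    for dep, ref in unary_inds(conn, columns):
        print(dep, "->", ref, ind_features(conn, dep, ref))
```

In this toy database the only plausible foreign key is `orders.customer_id` referencing `customers.id`, yet two further INDs into `ratings.score` also hold, which is the kind of spurious set inclusion the abstract refers to. The feature vectors would then be passed to a binary classifier (J48 is Weka's implementation of the C4.5 decision tree); that step is omitted here.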