A relational approach to incrementally extracting and querying structure in unstructured data

2007 
There is a growing consensus that it is desirable to query over the structure implicit in unstructured documents, and that ideally this capability should be provided incrementally. However, there is no consensus about what kind of system should be used to support this kind of incremental capability. We explore using a relational system as the basis for a workbench for extracting and querying structure from unstructured data. As a proof of concept, we applied our relational approach to support structured queries over Wikipedia. We show that the data set is always available for some form of querying, and that as it is processed, users can pose a richer set of structured queries. We also provide examples of how we can incrementally evolve our understanding of the data in the context of the relational workbench.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    34
    References
    53
    Citations
    NaN
    KQI
    []