Making Unstructured Data SPARQL Using Semantic Indexing in Oracle Database

2012 
This paper describes the Semantic Indexing feature introduced in Oracle Database for indexing unstructured text (document) columns. This capability enables searching for concepts (such as people, places, organizations, and events) in addition to words or phrases, with further options for sense disambiguation and term expansion by consulting knowledge captured in OWL/RDF ontologies. The distinguishing aspects of our approach are: 1) Indexing: instead of building a traditional inverted index of (annotated) token and/or named entity occurrences, we extract the entities, associations, and events present in the text column data and store them as RDF named graphs in the Oracle Database Semantic Store. This base content can be further augmented with knowledge bases and inferred triples (obtained by applying domain-specific ontologies and rule bases). 2) Querying: instead of relying on proprietary extensions for specifying a search, we allow users to specify a complete SPARQL query pattern that can capture arbitrarily complex relationships between query terms. We have implemented this feature by introducing a sem_contains SQL operator and the associated sem_indextype indexing scheme. The indexing scheme employs an extensible architecture that supports indexing of unstructured text using native as well as third-party text extraction tools. The paper presents a model for the semantic index and querying, describes the feature, and outlines its implementation, which leverages Oracle's native support for RDF/OWL storage, inferencing, and querying. We also report a study involving the use of this feature on a TREC collection of over 130,000 news articles.
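The indexing and querying model summarized above can be illustrated with a small, self-contained Python sketch. It is not Oracle's implementation or API: `SemanticIndex`, `add_document`, `apply_subclass_rule`, and the Python `sem_contains` method are hypothetical names chosen for illustration. The "extracted" triples are written by hand rather than produced by a text extraction tool, and the inference step is a simplified subclass rule standing in for ontology- and rule-based reasoning. Each row's set of triples plays the role of a per-document named graph, and a query is a basic graph pattern (the conjunctive core of a SPARQL query) with `?`-prefixed variables.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class SemanticIndex:
    """Hypothetical in-memory index: one 'named graph' (set of triples) per row id."""

    def __init__(self) -> None:
        self.graphs: Dict[int, Set[Triple]] = {}

    def add_document(self, row_id: int, triples: Iterable[Triple]) -> None:
        # Stand-in for an extractor's output: entities and relations as triples.
        self.graphs[row_id] = set(triples)

    def apply_subclass_rule(self, subclass_of: Dict[str, str]) -> None:
        # Simplified RDFS-style inference: if ?s rdf:type C and C has superclass D,
        # add ?s rdf:type D. Repeat until no new triples appear (a fixed point).
        for graph in self.graphs.values():
            changed = True
            while changed:
                changed = False
                for (s, p, o) in list(graph):  # iterate over a copy while adding
                    if p == "rdf:type" and o in subclass_of:
                        inferred = (s, "rdf:type", subclass_of[o])
                        if inferred not in graph:
                            graph.add(inferred)
                            changed = True

    def sem_contains(self, pattern: List[Triple]) -> List[int]:
        # Return the row ids whose graph has at least one binding for the whole pattern.
        return sorted(rid for rid, g in self.graphs.items() if _match(pattern, g, {}))


def _is_var(term: str) -> bool:
    return term.startswith("?")


def _unify(pat: Triple, triple: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    # Extend the current variable binding so that pat matches triple, or return None.
    new = dict(binding)
    for p, t in zip(pat, triple):
        if _is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def _match(pattern: List[Triple], graph: Set[Triple], binding: Dict[str, str]) -> bool:
    # Backtracking search: match the first triple pattern, then the rest under that binding.
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = _unify(first, triple, binding)
        if new is not None and _match(rest, graph, new):
            return True
    return False


if __name__ == "__main__":
    idx = SemanticIndex()
    idx.add_document(1, [("ex:Acme", "rdf:type", "ex:Company"),
                         ("ex:Acme", "ex:locatedIn", "ex:Boston")])
    idx.add_document(2, [("ex:Bob", "rdf:type", "ex:Person"),
                         ("ex:Bob", "ex:locatedIn", "ex:Boston")])
    idx.apply_subclass_rule({"ex:Company": "ex:Organization"})
    print(idx.sem_contains([("?org", "rdf:type", "ex:Organization"),
                            ("?org", "ex:locatedIn", "ex:Boston")]))  # [1]
```

The query for an organization located in Boston matches only row 1, and only because the inferred `ex:Organization` type was added to that row's graph; a plain keyword index over the raw text would have no way to connect "Acme" to the concept "organization". Joining the two triple patterns on the shared `?org` variable is what lets a single query express a relationship between terms rather than their mere co-occurrence.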