Making Unstructured Data SPARQL Using Semantic Indexing in Oracle Database

Souripriya Das,Seema Sundara,Matthew Perry,Jagannathan Srinivasan,Jayanta Banerjee,Aravind Yalamanchi

Making Unstructured Data SPARQL Using Semantic Indexing in Oracle Database

2012

This paper describes the Semantic Indexing feature introduced in Oracle Database for indexing unstructured text (document) columns. This capability enables searching for concepts (such as people, places, organizations, and events), in addition to words or phrases, with further options for sense disambiguation and term expansion by consulting knowledge captured in OWL/RDF ontologies. The distinguishing aspects of our approach are: 1) Indexing: Instead of building a traditional inverted index of (annotated) token and/or named entity occurrences, we extract the entities, associations, and events present in a text column data and store them as RDF named graphs in the Oracle Database Semantic Store. This base content can be further augmented with knowledge bases and inferred triples (obtained by applying domain-specific ontologies and rule bases). 2) Querying: Instead of relying on proprietary extensions for specifying a search, we allow users to specify a complete SPARQL query pattern that can capture arbitrarily complex relationships between query terms. We have implemented this feature by introducing a sem_contains SQL operator and the associated sem_indextype indexing scheme. The indexing scheme employs an extensible architecture that supports indexing of unstructured text using native as well as third party text extraction tools. The paper presents a model for the semantic index and querying, describes the feature, and outlines its implementation leveraging Oracle's native support for RDF/OWL storage, inferencing, and querying. We also report a study involving use of this feature on a TREC collection of over 130,000 news articles.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations