Ontario: Federated Query Processing Against a Semantic Data Lake

Kemele M. Endris, Philipp D. Rohde, Maria-Esther Vidal, Sören Auer

2019

Data lakes enable flexible knowledge discovery and reduce the overhead of materialized data integration. Although data lakes are effective for data storage, query execution over them may be expensive and demands novel techniques for generating plans that exploit their main characteristics. We devise Ontario, a federated query processing approach tailored for large-scale heterogeneous data. Ontario provides efficient and effective query processing over a federation of heterogeneous data sources in a data lake. Ontario relies on source descriptions named RDF Molecule Templates, i.e., abstract descriptions of the properties of the entities in a unified schema and their implementation in a data lake. We empirically evaluate the effectiveness of the Ontario optimization techniques over state-of-the-art benchmarks. The observed results suggest that Ontario can effectively select plans composed of subqueries that can be efficiently executed against heterogeneous data sources in a data lake.
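
As an illustration of how source descriptions of this kind could guide source selection, the following minimal Python sketch models a template as a class, the set of properties its entities carry, and the sources that implement it. It is not Ontario's actual implementation or API: the names `MoleculeTemplate`, `group_by_subject`, and `select_sources`, the `ex:` identifiers, and the idea of grouping triple patterns by subject before matching them against templates are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical source description: a class, its properties, and its sources."""
    rdf_class: str
    predicates: frozenset
    sources: tuple  # (source name, data model) pairs; illustrative only


def group_by_subject(triple_patterns):
    """Group the predicates of (subject, predicate, object) patterns by subject."""
    groups = defaultdict(set)
    for subject, predicate, _obj in triple_patterns:
        groups[subject].add(predicate)
    return dict(groups)


def select_sources(triple_patterns, templates):
    """For each subject group, list the sources of every template that
    covers all predicates used in that group."""
    selection = {}
    for subject, predicates in group_by_subject(triple_patterns).items():
        selection[subject] = sorted(
            source
            for template in templates
            if predicates <= template.predicates
            for source in template.sources
        )
    return selection


# Made-up templates over two sources with different data models.
templates = [
    MoleculeTemplate(
        "ex:Drug",
        frozenset({"ex:name", "ex:target"}),
        (("drugs_db", "relational"),),
    ),
    MoleculeTemplate(
        "ex:Gene",
        frozenset({"ex:symbol"}),
        (("genes_json", "document"),),
    ),
]

# Made-up query: two patterns about ?d and one about ?g.
query = [
    ("?d", "ex:name", "?n"),
    ("?d", "ex:target", "?g"),
    ("?g", "ex:symbol", "?s"),
]

selection = select_sources(query, templates)
```

In this sketch each subject group becomes a candidate subquery, and a source is kept for it only when a single template covers every predicate in the group. A real planner would also need to translate each subquery into the query language of the selected source, which is outside the scope of this example.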
