Ontario: Federated Query Processing Against a Semantic Data Lake

2019 
Data lakes enable flexible knowledge discovery and reduce the overhead of materialized data integration. Although data lakes are effective for data storage, query execution over them may be expensive, which demands novel techniques for generating plans that exploit their main characteristics. We devise Ontario, a federated query processing approach tailored for large-scale heterogeneous data. Ontario provides efficient and effective query processing over a federation of heterogeneous data sources in a data lake. Ontario relies on source descriptions named RDF Molecule Templates, i.e., abstract descriptions of the properties of the entities in a unified schema and their implementation in a data lake. We empirically evaluate the effectiveness of the Ontario optimization techniques over state-of-the-art benchmarks. The observed results suggest that Ontario can effectively select plans composed of subqueries that can be efficiently executed against heterogeneous data sources in a data lake.