Apollo: A Dataset Profiling and Operator Modeling System

2019 
The rapidly increasing amount of available data has created invaluable business opportunities but also new challenges. The focus on content-driven analytics is shifting attention from optimizing operators and systems to handle massive data sizes, to intelligent selection of those datasets that maximize the business competitive advantage. To date, there exists no efficient method to quantify the impact of numerous available datasets over different analytics tasks - a thorough execution over every input would be prohibitively expensive. In this demonstration, we present Apollo, a data profiling and operator modeling system that tackles this challenge. Our system quantifies dataset similarities and projects them into a low-dimensional space. Operator outputs are then estimated over the entire dataset, utilizing similarity information with Machine Learning and a small sample of actual executions. During the demo, attendees will be able to model and visualize multiple analytics operators over datasets from the domains of machine learning and graph analytics.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    9
    References
    1
    Citations
    NaN
    KQI
    []