Exploratory Data Analysis in SAP IQ Using Query-Time Sampling

2021 
As businesses continue to consume and produce ever-growing volumes of data, exploratory data analysis (EDA) is becoming an integral part of everyday operations. While online analytical processing (OLAP) systems in general – and column-oriented relational database management systems (RDBMS) in particular – are equipped with powerful tools to plough through petabytes of data, analytical queries may take seconds to execute, which is not always desirable in exploratory data analysis. Data scientists often need tools for fast visualization of data, and they want to identify subsets of data that warrant further drill-down before running computationally expensive analytical functions. In this paper, we describe our early work on extending SAP IQ (a disk-based columnar RDBMS) to support approximate query processing for exploratory data analysis using a technique known as query-time sampling. Specifically, we introduce two classes of novel samplers: (i) a stratified sampler with randomized row access to address the early-row bias problem in sampling, and (ii) hash-based equi-join samplers that are outlier-aware. We demonstrate how SAP IQ’s polymorphic table function (PTF) technology can be utilized to implement these samplers as new query plan operators.
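
The two sampler classes named above can be illustrated with a minimal Python sketch. This is not SAP IQ's PTF implementation: the function names, parameters, in-memory row lists, and the SHA-256 based key filter are all assumptions made for illustration, and the outlier-aware part of the join samplers is left out because the abstract does not describe it. The first function caps the number of rows kept per stratum while visiting rows in shuffled order, so rows stored early get no preference. The second pair applies the same deterministic hash filter to the join key on both inputs, so a sampled key keeps all of its matching rows.

```python
import hashlib
import random
from collections import defaultdict


def stratified_sample(rows, key, per_stratum, seed=0):
    """Keep up to `per_stratum` rows per stratum (hypothetical helper).

    Rows are visited in a shuffled order rather than storage order,
    which is the assumed remedy for early-row bias.
    """
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # randomized row access
    picked = defaultdict(list)
    for i in order:
        bucket = picked[key(rows[i])]
        if len(bucket) < per_stratum:
            bucket.append(rows[i])
    return dict(picked)


def hash_keep(join_key, rate):
    """Map a join key deterministically to [0, 1); keep it if below `rate`."""
    digest = hashlib.sha256(repr(join_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < rate


def sampled_equi_join(left, right, left_key, right_key, rate):
    """Equi-join restricted to keys that pass the shared hash filter.

    Outlier handling is intentionally not modeled in this sketch.
    """
    index = defaultdict(list)
    for r in right:
        k = right_key(r)
        if hash_keep(k, rate):
            index[k].append(r)
    joined = []
    for l in left:
        k = left_key(l)
        if hash_keep(k, rate):
            joined.extend((l, r) for r in index.get(k, []))
    return joined
```

Because the filter depends only on the key value, both sides of the join make the same keep-or-drop decision without coordinating, which is why a hash is used here instead of independent random draws per row.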