Bounded Approximate Query Processing

2018 
OLAP is a core functionality in database systems and the performance is crucial to enable on-time decisions. However, OLAP queries are rather time consuming, especially on large datasets, and traditional exact solutions usually cannot meet the high-performance requirement. Recently, approximate query processing (AQP) has been proposed to enable approximate OLAP. However, existing AQP methods have some limitations. First, they may involve unacceptable errors on skewed data (e.g., long-tail distribution). Second, they require to store large amount of data and have no significant performance improvement. Third, they only support a small subset of SQL aggregation queries. To overcome these limitations, we propose a bounded approximate query processing framework ${\mathtt {BAQ}}$ BAQ . Given a predefined error bound and a set of queries, ${\mathtt {BAQ}}$ BAQ judiciously selects high-quality samples from the data to generate a unified synopsis offline, and then uses the synopsis to answer online queries. Compared with existing methods, ${\mathtt {BAQ}}$ BAQ has the following salient features. (1) ${\mathtt {BAQ}}$ BAQ does not need to generate a synopsis for each query while it only generates a unified synopsis, and thus ${\mathtt {BAQ}}$ BAQ has much smaller synopsis. (2) ${\mathtt {BAQ}}$ BAQ achieves much smaller error than existing studies. Specifically, ${\mathtt {BAQ}}$ BAQ can provide deterministic approximate results (i.e., the estimated query results must be within the error bound with 100 percent confidence) for SQL aggregation queries that do not contain selection conditions on numerical columns. For queries with selection conditions on numerical columns, we propose effective grouping-based techniques and the estimated results are also within the error bound in practice. Experimental results on both real and synthetic datasets show that ${\mathtt {BAQ}}$ BAQ significantly outperforms state-of-the-art approaches. For example, on a Microsoft production dataset (a real dataset with synthetic queries), ${\mathtt {BAQ}}$ BAQ has 10-100× improvement on synopsis size and 10-100× improvement on the error compared with state-of-the-art algorithms.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    38
    References
    7
    Citations
    NaN
    KQI
    []