Accurate Aggregation Query-Result Estimation and Its Efficient Processing on Distributed Key-Value Store

2019 
We propose four methods that improve both the accuracy of aggregation query-result estimation, using histograms and/or kernel density estimation, and the efficiency of query processing on a distributed key-value store (D-KVS). Aggregation queries have recently come to play a key role in analyzing the large amounts of multidimensional data generated by sensors, Internet-of-Things devices, and similar sources. A D-KVS is a platform for managing and processing such large-scale multidimensional data. However, querying large-scale multidimensional data on a D-KVS sometimes requires a costly data scan because of its insufficient support for indexes. Since aggregation-query results do not always need to be exact, our four methods estimate query results accurately instead of computing exact results by scanning all the data, which also improves query-processing performance. We first propose two kernel density estimation-based methods. To further improve query-result estimation accuracy, we combine each of these two methods with a histogram-based scheme so that an optimal estimation method can be selected dynamically based on the relationship between a query and the data distribution. We evaluated the efficiency and accuracy of the proposed methods by comparing them with an existing method and showed that the proposed methods perform better.
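To make the two estimator families concrete, the following is a minimal one-dimensional sketch of how a range COUNT query could be estimated from a histogram and from a kernel density estimate without scanning the data. It is not the paper's method: the function names, the Gaussian kernel, the fixed bandwidth, the equi-width bins, and the synthetic data are all assumptions made for illustration, and the dynamic selection between estimators described above is not shown.

```python
import math
import random


def build_histogram(data, lo_edge, hi_edge, bins):
    """Build equi-width bin counts; out-of-range values go to the edge bins."""
    width = (hi_edge - lo_edge) / bins
    counts = [0] * bins
    for x in data:
        i = min(bins - 1, max(0, int((x - lo_edge) / width)))
        counts[i] += 1
    return counts, width


def histogram_count(counts, lo_edge, width, q_lo, q_hi):
    """Estimate COUNT over [q_lo, q_hi], assuming values are uniform in each bin."""
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo_edge + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += c * overlap / width
    return total


def _phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_count(sample, n_total, bandwidth, q_lo, q_hi):
    """Estimate COUNT over [q_lo, q_hi] by integrating a Gaussian KDE.

    Each sample point contributes the mass of its kernel inside the range;
    the mean mass is scaled up to the full data size n_total.
    """
    mass = sum(
        _phi((q_hi - x) / bandwidth) - _phi((q_lo - x) / bandwidth)
        for x in sample
    )
    return n_total * mass / len(sample)


# Synthetic data (assumption): 10,000 values drawn from N(50, 10^2).
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]
sample = random.sample(data, 500)  # small sample kept as the KDE summary

counts, width = build_histogram(data, 0.0, 100.0, 20)
exact = sum(1 for x in data if 40.0 <= x <= 60.0)
est_hist = histogram_count(counts, 0.0, width, 40.0, 60.0)
est_kde = kde_count(sample, len(data), 3.0, 40.0, 60.0)  # bandwidth 3.0 is an assumption
print(exact, round(est_hist), round(est_kde))
```

Both estimators answer the query from a compact summary (20 bin counts or 500 sample points) rather than the full data set. The histogram is most accurate when query boundaries line up with bin edges, while the KDE smooths across boundaries, which is one intuition for why choosing between them per query can help.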