Waterwheel: Realtime Indexing and Temporal Range Query Processing over Massive Data Streams

2018 
Massive data streams from sensors in the Internet of Things (IoT) and from smart devices with Global Positioning System (GPS) receivers are now flooding into database systems for further processing and analysis. Real-time retrieval over both fresh and historical data is a key enabler for real-world applications that use these streams, such as smart manufacturing and smart cities. In this paper, we present a simple and effective distributed solution that achieves millions of tuple insertions per second and processes ad-hoc temporal range queries in milliseconds. To this end, we propose a new data partitioning scheme that takes advantage of the workload characteristics and avoids expensive global data merging. Furthermore, to resolve the throughput bottleneck, we adopt a template-based index method that exploits the relatively stable distribution of incoming tuples to skip unnecessary adjustments to the index structure. To parallelize data insertion and query processing, we propose an efficient dispatching mechanism and effective load balancing strategies that fully utilize computational resources in a workload-aware manner. On both synthetic and real workloads, our solution consistently outperforms state-of-the-art open-source systems by at least an order of magnitude.
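The abstract does not describe the index in detail. Below is a minimal Python sketch of the general idea behind a template-based index, written under our own assumptions and not taken from the paper. A fixed, sorted list of key boundaries (the "template") routes each incoming tuple to a leaf. Leaves never split, so insertions do not trigger structural adjustments. The template is rebuilt from key quantiles only when the key distribution drifts. The class `TemplateIndex`, its methods, and the max-to-mean skew measure used as a rebuild signal are all hypothetical.

```python
from bisect import bisect_right


class TemplateIndex:
    """Hypothetical sketch of a template-based index over (key, timestamp) tuples.

    A fixed, sorted list of key boundaries (the template) routes each tuple to a
    leaf. Leaves are append-only and never split, so inserts never change the
    routing layer. Leaf i covers keys in [boundaries[i-1], boundaries[i]).
    """

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        self.leaves = [[] for _ in range(len(self.boundaries) + 1)]

    def _leaf_id(self, key):
        # Binary search over the template; no structural change is needed.
        return bisect_right(self.boundaries, key)

    def insert(self, key, timestamp, payload):
        # Append to the matching leaf; the template stays untouched.
        self.leaves[self._leaf_id(key)].append((key, timestamp, payload))

    def range_query(self, key_lo, key_hi, t_lo, t_hi):
        """Return tuples with key_lo <= key <= key_hi and t_lo <= timestamp <= t_hi."""
        out = []
        # Visit only the leaves whose key ranges overlap [key_lo, key_hi].
        for leaf in self.leaves[self._leaf_id(key_lo):self._leaf_id(key_hi) + 1]:
            for key, ts, payload in leaf:
                if key_lo <= key <= key_hi and t_lo <= ts <= t_hi:
                    out.append((key, ts, payload))
        return sorted(out, key=lambda r: (r[0], r[1]))

    def skew(self):
        """Largest leaf size divided by mean leaf size (1.0 means perfectly even)."""
        sizes = [len(leaf) for leaf in self.leaves]
        mean = sum(sizes) / len(sizes)
        return max(sizes) / mean if mean else 1.0

    def rebuild(self, num_leaves):
        """Recompute the template from key quantiles and redistribute all tuples."""
        tuples = [t for leaf in self.leaves for t in leaf]
        keys = sorted(t[0] for t in tuples)
        if not keys or num_leaves < 2:
            return
        n = len(keys)
        self.boundaries = sorted({keys[i * n // num_leaves] for i in range(1, num_leaves)})
        self.leaves = [[] for _ in range(len(self.boundaries) + 1)]
        for key, ts, payload in tuples:
            self.leaves[self._leaf_id(key)].append((key, ts, payload))


if __name__ == "__main__":
    idx = TemplateIndex([100, 200, 300])
    for i in range(400):
        idx.insert(i, i % 60, f"p{i}")
    print(len(idx.range_query(150, 160, 30, 35)))  # keys 150..155 match
    for i in range(1000, 1400):
        idx.insert(i, i % 60, f"p{i}")  # distribution drifts to high keys
    print(idx.skew())
    idx.rebuild(4)
    print(idx.skew())
```

In this sketch, the cost of keeping the template fixed is that leaves can become unbalanced when the key distribution shifts. The `skew()` value is one simple, assumed way to decide when a rebuild is worthwhile. As long as the distribution stays roughly stable, inserts are a binary search plus an append.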