Distributed Stream KNN Join

2021 
kNN join over data streams is an important operation in location-aware systems: it correlates events from different sources based on where they occur. Because it combines the complexity of kNN join with the dynamicity of data streams, kNN join in streaming environments is a computationally intensive operator, and its performance can be greatly improved by exploiting the computational capabilities of modern non-uniform memory access (NUMA) platforms. However, conventional kNN join approaches designed for prestored datasets do not work efficiently on the highly dynamic data found in streaming environments. In this paper, we therefore introduce an adaptive scalable stream kNN join, named ADS-kNN, to address the challenges of performing the kNN join operation on highly dynamic data. We propose a multistage kNN execution plan that enables high-performance kNN queries in distributed settings by overlapping the computation and communication stages. We also propose an adaptive data partitioning scheme that dynamically rebalances the load among operators as the input values change. Combining these two techniques, ADS-kNN provides a scalable and adaptive kNN join operator for data streams. Experiments on a 56-core system show that ADS-kNN achieves a maximum throughput 21 times higher than that of a single-threaded approach.
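To make concrete what a stream kNN join computes, here is a minimal single-threaded sketch. It assumes two input streams of 2-D points, count-based sliding windows, Euclidean distance, and brute-force search. The class name `SlidingWindowKNNJoin` and its parameters are hypothetical. The sketch does not implement ADS-kNN's multistage execution plan or its adaptive partitioning.

```python
import heapq
import math
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float]


class SlidingWindowKNNJoin:
    """Brute-force, single-threaded kNN join over two count-based sliding windows.

    Hypothetical illustration only; this is not the ADS-kNN algorithm.
    """

    def __init__(self, k: int, window: int) -> None:
        self.k = k
        # One bounded window per stream; deque(maxlen=...) silently evicts
        # the oldest event once the window is full.
        self.windows: List[Deque[Point]] = [deque(maxlen=window), deque(maxlen=window)]

    def insert(self, stream: int, p: Point) -> List[Point]:
        """Add p to its stream's window (stream is 0 or 1) and return its k
        nearest neighbours from the other stream's current window, closest first."""
        self.windows[stream].append(p)
        other = self.windows[1 - stream]
        # nsmallest avoids a full sort: O(n log k) for n candidates in the window.
        return heapq.nsmallest(self.k, other, key=lambda q: math.dist(p, q))
```

For example, with `k=2` and `window=3`, after stream 1 receives `(0.0, 0.0)`, `(5.0, 5.0)` and `(1.0, 1.0)`, calling `insert(0, (0.2, 0.2))` returns `[(0.0, 0.0), (1.0, 1.0)]`. Each arrival scans the entire opposite window, so per-event cost grows linearly with window size. A parallel, partitioned design such as the one described above aims to spread exactly this kind of cost across cores.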
26 references; 0 citations