Accelerating Similarity-based Mining Tasks on High-dimensional Data by Processing-in-memory

2021 
Similarity computation is a core subroutine of many mining tasks on multi-dimensional data, which is often massive and high-dimensional. In these mining tasks, the performance bottleneck is caused by the ‘memory wall’ problem, because a substantial amount of data must be transferred from memory to processors. Recent advances in non-volatile memory (NVM) enable processing-in-memory (PIM), which reduces data transfer and thus alleviates this bottleneck. Nevertheless, NVM PIM supports only specific operations (e.g., dot-product on non-negative integer vectors) rather than arbitrary ones. In this paper, we tackle this challenge and carefully exploit NVM PIM to accelerate similarity-based mining tasks on multi-dimensional data without compromising the accuracy of the results. Experimental results on real datasets show that our proposed method achieves up to 10.5x and 8.5x speedup for state-of-the-art kNN classification and k-means clustering algorithms, respectively.
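To illustrate why a dot-product primitive is relevant to similarity computation, the following minimal sketch uses the standard identity ||x - y||^2 = x·x + y·y - 2(x·y) to express squared Euclidean distance through dot products alone. It is not the method proposed in the paper: the function names (`dot`, `squared_euclidean_via_dot`, `nearest_neighbor`) are hypothetical, the dot product runs on the CPU as a stand-in for an operation a PIM array might offload, and the inputs are assumed to be non-negative integer vectors.

```python
def dot(u, v):
    """Plain dot product; a stand-in for the operation a PIM array might offload."""
    return sum(a * b for a, b in zip(u, v))


def squared_euclidean_via_dot(x, y):
    """Squared Euclidean distance using only dot products.

    Uses the identity ||x - y||^2 = x.x + y.y - 2 * (x.y).
    """
    return dot(x, x) + dot(y, y) - 2 * dot(x, y)


def nearest_neighbor(query, points):
    """Return (index, squared distance) of the point closest to the query.

    Assumes `points` is a non-empty list of vectors with the same length
    as `query`. The query norm is computed once and reused.
    """
    qq = dot(query, query)
    best_index, best_dist = None, None
    for i, p in enumerate(points):
        d = qq + dot(p, p) - 2 * dot(query, p)
        if best_dist is None or d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


if __name__ == "__main__":
    data = [[5, 5], [1, 2], [0, 0]]
    print(squared_euclidean_via_dot([1, 2, 3], [4, 0, 3]))
    print(nearest_neighbor([1, 1], data))
```

In this formulation the per-point norms can be precomputed once, so each distance evaluation needs only one new dot product between the query and a data point, which is the data-heavy step that benefits from staying close to memory.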