Efficient Nearest Neighbor Search by Removing Anti-hub

2021 
The central research question of the nearest neighbor search is how to reduce the memory cost while maintaining its accuracy. Instead of compressing each vector as is done in the existing methods, we propose a way to subsample unnecessary vectors to save memory. We empirically found that such unnecessary vectors have low hubness scores and thus can be easily identified beforehand. Such points are called anti-hubs in the data mining community. By removing anti-hubs, we achieved a memory-efficient search while preserving accuracy. In million-scale experiments, we showed that any vector compression method improves search accuracy by partial replacement with anti-hub removal under the same memory usage. A billion-scale benchmark showed that our data reduction combined with the best search method achieves higher accuracy under the assumption of fixed memory consumption. For example, our method had a much higher recall@100 (0.53) compared with the existing method (0.23) for the same memory consumption (6GB).
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    14
    References
    0
    Citations
    NaN
    KQI
    []