Efficient Compression Algorithm for Multimedia Data

2020 
In this work, we consider the problem of cosine-similarity-preserving dimensionality reduction (compression) for sparse binary datasets. Prior work [18] proposed a compression algorithm for high-dimensional, sparse, binary data that preserves inner product and Hamming distance. We show that the same algorithm also works well for cosine similarity. We present a theoretical analysis of the dimension reduction bound and complement it with experiments on real-world datasets. We compare our results with the state-of-the-art methods for this problem: SimHash [8], MinHash [21], Circulant Binary Embedding [25], and Densified One Permutation Hashing [20]. Our approach offers significant savings in compression time and in the number of random bits required for compression, while providing comparable performance.
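
To make the setting concrete, the following is a minimal sketch, not the algorithm of [18]. It assumes binary vectors are stored as sets of nonzero indices, and it uses a hypothetical bucket-parity compression (`compress`, with an illustrative `bucket_of` index-to-bucket map) purely as a stand-in. For binary vectors, cosine similarity equals the inner product divided by the square root of the product of the two Hamming weights, which is why a scheme that preserves inner products is a natural candidate for preserving cosine similarity.

```python
import math


def cosine_binary(u, v):
    """Cosine similarity of two binary vectors given as sets of nonzero indices.

    For binary data: <u, v> = |u & v| and ||u||^2 = |u|.
    """
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))


def compress(u, bucket_of):
    """Hypothetical bucket-parity compression (an assumption for illustration).

    Each nonzero index i is sent to bucket_of[i]; a bucket's output bit is
    the parity of the number of nonzero indices that land in it.
    """
    out = set()
    for i in u:
        out ^= {bucket_of[i]}  # toggle the bucket's bit
    return out


# Two sparse binary vectors sharing two of their three nonzero indices.
u = {1, 5, 9}
v = {1, 5, 20}
original = cosine_binary(u, v)

# An illustrative map with no collisions among the indices in use.
bucket_of = {1: 0, 5: 1, 9: 2, 20: 3}
compressed = cosine_binary(compress(u, bucket_of), compress(v, bucket_of))
```

When no two nonzero indices of a vector share a bucket, as in this example, the compressed cosine similarity matches the original. Collisions within a bucket cancel under parity, which is the source of approximation error in a scheme of this kind.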