A Data Structure for Efficient File Deduplication in Cloud Storage

2020 
With the rapid development of the Internet, massive amounts of data need to be stored, which poses a significant challenge for cloud storage systems. Notably, this data contains many duplicate files or chunks that can be deduplicated to improve space efficiency. Many approximate set membership data structures, such as the Bloom Filter (BF) and the Cuckoo Filter (CF), have been used to accelerate the deduplication process. However, errors inevitably occur because these data structures store only summary information, and the error rate directly affects the performance bottleneck of the deduplication system. To address these problems, we propose an advanced Cuckoo Filter named Split Position Aware Cuckoo Filter (SPACF), which noticeably decreases the error rate. We implement SPACF and compare it with other kinds of CFs and BFs. The experimental results show that the false positive rate of SPACF is around 50% of that of the Standard Cuckoo Filter and around 10% of that of the Counting Bloom Filter.
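For readers unfamiliar with the baseline, the following is a minimal sketch of a standard partial-key cuckoo filter used as a duplicate pre-check. It is not SPACF, whose design is not described in this abstract. The class name `CuckooFilter`, the helper `deduplicate`, and all parameters (`num_buckets`, `bucket_size`, `fp_bits`, `max_kicks`) are assumptions chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative standard cuckoo filter (NOT the SPACF from the paper)."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=12,
                 max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-derived index in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        # Only this short summary is stored, which is why false positives occur.
        return self._hash(b"fp:" + item) & ((1 << self.fp_bits) - 1)

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on
        # the current bucket and the fingerprint.
        return (index ^ self._hash(fp.to_bytes(4, "big"))) & self.mask

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is considered full; the last evicted entry is lost

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


def deduplicate(chunks, cf):
    """Keep chunks the filter has not seen; a hit means 'probably duplicate'."""
    unique = []
    for chunk in chunks:
        if cf.contains(chunk):
            continue  # a real system would verify the hit against a full index
        cf.insert(chunk)
        unique.append(chunk)
    return unique
```

In this sketch a filter hit is treated as a duplicate without verification, so every false positive would wrongly drop a unique chunk or force an extra lookup in a real system. This is why a lower false positive rate matters for deduplication.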