MassStore: A low bandwidth, high De-duplication efficiency network backup system

2012 
De-duplication technology is widely used in disk-based backup systems to save disk space and reduce backup traffic over the Internet. Unfortunately, de-duplication-based backup systems often suffer from a metadata indexing bottleneck that greatly reduces backup efficiency and throughput. Existing approaches usually exploit the similarity or locality of the backup data flow to accelerate metadata indexing. In this paper, we design and implement MassStore, a de-duplication-based network backup system that uses a two-stage locality-sensitive hash algorithm. The algorithm combines the data similarity within each chunk set of the backup data flow with the locality between different chunk sets to accelerate metadata indexing and thereby improve de-duplication efficiency. Experimental results on real-world data sets show that MassStore not only reduces backup storage by an average of 88.5%, but also reduces network bandwidth and RAM usage.
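The abstract does not spell out the two-stage algorithm, so the following is only a minimal sketch of the general idea, not MassStore's actual implementation. It assumes that similarity is detected with a min-hash style representative fingerprint per chunk set, and that locality is exploited by prefetching the matched chunk set together with its successor. The names `fingerprint_chunks`, `TwoStageIndex`, `rep_index`, and `backup` are hypothetical, as is the fixed-size chunking.

```python
import hashlib


def fingerprint_chunks(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-1 hex digest per chunk."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


class TwoStageIndex:
    """Illustrative index only; the structure is an assumption, not the paper's design."""

    def __init__(self):
        self.segments = []   # stored chunk sets, in arrival order
        self.rep_index = {}  # representative fingerprint -> position in segments
        self.cache = set()   # fingerprints prefetched by the most recent hit

    def backup(self, fingerprints: list[str]) -> list[str]:
        """Return the fingerprints treated as new, i.e. chunks that must be sent."""
        # Stage 1 (similarity): the smallest fingerprint represents the chunk set.
        # Two chunk sets that share many chunks are likely to share it.
        rep = min(fingerprints)
        pos = self.rep_index.get(rep)
        if pos is not None:
            # Stage 2 (locality): load the matched chunk set and the one stored
            # right after it, since a backup stream tends to repeat in order.
            self.cache = set().union(*self.segments[pos:pos + 2])
        # On a miss the cache is kept, so an earlier prefetch can still help.
        new, seen = [], set()
        for fp in fingerprints:
            if fp not in self.cache and fp not in seen:
                new.append(fp)
            seen.add(fp)
        self.rep_index.setdefault(rep, len(self.segments))
        self.segments.append(set(fingerprints))
        return new


index = TwoStageIndex()
first_a = index.backup(["a1", "b2", "c3"])   # first backup: every chunk is new
first_b = index.backup(["d4", "e5"])
second_a = index.backup(["a1", "b2", "zz"])  # similarity hit on "a1"
second_b = index.backup(["e5", "zy"])        # no similarity hit, found via prefetch
print(first_a, first_b, second_a, second_b)
```

In this sketch only one small map of representatives stays in memory, and the full chunk sets are consulted only after a hit. The trade-off is that a duplicate chunk outside the prefetched sets is stored again, so de-duplication is approximate.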