PM-RAD: An Efficient Restore Algorithm in Deduplication by Pattern Matching

2018 
Deduplication is one of the most effective and efficient techniques for saving storage space, and it is widely used in data centers and cloud storage systems. After duplicate chunks are identified and removed, some logically consecutive chunks end up physically scattered across different containers, which causes a serious fragmentation problem. Fragmentation, in turn, severely degrades restore performance. In this paper, we propose PM-RAD, an efficient restore algorithm that uses pattern matching to boost restore performance. It reduces the number of container reads by finding read patterns within a look-ahead window. It can also merge scattered chunks and read them at once, which reduces the number of disk accesses. Moreover, we optimize the proposed algorithm in two respects, separate caches and cyclic pattern matching, to reduce disk accesses further. During pattern matching, we split the cache into a metadata cache, which holds fingerprints, and a data cache, which stores chunks. Cyclic pattern matching makes it possible to find much longer patterns in a continuously sliding window. We implement the proposed algorithm and evaluate it experimentally on various data sets. Experimental results show that our algorithm outperforms the state-of-the-art work in terms of restore performance.
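To make the idea of reducing container reads with a look-ahead window concrete, the following is a minimal sketch and not the PM-RAD algorithm itself. It assumes a hypothetical recipe format (a list of `(fingerprint, container_id)` pairs), models containers as in-memory dictionaries, and uses a fixed, non-sliding window without the separate caches or cyclic pattern matching described above. All function and variable names are illustrative assumptions.

```python
from collections import OrderedDict


def restore_naive(recipe, containers):
    """Restore chunk by chunk, keeping only the last-read container cached."""
    out = []
    reads = 0
    cached_id, cached = None, None
    for fingerprint, container_id in recipe:
        if container_id != cached_id:
            cached = containers[container_id]  # simulated container read
            cached_id = container_id
            reads += 1
        out.append(cached[fingerprint])
    return b"".join(out), reads


def restore_with_lookahead(recipe, containers, window_size):
    """Group the chunks in each look-ahead window by container so that
    every container is read at most once per window (assumed simplification)."""
    out = [None] * len(recipe)
    reads = 0
    for start in range(0, len(recipe), window_size):
        window = recipe[start:start + window_size]
        # Map each container to the recipe positions it can satisfy.
        by_container = OrderedDict()
        for offset, (fingerprint, container_id) in enumerate(window):
            by_container.setdefault(container_id, []).append(
                (start + offset, fingerprint)
            )
        for container_id, wanted in by_container.items():
            data = containers[container_id]  # one simulated container read
            reads += 1
            for position, fingerprint in wanted:
                out[position] = data[fingerprint]
    return b"".join(out), reads


if __name__ == "__main__":
    # Hypothetical fragmented layout: consecutive chunks alternate containers.
    containers = {
        1: {"a": b"A", "c": b"C"},
        2: {"b": b"B", "d": b"D"},
    }
    recipe = [("a", 1), ("b", 2), ("c", 1), ("d", 2)]
    print(restore_naive(recipe, containers))
    print(restore_with_lookahead(recipe, containers, window_size=4))
```

In this toy layout the naive restore switches containers on every chunk, while the windowed restore reads each container once. A larger window exposes more chunks that share a container, at the cost of more metadata to hold during the scan.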