A Near-Exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems

2014 
Cloud backup systems use data deduplication to remove duplicate chunks shared by many backup versions. Instead of being uploaded to the cloud, duplicate chunks are replaced with references to previously stored chunks. As a result, chunks that are consecutive in a backup stream end up dispersed across several segments (the storage unit in the cloud), which causes fragmentation at restore time. When users restore the latest version, every referenced segment must be downloaded from the cloud, and unreferenced chunks in those segments are downloaded along with them, which degrades restore performance. To address this problem, we propose a near-exact defragmentation scheme, called NED, for deduplication-based cloud backups. The idea behind NED is to compute, for each segment, the ratio of the total length of the chunks referenced by the current data stream to the segment length. If the ratio is smaller than a threshold, the chunks in the data stream that refer to that segment are labeled as fragments and written to new segments. By efficiently identifying fragmented chunks, NED significantly reduces the number of segments needed for restore, with only a slight decrease in deduplication ratio. Experimental results on real-world datasets demonstrate that NED improves restore performance by 6% to 105% at the cost of a 0.1% to 6.5% decrease in deduplication ratio.
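
The ratio test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ChunkRef` and `find_fragments`, the example segment sizes, and the threshold of 0.5 are all hypothetical, and counting each distinct chunk only once per segment is an assumption the abstract does not spell out.

```python
from collections import defaultdict
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """A duplicate chunk in the current stream and the old segment that holds it."""
    chunk_id: str
    segment_id: str
    length: int  # chunk length in bytes


def find_fragments(refs, segment_lengths, threshold):
    """Return the chunk ids that the ratio test would label as fragments.

    refs: ChunkRef entries for the duplicate chunks of the current stream.
    segment_lengths: mapping from segment id to segment length in bytes.
    threshold: hypothetical cut-off; segments whose referenced ratio is
        below it are treated as sparsely used.
    """
    referenced = defaultdict(int)
    seen = set()
    for ref in refs:
        # Assumption: a chunk referenced several times counts once.
        if ref.chunk_id not in seen:
            seen.add(ref.chunk_id)
            referenced[ref.segment_id] += ref.length

    # Segments where referenced length / segment length < threshold.
    sparse = {
        seg for seg, total in referenced.items()
        if total / segment_lengths[seg] < threshold
    }

    # Chunks pointing into sparse segments are the ones to rewrite.
    return [ref.chunk_id for ref in refs if ref.segment_id in sparse]


# Illustrative data only: two 100-byte segments.
segment_lengths = {"s1": 100, "s2": 100}
refs = [
    ChunkRef("a", "s1", 60),
    ChunkRef("b", "s1", 30),
    ChunkRef("c", "s2", 10),
]
fragments = find_fragments(refs, segment_lengths, threshold=0.5)
```

In this made-up example, segment `s1` has a ratio of 0.9 and is kept as a reference target, while `s2` has a ratio of 0.1, so chunk `c` is labeled a fragment and would be written to a new segment. Raising the threshold rewrites more chunks, which trades deduplication ratio for fewer segments downloaded at restore time.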