Multiple-Loads Deduplication Method Based on Improved Sparse Indexing
2013
Sparse indexing cannot effectively deduplicate backup loads that consist mainly of small files. To address this problem, a min-feature sampling algorithm based on Broder's extension theorem is proposed. In addition, a deduplication method for multiple backup loads, built on the min-feature sampling algorithm, is presented. By sampling the backup load, this method keeps only a very small subset of the full index in RAM, and it amortizes the cost of disk accesses by loading chunk IDs in batches. As a result, the throughput of the method is improved effectively. The experimental results indicate that the compression ratio of the method on mixed backup loads is 2.04 times that of sparse indexing, while its throughput is almost equal to that of sparse indexing. The method is applicable to high-performance deduplication systems that need to process backup loads of multiple types.
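The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a segment's "features" are the k smallest chunk fingerprints (a min-wise sampling in the spirit of Broder's result that similar sets tend to share their minimum elements), that chunk IDs are SHA-1 digests, and that an in-memory dictionary stands in for the on-disk segment store. The names `SampledIndex`, `min_features`, `k`, and `max_champions` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def fingerprint(chunk: bytes) -> str:
    """Return a SHA-1 hex digest used as the chunk ID (an assumed choice)."""
    return hashlib.sha1(chunk).hexdigest()


def min_features(chunk_ids, k=2):
    """Return the k smallest distinct chunk IDs as the segment's features."""
    return sorted(set(chunk_ids))[:k]


class SampledIndex:
    """Toy sampled index: only the features of each segment stay in RAM."""

    def __init__(self, k=2, max_champions=1):
        self.k = k
        self.max_champions = max_champions
        # In-RAM sampled index: feature -> IDs of segments containing it.
        self.feature_to_segments = defaultdict(set)
        # Stand-in for the on-disk store: segment ID -> all of its chunk IDs.
        self.segment_store = {}

    def deduplicate(self, segment_id, chunks):
        """Store a segment and return the chunk IDs that were not found."""
        ids = [fingerprint(c) for c in chunks]
        features = min_features(ids, self.k)

        # Vote for stored segments that share features with this one.
        votes = Counter()
        for feature in features:
            for seg in self.feature_to_segments.get(feature, ()):
                votes[seg] += 1

        # Load the chunk IDs of the best-matching segments in one batch each.
        known = set()
        for seg, _ in votes.most_common(self.max_champions):
            known |= self.segment_store[seg]

        # Keep only chunks unseen in the loaded batches or earlier in this segment.
        new_ids = []
        for cid in ids:
            if cid not in known:
                known.add(cid)
                new_ids.append(cid)

        self.segment_store[segment_id] = set(ids)
        for feature in features:
            self.feature_to_segments[feature].add(segment_id)
        return new_ids


index = SampledIndex(k=2)
first = index.deduplicate("s1", [b"a", b"b", b"c"])
second = index.deduplicate("s2", [b"a", b"b", b"c", b"d"])
```

In this sketch the second segment shares at least one feature with the first, so the first segment's chunk IDs are loaded in a single batch and only the chunk for `b"d"` is reported as new. Because only sampled features are consulted, duplicates in segments that share no feature with the new one would be missed; that is the usual trade-off between RAM footprint and compression ratio in sampled indexes.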