Concepts, strategies, and challenges of data deduplication

2021 
Abstract Data deduplication (DD) approaches eliminate redundant data from stored data, making more effective use of storage space and reducing data access time. DD is regarded as a promising approach to managing duplicate data: only a unique copy of the data is uploaded to storage, while subsequent (duplicate) copies are replaced with pointers to the originally stored copy. Although numerous DD methods have been proposed and used, no single best solution handles all kinds of redundancy; each approach was designed with different goals and trade-offs in deduplication time, performance, and overhead. When a dataset contains many exact duplicates of a file, DD can compare files without examining their content, which gives a fast running time. For similar (but not identical) files, however, DD approaches look inside the files to determine which portions of the content already exist in previously stored data, so that storage space is saved effectively. Here, the prevailing DD approaches are organized according to granularity, deduplication location, and deduplication time. The work begins by explaining the efficient detection of redundancy using hashing (a chunk index) and Bloom filters, and then describes how each DD approach works.
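To make the detection step concrete, the following is a minimal sketch, not taken from the paper, of chunk-level deduplication: data is split into fixed-size chunks, each chunk is fingerprinted with SHA-256, a small Bloom filter acts as a fast probabilistic pre-check, and a chunk index maps fingerprints to the single stored copy. The names (BloomFilter, deduplicate, chunk_size) and the fixed-size chunking are illustrative assumptions, not the paper's implementation.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter used as a fast, probabilistic pre-check
    before consulting the (slower) chunk index."""
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions from the key by salting the hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(key + i.to_bytes(1, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely new"; True means "possibly seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def deduplicate(data: bytes, chunk_size: int = 4096,
                chunk_index: dict | None = None,
                bloom: BloomFilter | None = None):
    """Split data into fixed-size chunks; store each unique chunk once
    and record a pointer (its fingerprint) for every occurrence."""
    chunk_index = chunk_index if chunk_index is not None else {}
    bloom = bloom if bloom is not None else BloomFilter()
    recipe = []  # sequence of fingerprints that reconstructs the data
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha256(chunk).digest()  # chunk fingerprint
        # Only a "possibly seen" answer from the Bloom filter requires
        # a lookup in the chunk index.
        if not (bloom.might_contain(fp) and fp in chunk_index):
            chunk_index[fp] = chunk  # first (unique) copy is stored
            bloom.add(fp)
        recipe.append(fp)            # duplicates become pointers only
    return recipe, chunk_index
```

Feeding two copies of the same data through deduplicate stores each unique chunk only once; the returned recipe of fingerprints is what a system would persist in place of the duplicate data.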