CLRC: a New Erasure Code Localization Algorithm for HDFS

2021 
With the continued growth of big data, the expansion of hardware capacity for HDFS has fallen far behind the growth of data volume. As a data redundancy strategy, traditional replication has gradually been replaced by erasure coding, which offers a lower redundancy rate and storage overhead. However, compared with replication, an erasure code must read a number of surviving blocks during data recovery, incurring substantial I/O and network overhead. Based on the RS algorithm, a new CLRC algorithm is proposed that improves the locality of RS coding by grouping RS-coded blocks and generating a local check block for each group, so a single lost block can be rebuilt from its own group rather than from the whole stripe. Evaluations show that the algorithm reduces bandwidth and I/O consumption by about 61% during recovery from a single damaged block. Moreover, its decoding time is only 59% of that of the RS algorithm.
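To make the locality idea concrete, here is a minimal Python sketch of grouping a stripe's data blocks and adding one local check block per group, so that a single lost block is rebuilt from its group alone. The parameters (k = 6 data blocks, groups of 3) and the use of XOR as the local parity are illustrative assumptions for exposition; they are not the paper's actual CLRC construction, which builds on RS coding.

```python
# Illustrative sketch of local-group recovery; not the paper's CLRC code.
import os

BLOCK_SIZE = 16          # bytes per block (tiny, for illustration)
K = 6                    # data blocks per stripe (assumed)
GROUP_SIZE = 3           # data blocks per local group (assumed)

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def encode_local_parities(data_blocks):
    """Split the stripe into groups and compute one check block per group."""
    groups = [data_blocks[i:i + GROUP_SIZE]
              for i in range(0, len(data_blocks), GROUP_SIZE)]
    return [xor_blocks(g) for g in groups]

def recover_single_block(data_blocks, local_parities, lost_index):
    """Rebuild one lost data block from its group only: GROUP_SIZE reads
    instead of the K reads a plain RS recovery would need."""
    g = lost_index // GROUP_SIZE
    survivors = [data_blocks[i]
                 for i in range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE)
                 if i != lost_index]
    return xor_blocks(survivors + [local_parities[g]])

if __name__ == "__main__":
    stripe = [os.urandom(BLOCK_SIZE) for _ in range(K)]
    parities = encode_local_parities(stripe)
    rebuilt = recover_single_block(stripe, parities, lost_index=4)
    assert rebuilt == stripe[4]
    print("recovered block 4 from", GROUP_SIZE, "reads instead of", K)
```

Reading 3 blocks instead of 6 in this toy setting mirrors the paper's reported result that single-block recovery bandwidth and I/O drop by roughly 61% relative to plain RS.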