An Optimized Strategy for Small Files Storing and Accessing in HDFS

2017 
Distributed storage is now the most popular approach to data storage, and HDFS is the most widespread cloud storage platform. It has been adopted successfully by many notable companies because of its excellent capability. However, HDFS was originally designed to handle large files, and it copes poorly with an enormous number of small files. To solve this problem, we introduce an optimized algorithm that considers the size of each small file when merging files into a combined file and generates a map record for each small file. We also apply prefetching and caching mechanisms to improve access efficiency. The experimental results show that the optimized strategy reduces the NameNode's memory consumption and access time, and thus achieves better performance.
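The size-aware merging and per-file map records described above can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify: the first-fit-decreasing packing rule, the 128 MiB block size, the (combined file id, offset, length) record layout, and the names `CombinedFile` and `merge_small_files` are all assumptions made for illustration. Prefetching and caching are left out.

```python
from dataclasses import dataclass, field

# Assumed block size; 128 MiB is a common HDFS default, not a value from the paper.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class CombinedFile:
    """Hypothetical container for small files merged into one large file."""
    file_id: int
    used: int = 0                      # bytes already occupied
    members: list = field(default_factory=list)


def merge_small_files(sizes, block_size=BLOCK_SIZE):
    """Pack small files into combined files no larger than block_size.

    sizes: dict mapping small-file name -> size in bytes.
    Returns (combined_files, index), where index maps each small-file name
    to a map record (combined file id, offset, length).
    """
    combined = []
    index = {}
    # Assumed heuristic: place the largest files first, ties broken by name.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > block_size:
            raise ValueError(f"{name} is larger than one block; not a small file")
        # First fit: reuse the first combined file with enough free space.
        target = next((c for c in combined if c.used + size <= block_size), None)
        if target is None:
            target = CombinedFile(file_id=len(combined))
            combined.append(target)
        # The map record lets a reader locate the small file without a
        # separate NameNode entry per small file.
        index[name] = (target.file_id, target.used, size)
        target.members.append(name)
        target.used += size
    return combined, index
```

With this layout, a read of one small file becomes a lookup in `index` followed by a ranged read of `length` bytes at `offset` in the chosen combined file, so the NameNode only needs to track the combined files.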