Placement Scheduling for Replication in HDFS Based on Probabilistic Approach

2016 
Along with the rapid evolution of Big Data analysis, Apache Hadoop plays an important role in delivering high availability on top of computing clusters. To sustain high-throughput access for computation, Apache Hadoop is equipped with the Hadoop Distributed File System (HDFS) for managing file operations. HDFS ensures reliability and high availability through a specific replication mechanism. However, because the workload on each computing node varies, applying the same replication strategy everywhere can result in imbalance. Targeting this drawback of the HDFS architecture, we propose an approach that adaptively chooses the placement for replicas. To do so, network status and system utilization are used to create an individual replication placement strategy for each file. The proposed approach can thus provide suitable destinations for replicas to improve performance. As a result, the availability of the system is enhanced while the reliability of data storage is preserved.
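The abstract describes scoring candidate DataNodes by network status and utilization, then choosing replica destinations probabilistically. A minimal sketch of that idea follows; the metric names (`utilization`, `latency_ms`), the scoring formula, and the function names are hypothetical illustrations, not the paper's actual algorithm.

```python
import random


def placement_scores(nodes):
    """Score candidate DataNodes: lower disk utilization and lower
    network latency yield a higher score (hypothetical formula)."""
    scores = {}
    for name, info in nodes.items():
        # utilization in [0, 1]; latency_ms is round-trip latency in ms.
        scores[name] = (1.0 - info["utilization"]) / (1.0 + info["latency_ms"])
    return scores


def choose_replica_targets(nodes, replicas, rng=None):
    """Pick `replicas` distinct DataNodes, sampling without replacement
    with probability proportional to each node's score."""
    rng = rng or random.Random()
    candidates = placement_scores(nodes)
    chosen = []
    for _ in range(min(replicas, len(candidates))):
        names = list(candidates)
        weights = [candidates[n] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # ensure distinct replica destinations
    return chosen
```

Under this sketch, a lightly loaded, nearby node is the most likely (but not guaranteed) destination, which mirrors the paper's goal of adapting placement to per-node conditions instead of a fixed strategy.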