BTReader: A High-Performance Fault Tolerant Self-Adaptive Broadcast Framework

2020 
In high-performance computing clusters, a large data file stored on a cluster file system often needs to be broadcast quickly to a large set of computing nodes. There are two common methods for this: MPI broadcast and concurrent read. MPI broadcast offers high data transfer efficiency, but it is vulnerable to node faults and to a complex interference environment, and it imposes synchronization as a side effect. Concurrent read is fault tolerant, adapts to a complex interference environment, and lets each process read asynchronously; however, it is limited by the bandwidth bottleneck of the cluster file system. We propose a new broadcast framework called BTReader, which combines the advantages of MPI broadcast and concurrent read while avoiding the disadvantages of both. Experimental results from a high-performance computing cluster show that BTReader improves broadcast efficiency by 104.16% compared with MPI and provides robust fault tolerance and good adaptability to the complex interference environment.
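The trade-off between the two baseline methods can be illustrated with a toy cost model. The sketch below is not taken from the paper: the function names, the parameters, and the simplifying assumptions (a fixed aggregate file system bandwidth shared equally by all readers, and an unpipelined binomial-tree broadcast) are all hypothetical and only meant to show why concurrent read hits a file system bottleneck while a tree broadcast does not.

```python
import math


def concurrent_read_time(size_gb, nodes, fs_gb_per_s, link_gb_per_s):
    """Toy estimate: every node reads the whole file from the file system.

    Assumption (not from the paper): the file system's aggregate bandwidth
    is split evenly among all readers, capped by each node's link speed.
    """
    per_node_bandwidth = min(link_gb_per_s, fs_gb_per_s / nodes)
    return size_gb / per_node_bandwidth


def tree_broadcast_time(size_gb, nodes, fs_gb_per_s, link_gb_per_s):
    """Toy estimate: one root reads the file, then forwards it down a tree.

    Assumption (not from the paper): an unpipelined binomial tree, so the
    full file crosses one link in each of ceil(log2(nodes)) rounds.
    """
    read_time = size_gb / min(fs_gb_per_s, link_gb_per_s)
    rounds = math.ceil(math.log2(nodes)) if nodes > 1 else 0
    return read_time + rounds * size_gb / link_gb_per_s


if __name__ == "__main__":
    # Hypothetical figures: 10 GB file, 64 nodes, 20 GB/s file system,
    # 5 GB/s per-node network link.
    print(concurrent_read_time(10, 64, 20, 5))
    print(tree_broadcast_time(10, 64, 20, 5))
```

The model ignores node faults and interference, which are exactly the conditions under which the abstract says MPI broadcast degrades, so it captures only the bandwidth side of the comparison.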