An Asynchronous Algorithm for Optimizing the Communication Performance

2019 
Iterative applications such as stencil problems form the most important and time-consuming computational core of many numerical simulations and scientific applications. Such applications often require high-frequency iterations, and the data dependencies between steps lead to frequent data exchanges. Existing large-scale clusters tend to have long memory access latency and network latency, so communication and data movement cost more than computation. The main performance bottleneck of stencil algorithms on large-scale clusters is therefore the achievable iteration frequency. This paper proposes a new asynchronous algorithm that reduces the number of data exchanges among the processes of high-frequency iterative applications. The algorithm consists of multiple rounds, and communication occurs only after each round. The input data comes from asynchronous communication, so the computations are speculative and may be inaccurate. The speculative computations could improve accuracy and speed up convergence. In addition, they can automatically correct the loss of precision caused by asynchronous communication. The proposed algorithm has been tested on a stencil-based parallel computation and compared with a bulk synchronous parallel (BSP) implementation of the same application. The asynchronous algorithm effectively reduces the number of data exchanges at the expense of higher computation overhead and larger message size, and performance improves by up to 2.8x.
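The round-based idea can be illustrated with a small, hedged sketch. This is not the paper's algorithm: it is a toy 1D Jacobi stencil, simulated in a single Python process, in which two subdomains refresh their halo values only once per round and compute on stale (speculative) halo values in between. The names `jacobi_step`, `run`, and `round_len` are hypothetical and chosen for illustration. Setting `round_len=1` mimics a BSP-style exchange at every step.

```python
def jacobi_step(u, left, right):
    """One 1D Jacobi sweep (3-point average) over a subdomain with given halo values."""
    padded = [left] + u + [right]
    return [(padded[i - 1] + padded[i + 1]) / 2.0 for i in range(1, len(padded) - 1)]


def run(n_per_proc, steps, round_len):
    """Simulate two 'processes' that exchange halo values only every round_len steps.

    Assumption for illustration: between exchanges, each subdomain keeps
    computing with the last halo value it received (stale, speculative input).
    Returns (solution, number_of_exchanges).
    """
    a = [0.0] * n_per_proc  # left subdomain; fixed boundary value 1.0 on its left
    b = [0.0] * n_per_proc  # right subdomain; fixed boundary value 0.0 on its right
    halo_a = 0.0  # a's local copy of b's first cell
    halo_b = 0.0  # b's local copy of a's last cell
    exchanges = 0
    for step in range(steps):
        if step % round_len == 0:
            # Start of a round: the only point where data is exchanged.
            halo_a, halo_b = b[0], a[-1]
            exchanges += 1
        a = jacobi_step(a, 1.0, halo_a)
        b = jacobi_step(b, halo_b, 0.0)
    return a + b, exchanges
```

Calling `run(4, 2000, 1)` and `run(4, 2000, 4)` performs the same number of sweeps, but the second call exchanges halo values a quarter as often. In this toy setting both runs approach the same linear steady state, which loosely mirrors the abstract's claim that later computation corrects the error introduced by stale input. The sketch does not model the larger messages or the extra computation that the abstract mentions.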