A Communication-Efficient Multi-Chip Design for Range-Limited Molecular Dynamics

2020 
Molecular Dynamics simulation (MD) has long been regarded as a promising FPGA application, especially on clusters of tightly coupled FPGAs, where large-scale, general-purpose, low-latency interconnects provide a communication capability not available with any other COTS computing technology. Parallelization of one part of the MD computation, the 3D FFT, has been studied previously; for likely FPGA cluster sizes, however, the range-limited computation (RL) is more challenging. The motivation here is that direct replication of the single-chip design uses inter-board bandwidth inefficiently. In particular, although communication in RL is local, likely bandwidth limitations will constrain performance unless great care is taken in design and analysis. In the multi-chip scenario, inter-board bandwidth is the critical constraint and the main target of this work. We analyze it with respect to three application restructurings: workload distribution, data forwarding pattern, and data locality. We describe how bandwidth can be balanced by configuring workload distribution and data forwarding paths with respect to the number of onboard transceiver ports. We also show that, by manipulating data locality, the multi-chip design can be migrated efficiently from the single-chip design, and the total bandwidth required can be configured to satisfy the bandwidth limit.