Merged Requests for Better Performance and Productivity in Multithreaded OpenSHMEM

2017 
A merged request is a handle representing a group of Remote Memory Access (RMA), Atomic or Collective operations. The merged request can be created either by combining multiple outstanding merged request handles or using the same merged request handle for additional operations. We show that introducing such simple yet powerful semantics in OpenSHMEM provides many productivity and performance advantages. In this paper, we first introduce the interfaces and semantics for creating and using merged request handles. Then, we demonstrate with a merge request that we can achieve better performance characteristics in multithreaded OpenSHMEM application. Particularly, we show one can achieve higher message rate, a higher bandwidth for smaller message, and better computation-communication overlap. Further, we use merged request to realize multithreaded collectives, where multiple threads co-operate to complete the collective operation. Our experimental results show that in a multithreaded OpenSHMEM program, the merged request based RMA operations achieve over 100 Million Messages Per Second (MMPS). It achieves over 10 MMPS compared to 4.5 MMPS with default RMA operations in a single threaded environment. Also, we achieve higher bandwidth for smaller message sizes, close to 100% overlap, and reduce the latency by 60%.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    11
    References
    1
    Citations
    NaN
    KQI
    []