Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors

2013 
High End Computing (HEC) has grown dramatically over the past decades. Emerging multi-core systems, heterogeneous architectures, and interconnects introduce both challenges and opportunities for improving the performance of communication middleware and applications. The increasing number of processor cores and co-processors results not only in heavy contention for communication resources, but also in far more complicated communication patterns. The Message Passing Interface (MPI) has been the dominant parallel programming model in HPC for the past two decades, and it has been very successful for regular, iterative parallel algorithms with well-defined communication patterns. For applications with irregular or dynamic communication, in contrast, the Partitioned Global Address Space (PGAS) programming model provides a more flexible way to express parallelism. Different variations and combinations of these programming models present new challenges in designing optimized runtimes, in terms of efficient sharing of network resources, efficient work-stealing techniques for balancing computation across threads/processes, and so on. Middleware plays a key role in delivering the benefits of new hardware to applications and programming models. This dissertation studies several critical contention problems in existing runtimes that support popular parallel programming models (MPI and UPC) on emerging multi-core/many-core clusters with InfiniBand, accelerators, and co-processors.
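The contrast the abstract draws between the two models is easiest to see in code. Below is a minimal sketch, not taken from the dissertation (the file name and values are illustrative): a ring exchange written in C with MPI, an example of the kind of fixed, well-defined communication pattern MPI handles well.

    /* Minimal MPI ring exchange. Build with: mpicc ring.c -o ring */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;  /* ring neighbors */
        int right = (rank + 1) % size;
        int send_val = rank, recv_val = -1;

        /* Every process performs the same matched send/receive pair,
           so the full communication pattern is known in advance. */
        MPI_Sendrecv(&send_val, 1, MPI_INT, right, 0,
                     &recv_val, 1, MPI_INT, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %d from rank %d\n", rank, recv_val, left);
        MPI_Finalize();
        return 0;
    }

MPI_Sendrecv pairs the send and receive in a single call, avoiding the ordering deadlock that two blocking MPI_Send calls could create in a ring. A UPC version of the same exchange would need no matched send/receive at all: each thread could simply read a neighbor's element of a shared array, with the PGAS runtime performing the remote access, which is the flexibility the abstract attributes to PGAS for irregular communication.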