Learning from Optimizing Matrix-Matrix Multiplication

Devangi N. Parikh, Jianyu Huang, Margaret E. Myers, Robert A. van de Geijn

2018

We describe a learning process that uses one of the simplest examples, matrix-matrix multiplication, to illustrate issues that underlie parallel high-performance computing. It is accessible at multiple levels: simple enough to use early in a curriculum, yet rich enough to benefit a more advanced software developer. A carefully designed and scaffolded set of exercises leads the learner from a naive implementation towards one that extracts parallelism at multiple levels, ranging from instruction-level parallelism to multithreaded parallelism via OpenMP to distributed-memory parallelism using MPI. The exercises expose the importance of effectively leveraging the memory hierarchy within and across nodes, and they introduce the GotoBLAS and SUMMA algorithms. These materials will become part of a Massive Open Online Course (MOOC) to be offered in the future.
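As a rough illustration of the starting point and one early step on the path the abstract describes, the sketch below shows a naive triple-loop matrix-matrix multiplication and a loop-reordered variant parallelized with OpenMP. It is not taken from the authors' exercises: the function names `gemm_naive` and `gemm_omp`, the `ELEM` macro, and the choice of column-major storage with explicit leading dimensions are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element (i, j) of a column-major matrix X
   whose leading dimension (distance between columns) is ld. */
#define ELEM(X, ld, i, j) ((X)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Naive triple loop computing C := A * B + C,
   where A is m x k, B is k x n, and C is m x n. */
void gemm_naive(int m, int n, int k,
                const double *A, int ldA,
                const double *B, int ldB,
                double *C, int ldC)
{
    assert(ldA >= m && ldB >= k && ldC >= m);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                ELEM(C, ldC, i, j) += ELEM(A, ldA, i, p) * ELEM(B, ldB, p, j);
}

/* Same arithmetic with the loops reordered so that the inner loop
   walks down columns of C and A (stride 1 in column-major storage).
   Each thread updates a disjoint set of columns of C, so there are
   no write conflicts. The pragma is ignored if OpenMP is disabled. */
void gemm_omp(int m, int n, int k,
              const double *A, int ldA,
              const double *B, int ldB,
              double *C, int ldC)
{
    assert(ldA >= m && ldB >= k && ldC >= m);
    #pragma omp parallel for
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++) {
            const double b = ELEM(B, ldB, p, j);
            for (int i = 0; i < m; i++)
                ELEM(C, ldC, i, j) += ELEM(A, ldA, i, p) * b;
        }
}
```

Compiling with an OpenMP flag such as `-fopenmp` (GCC or Clang) enables the threaded loop; without it the second routine runs sequentially and produces the same result. Blocking for caches, a hand-tuned micro-kernel, and distribution across nodes with MPI are further steps that this sketch does not attempt.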

Keywords:

Computer science
Distributed computing
Matrix multiplication
Multiplication
Distributed memory
Software
Matrix (mathematics)
Instruction-level parallelism
Memory hierarchy
Parallel computing
Massive open online course
Kernel (linear algebra)
