Process Mapping onto Complex Architectures and Partitions Thereof

2017 
Data locality is a critical issue in order to achieve performance on today's high-end parallel machines. As these machines are highly non-uniform, distributing computations across their processing elements does not only require to minimize inter-process communication, but also to favor local communication over distant communication. For that purpose, static and/or dynamic (re)mapping tools have been devised, that allow one to map process graphs onto architecture graphs describing the topology and architectural features of such machines. However, in practice, the real problem to solve is to map a process graph onto possibly disconnected parts of a non-uniform parallel machine, such as a set of nodes provided by some batch scheduler. This paper presents a set of algorithms to perform this task in an efficient way. Efficiency is achieved thanks to a multilevel description of target architectures. All the presented algorithms have been implemented in the \scotch\ static mapping software. Experiments evidence the quality of the produced mappings.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []