The Effect of Topology-Aware Process and Thread Placement on Performance and Energy

2013 
The design of modern multiprocessor computer systems has become increasingly complex, rendering the performance of scientific parallel applications highly sensitive to process and thread scheduling. In particular, Non-Uniform Memory Access (NUMA), a common architectural solution, demands knowledge of hardware details and skills that are normally beyond the average user in order to minimise memory access penalties and achieve good application performance. This situation is further complicated by the increasing use of heterogeneous systems involving both CPUs and accelerators, where the proximity of a process to the accelerator strongly affects performance.