Effective Exploration of Thread Throttling and Thread/Page Mapping on NUMA Systems

2020 
NUMA systems have become commonly used in HPC. However, to fully exploit these systems, the right thread-to-core allocation and page placement are essential. Moreover, since many parallel applications have limited scalability, applying thread throttling (i.e., artificially reducing the number of active threads) will, in most cases, further improve energy efficiency and/or performance. Because the combined problem involves many variables, most previous research has not considered applying all of these approaches together. Without a smart method to converge to a satisfactory solution, an efficient design space exploration is precluded, leaving NUMA systems' full potential untapped. Therefore, we propose a new methodology that evaluates the number of threads, the thread-to-core allocation, and the page placement independently (i.e., we apply a local search to each one). By evaluating and applying the right sequence of local searches, we significantly reduce the design space exploration while maintaining the quality of results (w.r.t. an exhaustive global search), making it applicable to real-world scenarios. Our experiments with several parallel applications running on two different NUMA systems show that our methodology speeds up the search by up to $9 \times$, on average, reducing the search space by at least 70% and up to 86%, while finding optimal or near-optimal solutions.
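The core idea of the methodology — tuning each dimension independently with a local search instead of exhaustively enumerating all combinations — can be sketched as a coordinate-descent-style loop. The sketch below is illustrative only, not the authors' implementation: the candidate values, the `run_and_measure` cost function, and the search order are hypothetical placeholders standing in for actual application runs and real NUMA mapping/placement policies.

```python
# Illustrative sketch of a sequential local search over three configuration
# dimensions (thread count, thread-to-core mapping, page placement), as the
# abstract describes. Candidate sets and the cost function are hypothetical.

THREAD_COUNTS = [8, 16, 32, 64]
MAPPINGS = ["compact", "scatter", "round-robin"]
PAGE_POLICIES = ["first-touch", "interleave", "local"]

def run_and_measure(threads, mapping, policy):
    # Placeholder for executing the application with this configuration and
    # measuring energy and/or runtime; here a synthetic cost for demonstration.
    return abs(threads - 32) + MAPPINGS.index(mapping) + PAGE_POLICIES.index(policy)

def local_search(config, dim, candidates):
    # Vary one dimension while holding the others fixed; keep the best value.
    best = dict(config)
    best_cost = run_and_measure(**best)
    for value in candidates:
        trial = dict(config, **{dim: value})
        cost = run_and_measure(**trial)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost

def sequential_search():
    # Start from a baseline configuration, then refine one dimension at a time.
    config = {"threads": THREAD_COUNTS[0],
              "mapping": MAPPINGS[0],
              "policy": PAGE_POLICIES[0]}
    cost = run_and_measure(**config)
    evaluations = 0
    for dim, candidates in [("threads", THREAD_COUNTS),
                            ("mapping", MAPPINGS),
                            ("policy", PAGE_POLICIES)]:
        config, cost = local_search(config, dim, candidates)
        evaluations += 1 + len(candidates)
    exhaustive = len(THREAD_COUNTS) * len(MAPPINGS) * len(PAGE_POLICIES)
    return config, cost, evaluations, exhaustive

if __name__ == "__main__":
    config, cost, evals, exhaustive = sequential_search()
    print(f"best config: {config}, cost: {cost}")
    print(f"evaluations: {evals} vs exhaustive: {exhaustive}")
```

With these toy candidate sets, the sequential search needs 13 measurements instead of the 36 an exhaustive sweep would require, which is the kind of search-space reduction the abstract quantifies (though the paper's actual savings depend on the real dimension sizes and search order).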