Quantifying and Optimizing Data Access Parallelism on Manycores

2018 
Data access parallelism (DAP) indicates how well available hardware resources are utilized by data accesses. This paper investigates four complementary components of data access parallelism in detail: cache-level parallelism (CLP), bank-level parallelism (BLP), network-level parallelism (NLP), and memory controller-level parallelism (MLP). Specifically, we first quantify these four components for a set of 20 multi-threaded benchmark programs and show that, when executed on a state-of-the-art manycore platform, their values are quite low compared to the maximum achievable values. We next perform a limit study, which indicates that significant performance improvements are possible if the values of these four components of DAP could be maximized. Building upon our observations from this limit study, we then present two practical computation and network access scheduling schemes. Both schemes make use of profile data; however, while the compiler-based strategy assigns fixed priorities to CLP, BLP, NLP, and MLP, the learning-based strategy employs a predictive machine learning model. Our experiments indicate 30.8% and 36.9% performance improvements with the compiler-based and learning-based schemes, respectively. Our results also show that the proposed schemes consistently achieve significant improvements under different values of the major experimental parameters.
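To make the quantification step concrete, the following is a minimal sketch (not the authors' tool) of how one of the four DAP components, bank-level parallelism, could be estimated from a profiled trace of memory requests. Here BLP is taken as the average number of distinct banks with an outstanding request, averaged over the cycles in which at least one request is outstanding; the trace format and field names are illustrative assumptions, not the paper's interface.

```python
# Hypothetical sketch: estimating bank-level parallelism (BLP) from a
# profiled memory-request trace. Each request is modeled as
# (issue_cycle, completion_cycle, bank_id); these fields are assumptions
# for illustration, not the paper's actual profiling format.

from collections import namedtuple

Request = namedtuple("Request", ["issue_cycle", "completion_cycle", "bank_id"])

def estimate_blp(trace):
    """Average number of distinct busy banks over cycles with >=1 outstanding request."""
    if not trace:
        return 0.0
    busy_cycles = 0      # cycles with at least one outstanding request
    busy_bank_sum = 0    # sum over those cycles of distinct busy banks
    start = min(r.issue_cycle for r in trace)
    end = max(r.completion_cycle for r in trace)
    for cycle in range(start, end + 1):
        banks = {r.bank_id for r in trace
                 if r.issue_cycle <= cycle < r.completion_cycle}
        if banks:
            busy_cycles += 1
            busy_bank_sum += len(banks)
    return busy_bank_sum / busy_cycles if busy_cycles else 0.0

if __name__ == "__main__":
    # Three overlapping requests to three different banks.
    trace = [Request(0, 10, 0), Request(2, 12, 1), Request(4, 8, 2)]
    print(f"estimated BLP: {estimate_blp(trace):.2f}")
```

Analogous per-cycle counting over caches, network links, and memory controllers would yield the CLP, NLP, and MLP components, respectively.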