Exploiting aggressive memory dependence speculation to simplify the store-load datapath

2011 
Future multi-core and many-core processors are likely to contain one or more high-performance out-of-order cores to execute sequential programs and the sequential parts of parallel programs. These out-of-order cores will have to be more energy (and area) efficient than their present-day counterparts. This dissertation focuses on reducing the energy consumption of conventional out-of-order cores while maintaining or improving their performance. Specifically, it focuses on the store-load datapath. The store-load datapath, which consists of the data cache and translation lookaside buffer, the load and store queues, and the memory dependence predictor, is one of the most energy-hungry parts of a traditional out-of-order core, accounting for as much as 25% of total core energy. It is also one of the most performance-critical, as it determines the execution latency and issue bandwidth of loads. This dissertation proposes to extend the memory dependence predictor and to use it to drive more aggressive forms of load speculation that reduce the energy consumption of the store-load datapath. SISQ (Speculative Indexed Store Queue) replaces associative store queue search with more energy-efficient indexed access to a single store queue entry. RDCA (Reduced Data Cache Access) avoids reading the data cache for loads that can get their values from older in-flight stores or loads via the store and load queues, respectively. RDCA reduces data cache access frequency and amplifies data cache access bandwidth. NoSQ (No Store Queue) eliminates the store queue and out-of-order execution of stores. These three techniques are enabled by a memory dependence predictor that achieves high accuracy with reasonable storage and by a lightweight verification mechanism. Experiments show that, compared to a conventional out-of-order core, NoSQ reduces store-load datapath energy by 17% while outperforming the conventional machine by about 3%. SISQ and RDCA, which can be implemented using changes local to the store-load datapath, not only reduce energy by 26% but also improve performance by almost 3%.
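
The difference between associative store queue search and indexed access to a single entry can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the dissertation's design: the names `StoreEntry`, `associative_search`, and `indexed_access` are hypothetical, the store queue is modeled as a Python list ordered oldest to youngest, and the predicted index is simply passed in rather than produced by a memory dependence predictor. The verification mechanism is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    """One in-flight store (hypothetical model): its address and data."""
    address: int
    value: int


def associative_search(
    store_queue: List[StoreEntry], load_address: int
) -> Tuple[Optional[int], int]:
    """Compare the load address against every older store.

    Returns the value of the youngest matching store (or None) and the
    number of address comparisons performed.
    """
    comparisons = 0
    forwarded = None
    for entry in store_queue:  # ordered oldest to youngest
        comparisons += 1
        if entry.address == load_address:
            forwarded = entry.value  # a younger match overrides an older one
    return forwarded, comparisons


def indexed_access(
    store_queue: List[StoreEntry],
    load_address: int,
    predicted_index: Optional[int],
) -> Tuple[Optional[int], int]:
    """Read only the single entry named by a predicted index.

    Returns the forwarded value (or None when no forwarding is predicted
    or the addresses differ) and the number of address comparisons.
    """
    if predicted_index is None or not 0 <= predicted_index < len(store_queue):
        return None, 0
    entry = store_queue[predicted_index]
    if entry.address == load_address:
        return entry.value, 1
    return None, 1


queue = [StoreEntry(0x100, 1), StoreEntry(0x200, 2), StoreEntry(0x100, 3)]
print(associative_search(queue, 0x100))  # examines all three entries
print(indexed_access(queue, 0x100, 2))   # examines one entry
```

In this model the associative search performs one comparison per queue entry, while the indexed access performs at most one. The cost is that a wrong prediction can return no value or a stale one (for example, index 0 above), which is why the abstract pairs the speculation with a verification mechanism.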