Improving memory hierarchy performance with addressless preload, order-free lsq, and runahead scheduling

2007 
The average memory access latency is determined by three primary factors: cache hit latency, miss rate, and miss penalty. It is well known that the cache miss penalty, measured in processor cycles, continues to grow. For memory-bound workloads, a promising alternative is to exploit memory-level parallelism by overlapping multiple memory accesses. We study the P-load scheme (P-load stands for preload), an efficient solution that reduces the cache miss penalty by overlapping cache misses. To reduce cache misses, we also introduce a cache organization with an efficient replacement policy that specifically targets conflict misses.

A recent trend is to fetch and issue multiple instructions from multiple threads at the same time on one processor. This design benefits greatly from resource sharing among multiple threads. However, contention for shared resources, including caches, the instruction issue window, and the instruction window, may hamper the performance improvement from multi-threading schemes. In the third proposed line of research, we evaluate a technique to solve the resource contention problem in a multi-threading environment.

Store-load forwarding is a critical aspect of dynamically scheduled execution in modern processors. Conventional processors implement store-load forwarding by buffering the addresses and data values of all in-flight stores in an age-ordered store queue. A load accesses the data cache and, in parallel, associatively searches the store queue for older stores with matching addresses. Associative structures can be made fast, but often at the cost of substantial additional energy, area, and/or design effort. We introduce a new order-free store queue that decouples the matching of store/load addresses and the corresponding age-based priority encoding logic from the original store queue, substantially reducing hardware complexity.
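As a rough illustration of the conventional baseline described above (not the order-free design this work proposes), the following Python sketch models an age-ordered store queue in software. The names `StoreEntry`, `AgeOrderedStoreQueue`, `forward`, and `load` are hypothetical, and the sequential scan only stands in for what hardware performs as a parallel associative search followed by age-based priority encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    age: int      # program-order sequence number; smaller means older
    address: int
    value: int


class AgeOrderedStoreQueue:
    """Software model of a conventional age-ordered store queue (assumed layout)."""

    def __init__(self) -> None:
        # Entries are assumed to be inserted in program order, oldest first.
        self.entries: List[StoreEntry] = []

    def insert(self, age: int, address: int, value: int) -> None:
        self.entries.append(StoreEntry(age, address, value))

    def forward(self, load_age: int, address: int) -> Optional[int]:
        # Scan from youngest to oldest: the first store that is older than
        # the load and matches its address is the one that must forward.
        for entry in reversed(self.entries):
            if entry.age < load_age and entry.address == address:
                return entry.value
        return None


def load(sq: AgeOrderedStoreQueue, memory: Dict[int, int],
         load_age: int, address: int) -> int:
    # A forwarded value from an in-flight store overrides the cached value.
    forwarded = sq.forward(load_age, address)
    return forwarded if forwarded is not None else memory.get(address, 0)
```

In this model, a load receives the value of the youngest older store to the same address, and falls back to memory when no such store is in flight. The combination of address matching and age-based selection in `forward` is the coupling that the abstract says the order-free store queue separates.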