    PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers
Citations: 9 · References: 24 · Related Papers: 10
    Abstract:
Cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts on improving replacement policies require either significant additional hardware or major modifications to the organization of the existing cache. In this study, we propose the PAC-PLRU cache replacement policy. PAC-PLRU not only utilizes but also judiciously salvages the prediction information discarded by a widely adopted stride prefetcher. The main idea behind PAC-PLRU is to use the prediction results generated by the existing stride prefetcher to prevent the predicted cache blocks from being replaced in the near future. Experimental results show that coupling PAC-PLRU with a stride prefetcher reduces the average L2 cache miss rate by 91% over a baseline system using only the PLRU policy, and by 22% over a system using PLRU with an unconnected stride prefetcher. Most importantly, PAC-PLRU requires only minor modifications to the existing cache architecture to obtain these benefits. The proposed PAC-PLRU policy is promising in fostering the connection between prefetching and replacement policies, and could have a lasting impact on improving overall cache performance.
Keywords: STRIDE, Cache invalidation, Smart Cache, Cache pollution
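The core mechanism lends itself to a short sketch. The following is a minimal, hypothetical illustration of the idea in the abstract: a tree-PLRU set whose recency bits are also updated when the stride prefetcher predicts a resident block, so that the predicted block is not chosen as the next victim. Class and method names are illustrative, and the sketch omits the prefetcher itself and the details of the authors' actual hardware design.

```python
class PACPLRUSet:
    """One set of a tree-PLRU cache that also accepts prefetcher predictions."""

    def __init__(self, ways=4):
        assert ways and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.bits = [0] * (ways - 1)   # internal nodes of the PLRU tree
        self.tags = [None] * ways      # resident block tags

    def _walk(self, way=None):
        """Walk the tree. If `way` is given, point every bit away from it
        (marking it most-recently-used); otherwise follow the bits to the
        pseudo-LRU victim and return its way index."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way is None:
                go_left = self.bits[node] == 0          # 0: LRU side is the left half
            else:
                go_left = way < mid
                self.bits[node] = 1 if go_left else 0   # point away from `way`
            if go_left:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

    def access(self, tag):
        """Demand access: returns True on a hit, False on a miss (with fill)."""
        if tag in self.tags:
            self._walk(self.tags.index(tag))
            return True
        victim = self._walk()              # pseudo-LRU way
        self.tags[victim] = tag
        self._walk(victim)
        return False

    def note_prediction(self, tag):
        """PAC-PLRU hook: the stride prefetcher predicts `tag` will be needed
        soon, so refresh its recency state to keep it from being the victim."""
        if tag in self.tags:
            self._walk(self.tags.index(tag))


if __name__ == "__main__":
    s = PACPLRUSet(ways=4)
    for t in ["A", "B", "C", "D"]:
        s.access(t)                # fill the set
    s.note_prediction("A")         # prefetcher predicts A will be reused soon
    s.access("E")                  # the victim is now a block other than A
    print(s.tags)                  # A survives the fill of E
```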
As multi-core trends become dominant, cache structures are growing more sophisticated and complicated, and larger shared level-2 (L2) caches are demanded for higher cache performance. However, a large cache size is directly related to area and power consumption. When designing a cache memory, one of the easiest ways to increase performance is to double the cache size; in mobile processors, however, simply increasing the cache size may significantly affect chip area and power. To address this issue, in this paper we propose the hy-way cache (hybrid-way cache), a composite cache mechanism that maximizes cache performance within a given cache size. This mechanism can improve cache performance without increasing cache size or set associativity by emphasizing the utilization of primary way(s) and pseudo-associativity. Based on our experiments with sampled SPEC CPU2000 workloads, the proposed cache mechanism shows a remarkable reduction in cache misses at the cost of additional hardware and power consumption. The performance improvement varies with cache size and set associativity, but the proposed scheme is more sensitive to increases in cache size than to increases in set associativity.
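As a rough illustration of the pseudo-associativity that the hy-way idea builds on, the sketch below probes a primary location first and, only on a miss there, a secondary location obtained by flipping the top index bit, swapping the block into the primary slot on a secondary hit. The indexing scheme, swap policy, and sizes are common textbook choices assumed here, not necessarily the paper's exact hy-way organization.

```python
class PseudoAssocCache:
    """Toy block-address cache with a primary probe and one secondary probe."""

    def __init__(self, num_sets=8, block_bits=6):
        assert num_sets & (num_sets - 1) == 0, "num_sets must be a power of two"
        self.num_sets = num_sets
        self.block_bits = block_bits
        self.lines = [None] * num_sets          # one block address per slot
        self.primary_hits = self.secondary_hits = self.misses = 0

    def access(self, addr):
        block = addr >> self.block_bits
        primary = block % self.num_sets
        secondary = primary ^ (self.num_sets >> 1)    # flip the top index bit
        if self.lines[primary] == block:
            self.primary_hits += 1                    # fast path: a single probe
        elif self.lines[secondary] == block:
            self.secondary_hits += 1                  # slower pseudo-associative hit
            # Swap so the block sits in its primary slot for future accesses.
            self.lines[primary], self.lines[secondary] = (
                self.lines[secondary], self.lines[primary])
        else:
            self.misses += 1
            self.lines[secondary] = self.lines[primary]   # displace the old block
            self.lines[primary] = block                   # fill the primary slot


if __name__ == "__main__":
    c = PseudoAssocCache()
    for a in [0x000, 0x200, 0x000, 0x200]:   # two blocks sharing a primary index
        c.access(a)
    print(c.primary_hits, c.secondary_hits, c.misses)
```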
Problem statement: Multi-core trends are becoming dominant, creating sophisticated and complicated cache structures. One of the easiest ways to increase performance when designing a cache memory is to double the cache size, but a large cache size is directly related to area and power consumption. Especially in mobile processors, simply increasing the cache size may significantly affect chip area and power. We propose a novel method to improve overall performance without increasing the cache size. Approach: We proposed a composite cache mechanism for the L1 and L2 caches to maximize cache performance within a given cache size. The technique requires no increase in cache size or set associativity, instead emphasizing primary-way utilization and pseudo-associativity. We also added a victim cache to the composite pseudo-associative cache for further improvement. Results: Based on our experiments with sampled SPEC CPU2006 workloads, the proposed cache mechanism showed a remarkable reduction in cache misses without affecting the cache size. Conclusion/Recommendation: The performance improvement varies with benchmark, cache size, and set associativity, but the proposed scheme is more sensitive to increases in cache size than to increases in set associativity.
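The victim cache added in the approach is a standard structure; below is a minimal sketch of one, assuming a small fully-associative buffer managed in LRU order. The entry count and the LRU ordering are illustrative assumptions rather than the paper's configuration.

```python
from collections import OrderedDict


class VictimCache:
    """Tiny fully-associative buffer holding blocks recently evicted
    from the main cache (size and LRU ordering are illustrative)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.buf = OrderedDict()           # block address -> True, oldest first

    def insert(self, block):
        """Called when the main cache evicts `block`."""
        if block in self.buf:
            self.buf.move_to_end(block)
            return
        if len(self.buf) >= self.entries:
            self.buf.popitem(last=False)   # drop the oldest victim
        self.buf[block] = True

    def probe(self, block):
        """Called on a main-cache miss; returns True if the block is salvaged."""
        if block in self.buf:
            del self.buf[block]            # the block moves back to the main cache
            return True
        return False


if __name__ == "__main__":
    vc = VictimCache(entries=4)
    vc.insert(0x1000)          # block evicted from the main cache
    print(vc.probe(0x1000))    # True: the miss is served from the victim cache
    print(vc.probe(0x2000))    # False: must go to the next memory level
```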
On-chip cache memories play an important role in both the performance and the energy consumption of resource-constrained embedded systems by filtering many off-chip memory accesses. We propose a two-level data cache architecture with a low energy-delay product tailored for embedded systems. The L1 data cache is small and direct-mapped and employs a write-through policy; in contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and is able to provide high cache bandwidth, while the L2 data cache is effective in reducing the global miss rate. To reduce the penalty of the high miss rate caused by the small L1 cache and the power consumption of address generation, we propose an ECP (Early Cache hit Predictor) scheme. The ECP predicts whether the L1 cache has the requested data using both fast address generation and L1 cache hit prediction. To reduce the high energy cost of accessing the L2 data cache due to heavy write-through traffic from the write buffer placed between the two cache levels, we propose a one-way write scheme. In our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed two-level data cache architecture shows average improvements of 3.6% in overall system performance and 50% in data cache energy consumption.
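The ECP is described only at a high level in the abstract; the sketch below captures just the hit-prediction half as a small table, indexed by a few block-address bits, that remembers whether the last access to that index hit in L1. The table size, indexing, and last-outcome prediction rule are illustrative assumptions, not the paper's ECP design.

```python
class L1HitPredictor:
    """Last-outcome predictor of whether an access will hit in the small L1."""

    def __init__(self, entries=64, block_bits=5):
        self.entries = entries
        self.block_bits = block_bits
        self.table = [True] * entries      # start by optimistically predicting hits

    def _index(self, addr):
        return (addr >> self.block_bits) % self.entries

    def predict_hit(self, addr):
        """Early prediction, e.g. to gate L2 or address-generation work."""
        return self.table[self._index(addr)]

    def update(self, addr, did_hit):
        """Train the predictor with the actual L1 outcome."""
        self.table[self._index(addr)] = did_hit


if __name__ == "__main__":
    p = L1HitPredictor()
    addr = 0x1F40
    print(p.predict_hit(addr))     # True (optimistic default)
    p.update(addr, did_hit=False)  # the access actually missed in L1
    print(p.predict_hit(addr))     # False for the next access to this index
```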
In this paper, we advocate the notion of a "BIG" cache as an innovative abstraction for effectively utilizing the distributed storage and processing capacities of all servers in a cache network. The "BIG" cache abstraction is proposed to partly address the problem of (cascade) thrashing in a hierarchical network of cache servers, where cache resources at intermediate servers are known to be poorly utilized, especially under classical cache replacement policies such as LRU. We lay out the advantages of the "BIG" cache abstraction and make a strong case for it both from a theoretical standpoint and through simulation analysis. We also develop the dCLIMB cache algorithm to minimize the overheads of moving objects across distributed cache boundaries, and present a simple yet effective heuristic for the cache allotment problem in the design of the "BIG" cache abstraction.
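dCLIMB is characterized in the abstract only by its goal of limiting object movement across cache-server boundaries; the sketch below shows the classical CLIMB policy over a single logical cache list, which is the starting point such a scheme would refine. The boundary-aware refinement is only indicated in a comment and is an assumption, not the authors' algorithm.

```python
class ClimbCache:
    """Classical CLIMB: on a hit an object climbs one position toward the top,
    on a miss it enters at the bottom of the ordered cache list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []                    # slots[0] is the hottest position

    def access(self, obj):
        if obj in self.slots:
            i = self.slots.index(obj)
            if i > 0:
                # A dCLIMB-style variant could suppress this swap when slots
                # i-1 and i are hosted on different cache servers, avoiding
                # object movement across distributed cache boundaries.
                self.slots[i - 1], self.slots[i] = self.slots[i], self.slots[i - 1]
            return True                    # hit
        if len(self.slots) >= self.capacity:
            self.slots.pop()               # evict the bottom slot
        self.slots.append(obj)             # new object enters at the bottom
        return False                       # miss


if __name__ == "__main__":
    c = ClimbCache(capacity=3)
    for o in ["a", "b", "c", "b", "b", "d"]:
        c.access(o)
    print(c.slots)                         # "b" has climbed toward the top
```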
In a Web cache cluster, a single cache image must be realized to locate cached objects on cache nodes efficiently. The single cache image hides the heterogeneous and distributed nature of the available cache resources in the Web cache cluster and presents them to users as a single unified cache resource. To implement this function, this paper first proposes an architecture for a cache digest manager. The new architecture is based on the digests of all cache nodes, combined with load-balance information, and can therefore locate cached objects on cache nodes efficiently. The paper then discusses how to realize the cache nodes and the cache digest manager in the Web cache cluster.
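A common way to realize such digests is with Bloom filters, as in Squid-style cache digests; the sketch below assumes that format and a simple least-loaded tie-break among nodes whose digests may contain the object. Both choices are assumptions for illustration, since the abstract does not fix the digest encoding or the load-balance rule.

```python
import hashlib


class BloomDigest:
    """Per-node cache digest: a small Bloom filter over the node's object keys."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.bitmap = 0

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.bitmap |= 1 << p

    def may_contain(self, key):
        return all((self.bitmap >> p) & 1 for p in self._positions(key))


class DigestManager:
    """Locates an object using all nodes' digests plus load information."""

    def __init__(self):
        self.digests = {}                  # node name -> BloomDigest
        self.load = {}                     # node name -> current load estimate

    def register(self, node, digest, load=0.0):
        self.digests[node] = digest
        self.load[node] = load

    def locate(self, url):
        """Return the least-loaded node whose digest may contain the object."""
        candidates = [n for n, d in self.digests.items() if d.may_contain(url)]
        return min(candidates, key=self.load.get) if candidates else None


if __name__ == "__main__":
    d1, d2 = BloomDigest(), BloomDigest()
    d1.add("http://example.com/a")
    mgr = DigestManager()
    mgr.register("node1", d1, load=0.7)
    mgr.register("node2", d2, load=0.2)
    print(mgr.locate("http://example.com/a"))   # node1: the only matching digest
```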
While set-associative caches incur fewer misses than direct-mapped caches, they typically have slower hit times and higher power consumption when multiple tag and data banks are probed in parallel. This paper presents the location cache structure, which significantly reduces the power consumption of large set-associative caches. We propose to use a small cache, called the location cache, to store the location of future cache references. If there is a hit in the location cache, the supported cache is accessed as a direct-mapped cache; otherwise, the supported cache is referenced as a conventional set-associative cache. The worst-case access latency of the location cache system is the same as that of a conventional cache. The location cache is virtually indexed so that operations on it can be performed in parallel with the TLB address translation. These advantages make it ideal for L2 cache systems, where traditional way-prediction strategies perform poorly. We used the CACTI cache model to evaluate the power consumption and access latency of the proposed cache architecture, and the SimpleScalar CPU simulator to produce the final results. The proposed location cache architecture is shown to be power-efficient: in the simulated cache configurations, up to 47% of cache access energy and 25% of average cache access latency can be saved.
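The essence of the location cache is a small structure that remembers in which way of the large set-associative cache a block resides, so a later access can enable only that one tag/data way. The direct-mapped table below is a minimal sketch of that bookkeeping; the entry count and indexing are illustrative, and the virtual indexing and timing aspects discussed in the abstract are not modeled.

```python
class LocationCache:
    """Small direct-mapped table: block address -> way of the supported cache."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = [None] * entries      # each entry: (block_addr, way)

    def lookup(self, block_addr):
        """Return the remembered way, or None to fall back to a full probe."""
        e = self.table[block_addr % self.entries]
        if e is not None and e[0] == block_addr:
            return e[1]                    # probe only this way (direct-mapped style)
        return None                        # probe all ways (conventional access)

    def record(self, block_addr, way):
        """Remember where the supported cache placed (or found) the block."""
        self.table[block_addr % self.entries] = (block_addr, way)


if __name__ == "__main__":
    loc = LocationCache()
    loc.record(0xABCD, way=3)              # the L2 fill went into way 3
    print(loc.lookup(0xABCD))              # 3: enable only one way next time
    print(loc.lookup(0x1234))              # None: conventional parallel probe
```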
The paper presents an algorithm to reduce cache conflicts and improve cache locality. The proposed algorithm analyzes the locality reference space of each reference pattern, partitions the multi-level cache into several parts of different sizes, and then maps array data onto the scheduled cache positions so that cache conflicts can be eliminated. To reduce the memory overhead of mapping array variables onto the partitioned cache, a greedy method for rearranging array variables in declaration statements is also developed. In addition, we combine loop tiling with the proposed schemes to exploit both temporal and spatial reuse opportunities. To demonstrate that our approach is effective at reducing cache conflicts and exploiting cache locality, we use Atom to build a simulator of direct-mapped cache behavior. Experimental results show that applying our cache partitioning scheme largely reduces cache conflicts and thus saves program execution time in both one-level and multi-level cache hierarchies.
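As a toy illustration of mapping array data onto scheduled cache positions, the sketch below pads array base addresses so that consecutive arrays begin in different partitions of a direct-mapped cache and therefore tend not to conflict. The round-robin partition assignment and the cache parameters are placeholders; the paper's locality analysis and greedy declaration reordering are not reproduced.

```python
CACHE_SIZE = 32 * 1024              # bytes of a direct-mapped cache, illustrative
NUM_PARTITIONS = 4
PART_SIZE = CACHE_SIZE // NUM_PARTITIONS


def assign_bases(array_sizes, start=0):
    """Return a base address for each array so that consecutive arrays begin in
    different cache partitions, reducing the chance that they conflict."""
    bases = []
    addr = start
    for i, size in enumerate(array_sizes):
        target = (i % NUM_PARTITIONS) * PART_SIZE   # assigned partition offset
        pad = (target - addr) % CACHE_SIZE          # padding to reach that index
        addr += pad
        bases.append(addr)
        addr += size
    return bases


if __name__ == "__main__":
    for name, base in zip("ABC", assign_bases([8192, 8192, 8192])):
        print(name, hex(base), "-> cache index", hex(base % CACHE_SIZE))
```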
Multi-core trends are becoming dominant, creating sophisticated and complicated cache structures, and larger shared level-2 (L2) caches are demanded for higher cache performance. One of the easiest ways to increase performance when designing a cache memory is to double the cache size. However, a large cache size is directly related to area and power consumption; especially in mobile processors, simply increasing the cache size may significantly affect chip area and power. In this paper, we propose a composite cache mechanism for the L2 cache that maximizes cache performance within a given cache size. The technique requires no increase in cache size or set associativity, instead emphasizing primary-way utilization and pseudo-associativity. Based on our experiments with sampled SPEC CPU2000 workloads, the proposed cache mechanism shows a remarkable reduction in cache misses. The performance improvement varies with cache size and set associativity, but the proposed scheme is more sensitive to increases in cache size than to increases in set associativity.
Through analysis of current data cache technology, this paper proposes a distributed data cache system based on a cache network to improve the performance of the data cache system and the retrieval efficiency of client/server networks by reducing transmission time and balancing load. The main idea of the data cache system is to set up a data cache section in each node of the network and to realize unified allocation and management of these cache sections. An analysis of the cache system's performance and a comparison with other cache systems show the efficiency of the proposed system.