    Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System
0 Citations · 0 References · 20 Related Papers
    Abstract:
On-chip cache memories play an important role in both the performance and the energy consumption of resource-constrained embedded systems by filtering out many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored for embedded systems. The L1 data cache is small and direct-mapped and employs a write-through policy; in contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and provides high cache bandwidth, while the L2 data cache is effective in reducing the global miss rate. To reduce both the penalty of the higher miss rate caused by the small L1 cache and the power consumed by address generation, we propose an ECP (Early Cache hit Predictor) scheme. The ECP predicts whether the L1 cache holds the requested data using both fast address generation and L1 cache hit prediction. To reduce the high energy cost of accessing the L2 data cache caused by heavy write-through traffic from the write buffer placed between the two cache levels, we propose a one-way write scheme. In our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average improvements of 3.6% in overall system performance and 50% in data cache energy consumption.
    Keywords:
    Cache pollution
    Smart Cache
    Cache invalidation
    Page cache
    MESI protocol
    Cache-oblivious algorithm
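The abstract above combines a direct-mapped, write-through L1 with a set-associative, write-back L2, plus the ECP predictor and the one-way write scheme. The following is a minimal Python sketch of only the two-level organization itself; the class names, capacities, block size, and LRU policy are illustrative assumptions, and the ECP and one-way write logic are not modeled.

    # Minimal sketch of a 2-level data cache: a small direct-mapped,
    # write-through L1 backed by a set-associative, write-back L2.
    # Sizes and the LRU policy are assumptions, not the paper's design.

    BLOCK = 32  # bytes per cache block (assumed)

    class L1DirectMapped:
        def __init__(self, size=1024):
            self.sets = size // BLOCK
            self.tags = [None] * self.sets
        def lookup(self, addr):
            blk = addr // BLOCK
            return self.tags[blk % self.sets] == blk
        def fill(self, addr):
            blk = addr // BLOCK
            self.tags[blk % self.sets] = blk

    class L2SetAssoc:
        def __init__(self, size=32 * 1024, ways=4):
            self.ways = ways
            self.sets = size // (BLOCK * ways)
            self.lines = [[] for _ in range(self.sets)]  # per-set LRU order, MRU last
        def access(self, addr):
            blk = addr // BLOCK
            s = self.lines[blk % self.sets]
            hit = blk in s
            if hit:
                s.remove(blk)
            elif len(s) == self.ways:
                s.pop(0)                     # evict LRU (write-back of dirty data elided)
            s.append(blk)
            return hit

    def load(l1, l2, addr):
        """L1 is probed first; an L1 miss falls through to the L2 and refills L1."""
        if l1.lookup(addr):
            return "L1 hit"
        hit2 = l2.access(addr)
        l1.fill(addr)
        return "L2 hit" if hit2 else "L2 miss"

    def store(l1, l2, addr):
        """Write-through L1: every store is also sent toward the L2 (via a write
        buffer in the paper; modeled here as a direct L2 access)."""
        if l1.lookup(addr):
            l1.fill(addr)                    # update L1 copy if resident (no write allocate)
        l2.access(addr)

In this sketch every store reaches the L2 regardless of whether it hits in the L1, which is exactly the write-through traffic whose energy cost the paper's one-way write scheme targets.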
Related Papers:
Recently, single- and multi-processor systems have adopted hierarchical memory structures, typically including two or three levels of cache, to reduce the gap between processor clock rate and memory access time. One of the most important factors in such a hierarchy is the hit rate of the level 1 cache, because the level 1 cache interfaces directly with the processor; a high level 1 hit rate is therefore critical for system performance. A victim cache, another high-level cache, also assists the level 1 cache by reducing conflict misses. In this paper, we propose an advanced high-level cache management scheme based on processor reuse information. The technique is a cache replacement policy that uses the frequency of the processor's memory accesses and lets more frequently accessed addresses reside in the cache longer than less frequently accessed ones. We simulate our policy using Augmint, an event-driven simulator, and analyze the results. The simulation results show that the modified processor reuse information scheme (LIVMR) outperforms a level 1 cache backed by a simple victim cache (LIV) by up to 6.7% and by 0.5% on average, and the performance benefit grows as the number of processors increases.
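As a rough illustration of the reuse-frequency idea described above (and not of the LIVMR scheme itself), the sketch below keeps an access counter per victim-cache entry and evicts the entry with the lowest count, so frequently accessed blocks stay resident longer; the structure and size are assumptions.

    # Frequency-aware victim cache sketch: eviction prefers to keep blocks
    # with higher access counts.  Illustrative only, not the paper's policy.

    class FrequencyVictimCache:
        def __init__(self, entries=8):
            self.entries = entries
            self.counts = {}                 # block number -> access count

        def insert(self, block):
            """Insert a block evicted from L1; if full, evict the entry with
            the lowest reuse count (ties broken arbitrarily)."""
            if block in self.counts:
                self.counts[block] += 1
                return
            if len(self.counts) >= self.entries:
                coldest = min(self.counts, key=self.counts.get)
                del self.counts[coldest]
            self.counts[block] = 1

        def lookup(self, block):
            """On an L1 miss, probe the victim cache; a hit bumps the count."""
            if block in self.counts:
                self.counts[block] += 1
                return True
            return False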
    Cache pollution
    Cache invalidation
    Smart Cache
    Page cache
    Pipeline burst cache
    Cache-oblivious algorithm
    Citations (0)
We propose a novel energy-efficient data cache architecture, namely, the word-interleaved (WI) cache. In the WI cache, a cache block is distributed uniformly among the different cache ways, and each line of a cache way holds some of the words of the block. This distribution provides an opportunity to activate or deactivate the cache ways based on the offset of the requested address, thus minimizing the overall cache access energy. For a 4-way set-associative cache of size 16 KB and block size 32 B, the proposed technique achieves dynamic energy savings of 54.2% without considering fast hits and 62.3% when fast hits are considered, with small performance degradation and negligible area overhead.
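As an illustration of the word-interleaved mapping described above, the sketch below derives, from the word offset of the requested address, the single way that has to be activated; the word size, block size, way count, and the round-robin word-to-way assignment are assumptions rather than the paper's exact layout.

    # Word-interleaved (WI) mapping sketch: the words of one block are spread
    # across the ways, so the word offset identifies the only way to power up.

    WORD_BYTES    = 4
    BLOCK_BYTES   = 32                                    # 8 words per block
    WAYS          = 4
    WORDS_PER_WAY = BLOCK_BYTES // WORD_BYTES // WAYS     # 2 words of each block per way

    def way_for_address(addr):
        """Return the single way that needs to be activated for this access."""
        word_offset = (addr % BLOCK_BYTES) // WORD_BYTES  # 0..7
        return word_offset % WAYS                         # round-robin interleaving

    if __name__ == "__main__":
        # Consecutive words of one block map to ways 0,1,2,3,0,1,2,3, so a
        # single-word load touches one way instead of all four.
        for w in range(BLOCK_BYTES // WORD_BYTES):
            print("word", w, "-> way", way_for_address(w * WORD_BYTES))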
    Cache invalidation
    Smart Cache
    Cache pollution
    Page cache
    Cache-oblivious algorithm
    MESI protocol
    Citations (8)
This work presents a new hardware cache management approach for improving the cache hit ratio and reducing bus traffic. Increasing the L1 cache hit ratio is crucial for obtaining good performance with current processors. The proposed approach also increases the overall (L1 plus L2) cache hit ratio, especially in multiprocessor systems where the bus latencies are low. This work focuses on multiprocessor systems, where a fourth kind of miss (the coherence miss) and the bus utilization problem appear; however, the model can also be applied to uniprocessor systems. Our organization increases the overall cache hit ratio and thus reduces bus utilization. The proposed model introduces two independent L1 caches with different organizations placed in parallel. Each cache block has a small counter attached to it for storing reuse-related information. The proposed microarchitecture not only reduces bus traffic and outperforms the conventional organization, but also saves die area. The performance advantage over conventional cache organizations grows as the number of processors increases.
    Cache pollution
    Cache invalidation
    Smart Cache
    Page cache
    Uniprocessor system
    MESI protocol
    Bus sniffing
    Citations (21)
In this paper, we design a cache scheme, called the auto-selecting cache scheme, that reduces power consumption and increases cache performance by automatically resizing the L1 cache. Cache memory accounts for a significant fraction of a chip's overall power consumption. Recent research advocates "resizable" caches that adjust cache capacity to the application's requirements, thereby reducing cache size and power consumption. Because different programs need different instruction and data cache sizes, an auto-selecting cache scheme is proposed. The scheme dynamically adjusts the sizes of the level 1 instruction and data caches according to the program's requirements. The proposed structure can reduce power consumption and improve cache performance. In SPEC2000 simulations, the average power consumption of the L1 cache is reduced by 7.43% and the average energy-delay product is improved by 16.08% with the auto-selecting cache structure compared with a traditional one.
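A minimal sketch of the resizing idea in the abstract above: the cache indexes only a power-of-two subset of its sets, and an interval-based controller grows or shrinks that subset from the observed miss rate. The thresholds and the controller itself are assumptions, not the paper's auto-selecting policy.

    # Resizable direct-mapped cache sketch with an interval-based controller.
    # Thresholds and policy are illustrative assumptions.

    BLOCK = 32

    class ResizableCache:
        def __init__(self, max_sets=256):
            self.max_sets = max_sets
            self.active_sets = max_sets          # start at full size
            self.tags = [None] * max_sets
            self.hits = self.misses = 0

        def access(self, addr):
            blk = addr // BLOCK
            idx = blk % self.active_sets         # only the enabled sets are indexed
            if self.tags[idx] == blk:
                self.hits += 1
            else:
                self.misses += 1
                self.tags[idx] = blk

        def retune(self, grow_at=0.10, shrink_at=0.02):
            """End-of-interval policy: grow when missing too much, shrink when
            the working set clearly fits (assumed thresholds)."""
            total = self.hits + self.misses
            if total == 0:
                return
            miss_rate = self.misses / total
            if miss_rate > grow_at and self.active_sets < self.max_sets:
                self.active_sets *= 2
            elif miss_rate < shrink_at and self.active_sets > 1:
                self.active_sets //= 2           # disabled sets could be power-gated
            self.hits = self.misses = 0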
    Cache invalidation
    Smart Cache
    Cache pollution
    Page cache
    MESI protocol
    Citations (0)
Many high-performance microprocessors employ a cache write-through policy to improve performance while achieving good tolerance to soft errors in on-chip caches. However, the write-through policy also incurs a large energy overhead due to the increased accesses to caches at the lower level (e.g., L2 caches) during write operations. In this paper, we propose a new cache architecture, referred to as the way-tagged cache, to improve the energy efficiency of write-through caches. By maintaining the way tags of the L2 cache in the L1 cache during read operations, the proposed technique enables the L2 cache to work in an equivalent direct-mapped manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction without performance degradation. Simulation results on the SPEC CPU2000 benchmarks demonstrate that the proposed technique achieves 65.4% energy savings in L2 caches on average with only 0.02% area overhead and no performance degradation. Similar results are also obtained under different L1 and L2 cache configurations. Furthermore, the idea of way tagging can be applied to existing low-power cache design techniques to further improve energy efficiency.
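The sketch below illustrates the way-tagging idea from the abstract above: the L2 way from which a block was read is remembered next to the L1 line, so a later write-through update of that block can probe just that one L2 way. The data structures and replacement handling are simplified assumptions, not the paper's design.

    # Way-tagged L2 sketch: reads return the hit way so the L1 can remember it;
    # write-through hits then access only that way, like a direct-mapped cache.

    class WayTaggedL2:
        def __init__(self, sets=128, ways=8):
            self.sets = sets
            self.ways = ways
            self.tags = [[None] * ways for _ in range(sets)]

        def read(self, block):
            """Normal set-associative lookup; returns the hit way (filling on a miss)."""
            row = self.tags[block % self.sets]
            if block in row:
                return row.index(block)
            way = row.index(None) if None in row else 0   # naive fill/replacement
            row[way] = block
            return way

        def write_hit(self, block, way_hint):
            """Write-through update that trusts the way tag stored with the L1 line:
            only one way's tag is compared, as in a direct-mapped access."""
            row = self.tags[block % self.sets]
            assert row[way_hint] == block, "way tag must match on a write hit"
            # ... update the data array of this single (set, way) only ...

    # Each L1 line would store (block, way_hint) filled in by read(), so an L1
    # write hit can call write_hit() without searching all eight L2 ways.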
    Smart Cache
    Cache pollution
    Cache invalidation
    Page cache
    Bus sniffing
    Spec#
    Citations (21)
The write-through policy employed in many high-performance microprocessors provides good tolerance to soft errors in cache systems. However, it also incurs a large energy overhead due to the increased accesses to caches at the lower level (e.g., the L2 cache) during write operations. In this paper, we propose a new cache architecture, referred to as the way-tagged cache, to improve the energy efficiency of write-through cache systems. By maintaining the way tags of the L2 cache in the L1 cache during read operations, the proposed technique enables the L2 cache to work in an equivalent direct-mapped manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction. Simulation results on the SPEC CPU2000 benchmarks demonstrate that the proposed technique achieves 65.4% energy savings on average with about 0.02% area overhead and no performance degradation.
    Smart Cache
    Cache invalidation
    Cache pollution
    Page cache
    Spec#
    Bus sniffing
    Citations (14)
Caches may consume half of a microprocessor's total power, and cache misses incur off-chip memory accesses, which are both time-consuming and energy-costly. Therefore, minimizing cache power consumption and reducing cache misses are important for reducing the total energy consumption of embedded systems. Direct-mapped caches consume much less power than same-sized set-associative caches but have a poorer hit rate on average. Through experiments, we observe that the memory space of direct-mapped instruction caches is not used efficiently in most embedded applications. We design an efficient cache: a configurable instruction cache that can be tuned, through index remapping, to utilize the cache sets efficiently for a particular application. Experiments on 11 benchmarks drawn from MediaBench show that the efficient cache achieves almost the same average miss rate as a conventional two-way set-associative cache while saving 30% of total memory-access energy compared with that cache.
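The sketch below illustrates index remapping in general terms: instead of indexing the cache with the low address bits alone, a per-application mask mixes additional address bits into the index so conflicting blocks can be spread over more sets. The XOR-based remap and the bit widths are assumptions; the paper's tuning algorithm is not reproduced here.

    # Index remapping sketch for a configurable instruction cache.
    # The XOR remap is one illustrative choice, not the paper's scheme.

    BLOCK_BITS = 5          # 32-byte blocks
    INDEX_BITS = 8          # 256 sets

    def conventional_index(addr):
        return (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

    def remapped_index(addr, remap_mask):
        """XOR selected tag bits (chosen per application by remap_mask) into the
        index, changing how addresses map to sets without extra storage."""
        base = conventional_index(addr)
        tag  = addr >> (BLOCK_BITS + INDEX_BITS)
        return base ^ (tag & remap_mask & ((1 << INDEX_BITS) - 1))

For example, with remap_mask = 0b1, two addresses that share the same low index bits but differ in the lowest tag bit land in different sets, removing a conflict that the conventional mapping would suffer.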
    Cache pollution
    Smart Cache
    Cache invalidation
    Page cache
    Bus sniffing
    Citations (14)
With the increasing performance gap between processor and memory, it is essential that caches are utilized efficiently. However, caches are utilized very inefficiently because not all of the excess data fetched into the cache to exploit spatial locality is actually accessed. Studies have shown that a prediction accuracy of about 95% can be achieved when predicting the to-be-referenced words in a cache block. In this paper, we use this prediction mechanism to fetch only the to-be-referenced data into the L1 data cache on a cache miss. We then use the cache space made available to store words from multiple cache blocks in a single physical cache block, increasing the number of useful words in the cache. We also propose methods to combine this technique with a value-based approach to further increase the cache capacity. Our experiments show that, with our techniques, we achieve about 57% of the L1 data cache miss rate reduction and about 60% of the cache capacity increase observed when using a double-sized cache, with only about 25% cache space overhead.
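As a rough sketch of the selective-fetch-and-pack idea above, the code below assumes a predictor supplies a per-word usage mask on each miss, fetches only those words, and lets one physical block frame hold the fetched words of up to two different blocks; the two-sub-block packing and the data structures are assumptions for illustration only.

    # Selective word fetch and block packing sketch (illustrative assumptions).

    WORDS_PER_BLOCK = 8

    class PackedFrame:
        """One physical L1 block frame holding words from up to two blocks."""
        def __init__(self):
            self.entries = []                  # list of (block_number, word_mask)

        def fill(self, block, predicted_mask):
            """Bring in only the words the predictor marked as to-be-referenced;
            predicted_mask is a WORDS_PER_BLOCK-bit mask."""
            if len(self.entries) == 2:         # frame full: evict the older sub-block
                self.entries.pop(0)
            self.entries.append((block, predicted_mask))

        def lookup(self, block, word):
            """Hit only if this block is resident and the requested word was fetched."""
            for b, mask in self.entries:
                if b == block:
                    return bool(mask & (1 << word))
            return False

    # Example: a frame holding words {0,1} of block A and words {3,7} of block B
    # serves hits for both blocks while occupying a single block of cache space.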
    Cache invalidation
    Cache pollution
    Page cache
    Smart Cache
    MESI protocol
    Cache-oblivious algorithm
    Bus sniffing
    Citations (5)
Caching is an important technique for improving computer system performance by storing the most recently used data and instructions from main memory. Caches are widely used in modern computer systems and will continue to be an irreplaceable means of narrowing the speed gap between processor and main memory. With the increasing capacity of main memory and the growing number of processor cores, cache technology has developed considerably. In this paper, we draw lessons about how the cache hierarchy changes with memory technology, using an experimental methodology. We design a series of experiments and try to answer several questions about cache design. Our experimental results indicate that more levels of cache do not necessarily mean better performance for all benchmarks, that the last-level cache miss rate has no direct connection with system performance, and that an exclusive cache hierarchy performs better on average than an inclusive one.
    Cache pollution
    Cache invalidation
    Smart Cache
    Cache-oblivious algorithm
    Page cache
    Memory hierarchy
    Cache-only memory architecture
    Citations (0)