Current loop buffer organizations for very long instruction word (VLIW) processors are essentially centralized. As a consequence, they are energy inefficient and their scalability is limited. To alleviate this problem, we propose a clustered loop buffer organization, in which the loop buffers are partitioned and functional units are logically grouped to form clusters, along with two buffer control schemes that regulate the activity in each cluster. Furthermore, we propose a design-time scheme that generates clusters by analyzing an application profile and grouping closely related functional units. The simulation results indicate that the energy consumed in the clustered loop buffers is, on average, 63 percent lower than in an uncompressed centralized loop buffer scheme, 35 percent lower than in a centralized compressed loop buffer scheme, and 22 percent lower than in a randomly clustered loop buffer scheme.
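As a rough illustration of the design-time clustering step mentioned in this abstract, the sketch below groups functional units whose activity in an application profile overlaps closely. The abstract does not specify the actual algorithm, so everything here is an assumption made for illustration: the profile format (one set of active units per cycle), the Jaccard similarity measure, the greedy merge loop, and the names `cluster_functional_units` and `max_clusters`.

```python
from itertools import combinations


def activity_sets(profile, units):
    """Map each unit to the set of cycle indices in which it is active."""
    return {u: {i for i, active in enumerate(profile) if u in active}
            for u in units}


def jaccard(a, b):
    """Overlap of two cycle sets: 1.0 means always active together."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_functional_units(profile, units, max_clusters):
    """Greedily merge the two most similar clusters until max_clusters remain.

    Hypothetical helper: the similarity measure and merge order are
    assumptions, not the scheme used in the paper.
    """
    cycles = activity_sets(profile, units)
    clusters = [frozenset([u]) for u in units]

    def cluster_cycles(cluster):
        # A cluster is active in any cycle where one of its units is active.
        active = set()
        for u in cluster:
            active |= cycles[u]
        return active

    while len(clusters) > max_clusters:
        best = max(
            combinations(clusters, 2),
            key=lambda pair: jaccard(cluster_cycles(pair[0]),
                                     cluster_cycles(pair[1])),
        )
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters


# Toy profile: the two ALUs always issue together, as do the multiplier
# and the load unit, but the two pairs never overlap.
profile = [
    {"alu0", "alu1"},
    {"alu0", "alu1"},
    {"mul0", "ld0"},
    {"mul0", "ld0"},
    {"alu0", "alu1"},
]
clusters = cluster_functional_units(profile, ["alu0", "alu1", "mul0", "ld0"], 2)
```

In this toy profile, units that are always active together end up in the same cluster, so a loop buffer partition serving one cluster could in principle stay idle whenever its units are unused. A real design-time flow would presumably also weigh buffer sizes and control overhead, which this sketch ignores.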
Clustering L0 buffers is effective for reducing energy consumption in the instruction memory hierarchy of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule and assignment of the target application. This paper proposes a tool flow that explores operation shuffling to improve the generation of L0 clusters. The tools explore the assignment of operations in each cycle and generate various schedules. This approach makes it possible to reduce energy consumption for various processor architectures; however, the computational complexity is high because the exploration space is huge. Therefore, heuristics are also developed that reduce the size of the exploration space while keeping the quality of the solution reasonable. The experimental results indicate a potential energy gain of up to 22.0% from operation shuffling for various heterogeneous processor architectures. Furthermore, the proposed heuristics reduce the exploration space by about 90%, while the results remain comparable to full search, with differences of up to 1%.
Users expect future handheld devices to provide extended multimedia functionality and have long battery life. This type of application imposes heavy constraints on both (real-time) performance and energy consumption and forces designers to optimise all parts of their platform. In this experiment we focus on the different processor core design options for embedded platforms, including the effect of the instruction memory hierarchy on energy consumption. The results show that significant improvements in energy efficiency and/or performance over currently used RISC or VLIW processors can be achieved. First, we conclude, based on concrete data for a realistic application, that different styles, including both configurable hardware and instruction set processors, find their way into heterogeneous platforms and that designers need to be aware of the trade-offs. Secondly, we show for the same application task that a heavily optimised instruction/configuration memory hierarchy can significantly reduce the energy consumption of this part, so it forms a crucial part of every energy-aware design.
The invention of micromirror arrays has sparked a revolution in vision systems research. These devices are an indispensable component in many current-generation products, ranging from large-scale projection engines and portable projectors to head-up displays. In addition, these devices are enabling new research paths in spectroscopy, lithography, volumetric displays and optical networking, to name just a few. In this article we highlight some of the recent advances in micromirror array technology, especially those using Silicon-Germanium (SiGe) MEMS. We also present an overview of two applications that this technology gave rise to, namely the holographic display and a micromirror-array-based zoom lens.
For multimedia applications, loop buffering is an efficient mechanism to reduce the power consumed in the instruction memory of embedded processors. In particular, software-controlled clustered loop buffers are potentially very energy efficient. However, current compilers for VLIW processors do not fully exploit the potential offered by such a clustered organization. This paper presents ITSE (Instruction Transfer and Storage Exploration), a methodology to minimize the instruction memory energy using a software-controlled clustered loop buffer as a basis. Results for the MediaBench application suite show a 61% reduction (on average) in energy in the instruction memory hierarchy compared to existing non-clustered loop buffer approaches, without compromising performance.
Reduced energy consumption is one of the most important design goals for embedded application domains like wireless, multimedia and biomedical. The instruction memory hierarchy has been shown to be one of the most power-hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory to reduce energy and improve performance. The proposed distributed instruction memory organization requires minimal hardware overhead and allows execution of multiple loops in parallel in a uni-processor system. This architectural enhancement can reduce the energy consumed in the instruction and data memory hierarchy by 70.01% and improve performance by 32.89% compared to enhanced SMT-based architectures.