In the early design phase of a Deep Neural Network (DNN) acceleration system, fast energy and latency estimation is important for evaluating the optimality of different design candidates across algorithm, hardware, and algorithm-to-hardware mapping, given the enormous design space. This work proposes a uniform intra-layer analytical latency model for DNN accelerators that can be used to evaluate diverse architectures and dataflows. It employs a three-step approach to systematically estimate the latency breakdown of different system components, capture the operation state of each memory component, and identify stall-induced performance bottlenecks. To achieve high accuracy, different memory attributes, operands' memory sharing scenarios, and dataflow implications are taken into account. Validation against an in-house taped-out accelerator across various DNN layers shows an average latency model accuracy of 94.3%. To showcase the capability of the proposed model, we carry out three case studies to assess, respectively, the impact of mapping, workloads, and diverse hardware architectures on latency, driving design insights for algorithm-hardware-mapping co-optimization.
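As a rough illustration of the three steps named above (latency breakdown, per-memory state, stall-induced bottleneck), the following is a minimal sketch only. It is not the model proposed in the paper: it assumes a single bandwidth number per memory level, assumes transfers overlap perfectly with compute, and ignores memory sharing and dataflow effects. The names `MemoryLevel` and `estimate_latency` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    data_words: int         # words this level must transfer for the layer (assumed)
    words_per_cycle: float  # bandwidth available at this level (assumed)


def estimate_latency(total_macs, pe_count, utilization, memories):
    """Toy intra-layer latency estimate in cycles (simplified, hypothetical)."""
    # Step 1: ideal compute cycles, assuming no memory stalls.
    compute_cycles = total_macs / (pe_count * utilization)
    # Step 2: cycles each memory level needs to move its data.
    transfer = {m.name: m.data_words / m.words_per_cycle for m in memories}
    # Step 3: a level stalls the array only by the part of its transfer
    # time that cannot be hidden behind compute; the worst level dominates.
    stalls = {name: max(0.0, c - compute_cycles) for name, c in transfer.items()}
    bottleneck = max(stalls, key=stalls.get)
    return compute_cycles + stalls[bottleneck], bottleneck, stalls


memories = [
    MemoryLevel("SRAM", data_words=8192, words_per_cycle=4),
    MemoryLevel("DRAM", data_words=10000, words_per_cycle=2),
]
latency, bottleneck, stalls = estimate_latency(
    total_macs=1_000_000, pe_count=256, utilization=1.0, memories=memories
)
print(f"latency = {latency} cycles, bottleneck = {bottleneck}")
```

In this toy setting the SRAM transfers are fully hidden behind compute, while the DRAM level adds stall cycles and is therefore reported as the bottleneck.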
Current mode logic (CML) circuits have been widely used in high-speed data transceivers. Their lower voltage swing makes the switching speed of CML much higher than static logic can achieve, so adopting CML circuits in high-speed applications is worthwhile despite their higher power consumption. To obtain better power efficiency (frequency/power) in CML, it is critical to reduce the power consumption while maintaining the high operating frequency. This paper proposes an alternative approach that builds the CML circuits with tunneling field-effect transistors (Tunnel FETs or TFETs) to achieve a high-throughput, low-voltage interface circuit design. Owing to its steep subthreshold slope (less than 60 mV/dec), a TFET achieves the same on/off current ratio over a much smaller input voltage swing than a MOSFET, which enables supply voltage scaling in CML circuits. For a target data rate (20 Gbps for the multiplexer and 50 Gbps for the buffer), our simulations show that the proposed TFET CML circuits can reduce the supply voltage from 0.6 V in conventional Si FinFET CML circuits to as low as 0.3 V while using the same constant tail current. As a result, the proposed TFET CML circuits reduce power consumption by approximately 50%, making the TFET CML approach a promising candidate for future low-power, high-performance applications.
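The approximately 50% figure is consistent with first-order arithmetic: the static power of a CML stage is roughly the supply voltage times the tail current, so halving the supply at a constant tail current halves the power. The sketch below assumes that simple model; the 100 µA tail current is a hypothetical value chosen only for illustration and does not come from the paper.

```python
def cml_static_power(v_dd, i_tail):
    """First-order static power of a CML stage in watts: P = V_DD * I_tail."""
    return v_dd * i_tail


I_TAIL = 100e-6  # hypothetical tail current in amperes, same for both designs

p_finfet = cml_static_power(0.6, I_TAIL)  # Si FinFET CML at 0.6 V
p_tfet = cml_static_power(0.3, I_TAIL)    # TFET CML at 0.3 V

# Fractional power saving of the TFET design relative to the FinFET design.
reduction = 1.0 - p_tfet / p_finfet
print(f"Power reduction: {reduction:.0%}")
```

Under this model the relative saving depends only on the two supply voltages, not on the tail current chosen.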
The output characteristics of typical Impact Ionization MOS (IMOS) devices are comprehensively investigated in this paper. The results show that IMOS devices exhibit a narrow operating margin with non-saturated on-current. To improve the output characteristics, a new structure named "pocket IMOS" is proposed. Compared with typical IMOS devices, the pocket IMOS achieves a wider operating margin at a lower source-drain bias, with enhanced output performance.
Supply voltage scaling has become increasingly challenging in advanced CMOS technology because of the threshold voltage required to limit transistor OFF leakage, which constrains system energy efficiency. Spintronic logic uses magnetization or spin as the computation variable, offering new design paradigms in terms of ultralow voltage operation and nonvolatility, but it suffers from inefficient charge-to-spin conversion during switching. The magnetoelectric spin-orbit (MESO) device has been proposed as a new alternative logic device candidate that addresses these issues through magnetoelectric transduction, using a novel multiferroic oxide [both ferroelectric (FE) and antiferromagnetic], a ferromagnet, and a spin injection layer. It enables a path toward 10-100× switching energy reduction compared to a CMOS inverter at the 2018 node, owing to the device and material innovations of a novel switching mechanism. In this paper, to build a MESO logic family for new circuit and architecture exploration, we propose for the first time fundamental building blocks such as MESO sequential and combinatorial circuits for synchronous logic operation. We employ transistor sharing and a novel multiphase clocking scheme to address circuit design issues such as clock control, directionality of state propagation, and power gating, and to enable design exploration for MESO digital logic. Based on the proposed circuit techniques, we demonstrate through circuit simulations these MESO logic functions operating at a supply voltage of 100 mV and a clock period of 1.2 ns with 320-ac FE polarization.
Three-dimensional integration offers architectural and performance benefits for scaling augmented/virtual reality (AR/VR) models on highly resource-constrained edge devices. Two-dimensional off-chip memory interfaces are prohibitively energy intensive and bandwidth (BW) limited for AR/VR devices. To solve this, we propose using advanced 3-D stacking technology for high-density vertical integration of local memory and compute, increasing memory capacity within the same footprint at iso-BW with improvements in energy and latency. We evaluate 3-D architectures for a prototype AR/VR accelerator and demonstrate up to 3.9× latency reduction and 1.6× lower energy compared to a 2-D configuration within a smaller or similar footprint. Additionally, we show the feasibility of deploying higher resolution AR/VR models by stacking multiple tiers of memory, providing a pathway to break the footprint constraints of 2-D architectures. The use of high-density 3-D interconnects allows us to demonstrate localized benefits at the accelerator level compared with standard system-on-chip memory disaggregation techniques and architectures.
In this paper, two novel 6T SRAM cells based on Independently-Controlled-Gate FinFETs are proposed. The new 6T cells are derived from 4T cells: by separating the read timing and the read line, the proposed cells allow simultaneous read and write to different addresses. To overcome the traditional retention time problem in 4T cells, the proposed cells reduce leakage by changing the back-gate connection and increasing the capacitance at the data storage points (Q, QB). Compared to previous 6T FinFET SRAMs, the proposed cells reduce the static leakage current and improve the write and read speed. In addition, this structure is scalable to multi-port designs.
Radio-frequency (RF)-powered energy harvesting systems have offered new perspectives in various scientific and clinical applications such as health monitoring, bio-signal acquisition, and battery-less data transceivers. In such applications, an RF rectifier with high sensitivity and high power conversion efficiency (PCE) is critical for making use of ambient RF signal power. In this paper, we explore the high-PCE advantage of steep-slope III-V heterojunction tunnel field-effect transistor (HTFET) RF rectifiers over the Si FinFET baseline design for RF-powered battery-less systems. We investigate the device characteristics of HTFETs to improve the sensitivity and PCE of the RF rectifiers. Different topologies, including the two-transistor (2-T) and four-transistor (4-T) complementary-HTFET designs and the n-type HTFET-only designs, are evaluated with design parameter optimizations to achieve high PCE and high sensitivity. The optimized 4-T cross-coupled HTFET rectifier achieves a PCE above 50% for RF input power ranging from -40 dBm to -25 dBm, which significantly extends the RF input power range compared to the baseline Si FinFET design. Maximum PCEs of 84% and 85% are achieved in the proposed 4-T N-HTFET-only rectifier at -33.7 dBm input power and the 4-T cross-coupled HTFET rectifier at -34.5 dBm input power, respectively. The ability to obtain a high PCE at low RF input power demonstrates the advantage of HTFET RF rectifiers for battery-less energy harvesting applications.
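To put the quoted input power levels in absolute terms, the sketch below converts dBm to watts using the standard definition (0 dBm = 1 mW) and treats PCE as the ratio of DC output power to RF input power. The helper names `dbm_to_watts` and `pce` are hypothetical, and the DC output power is simply back-computed from the reported 85% peak PCE rather than taken from any measurement.

```python
def dbm_to_watts(p_dbm):
    """Convert power in dBm to watts: P(W) = 1 mW * 10^(dBm / 10)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def pce(p_dc_out, p_rf_in):
    """Power conversion efficiency: DC output power over RF input power."""
    return p_dc_out / p_rf_in


# Input power levels quoted in the abstract, expressed in nanowatts.
for p_dbm in (-40.0, -34.5, -33.7, -25.0):
    print(f"{p_dbm:6.1f} dBm = {dbm_to_watts(p_dbm) * 1e9:8.1f} nW")

# DC power implied by the reported 85% peak PCE at -34.5 dBm input.
p_in = dbm_to_watts(-34.5)
p_out = 0.85 * p_in
print(f"Implied DC output: {p_out * 1e9:.1f} nW at PCE = {pce(p_out, p_in):.0%}")
```

The conversion shows that the rectifiers operate on input power between roughly a hundred nanowatts and a few microwatts, which is why sensitivity matters as much as peak efficiency.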
Steep-switching tunnel FETs (TFETs) can extend supply voltage scaling with improved energy efficiency for both digital and analog/RF applications. In this paper, recent approaches to III-V tunnel FET device design, prototype device demonstration, modeling techniques, and performance evaluation for digital and analog/RF applications are discussed and compared to CMOS technology. The impact of steep switching, uni-directional conduction, and negative differential resistance characteristics is explored from a circuit design perspective. Circuit-level implementations such as III-V TFET-based adder and SRAM designs show significant improvement in energy efficiency and power reduction below 0.3 V for digital applications. The analog/RF metric evaluation covers the gm/Ids metric, temperature sensitivity, parasitic impact, and noise performance. TFETs exhibit promising performance for high-frequency, high-sensitivity, and ultra-low-power RF rectifier applications.
Augmented reality (AR) products require energy-efficient systems-on-chip (SoCs) for machine learning (ML), neural networks (NNs), and image signal processing (ISP) applications [1]. These SoCs must be high-performance, yet low power with compact form factors. They are heavily constrained by the area footprint, while the third dimension is usually left with ample space. Moreover, frequent access to off-chip memories can be prohibitively expensive in terms of latency and energy for AR devices. Fortunately, recent advances in 3D integration allow integration of additional logic and memory into the SoC without area footprint cost. In particular, face-to-face (F2F) stacking with hybrid bonding (HB) allows for high bandwidth (BW) connections between dies without incurring substantial energy overhead. We demonstrate, for the first time in the AR domain, a 3D integrated SoC using face-to-face hybrid-bonding technology to show: (1) deployment of larger workloads not previously feasible on an iso-form factor baseline due to memory capacity limitations and strict execution time and energy requirements, and (2) system-level energy and execution time savings (up to 40% for each) for our prototype AR SoC within tight form factor constraints.
For years there has been a call to increase remote work, conferencing, and education. Although many companies have geographically distributed teams and students have moved to online instruction, remote working and learning have yet to become the norm despite the available technology and resources. Remote work and education provide environmental benefits as well as improved work-life integration and flexibility. Today, key challenges include effective communication, laboratories, isolation, and privacy. Being at the forefront of innovation, our community often leads technology adoption. In this special event, we explore how to shape the inevitable shift to more distributed and remote styles of working and learning.