This invited paper surveys recent progress in compute-in-memory (CIM) prototype chip designs with emerging nonvolatile memories (eNVMs) such as resistive random access memory (RRAM) technology. CIM mixed-signal macros from 8 kb to 4 Mb (with analog computation within the memory array) have been demonstrated by academia and industry, showing promising energy efficiency and throughput for machine learning inference acceleration. However, grand challenges remain for large-scale system design, including: 1) substantial analog-to-digital converter (ADC) overhead; 2) limited scalability to advanced logic nodes due to the high write voltage of eNVMs; 3) process variations (e.g., ADC offset) that degrade inference accuracy. Mitigation strategies and possible future research directions are discussed.
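The ADC-offset challenge named above can be illustrated with a minimal numeric sketch (hypothetical code, not from any surveyed chip): an ideal bit-line MAC followed by a quantizer whose reference is shifted by process variation, which is enough to flip the output code.

```python
# Toy model of a CIM bit-line MAC and its ADC readout. All names and
# parameter values are illustrative assumptions, not a chip specification.

def bitline_mac(inputs, weights):
    """Ideal analog MAC along one bit line: sum of input*weight products."""
    return sum(i * w for i, w in zip(inputs, weights))

def quantize(partial_sum, levels, full_scale, offset=0.0):
    """Model the ADC readout: clamp and round the column current to a code.
    `offset` models a process-variation-induced reference shift."""
    step = full_scale / levels
    code = round((partial_sum + offset) / step)
    return max(0, min(levels, code))

inputs = [1, 0, 1, 1]           # binary input vector on the word lines
weights = [1, 1, 0, 1]          # binary weights stored in one column
ideal = bitline_mac(inputs, weights)                      # partial sum = 2
code_ok = quantize(ideal, levels=4, full_scale=4.0)       # correct code 2
code_bad = quantize(ideal, levels=4, full_scale=4.0, offset=0.6)  # flips to 3
```

Even a sub-LSB reference offset (0.6 of a step here) corrupts the partial sum, which is why offset compensation or offset-aware retraining is needed at scale.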
Resistive random access memory (RRAM) based compute-in-memory (CIM) has shown great potential for deep neural network (DNN) inference. Prior works generally used an off-chip write-verify scheme to tighten the RRAM resistance distribution and off-chip analog-to-digital converter (ADC) references to fine-tune partial-sum quantization edges. Though off-chip techniques are viable for testing purposes, they are unsuitable for practical applications. This work presents an RRAM-CIM macro that features 1) on-chip write-verify to speed up initial weight programming and to periodically refresh cells, compensating for resistance drift under stress, and 2) on-chip ADC reference generation with column-wise tunability to mitigate process-variation-induced offsets, guaranteeing CIFAR-10 accuracy above 90%. The design is taped out in the TSMC N40 RRAM process and achieves 36.4 TOPS/W for 1×1 b MAC operations on the VGG-8 network.
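The write-verify idea can be sketched as an incremental program-and-check loop (a toy model with hypothetical names; real RRAM programming applies voltage pulses and verify reads, not fractional numeric updates):

```python
# Illustrative write-verify loop: pulse the cell toward a target conductance,
# verify after each pulse, and stop once it lands inside the tolerance window.
# `cell` is a mutable dict holding a normalized conductance 'g'.

def write_verify(cell, target, tol, max_pulses=20):
    """Return the number of program pulses applied before the verify read
    passes, or -1 if the cell never converges within `max_pulses`."""
    for pulse in range(1, max_pulses + 1):
        error = target - cell["g"]
        if abs(error) <= tol:        # verify read: inside the target window
            return pulse - 1
        cell["g"] += 0.5 * error     # program pulse: partial step toward target
    return -1

cell = {"g": 0.0}
pulses = write_verify(cell, target=1.0, tol=0.05)   # converges in 5 pulses
```

The same loop, re-run periodically, models the refresh function: a drifted cell fails the verify read and receives corrective pulses, which is the drift-compensation role the macro's on-chip write-verify plays.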
Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to the extensive computation and memory bandwidth requirements. To address the memory wall bottleneck, the compute-in-memory (CIM) approach exploits analog computation along the bit lines of the memory array, significantly speeding up vector-matrix multiplications. So far, most CIM-based architectures target inference engines whose weights are trained offline. In this article, we propose CIMAT, a CIM Architecture for Training. At the bitcell level, we design two versions of transpose SRAM, 7T and 8T, to implement the bi-directional vector-matrix multiplication needed for feedforward (FF) and backpropagation (BP). Moreover, we design the peripheral circuitry, mapping strategy, and data flow for the BP process and weight update to support CIM-based on-chip training. To further improve training performance, we explore pipeline optimization of the proposed architecture. We use mature and advanced 7 nm CMOS technology to design the CIMAT architecture with a 7T/8T transpose SRAM array that supports bi-directional parallel read. We evaluate 8-bit training of ResNet-18 on ImageNet, showing that the 7T-based design achieves 3.38× higher energy efficiency (~6.02 TOPS/W), 4.34× the frame rate (~4,020 fps), and only 50 percent of the chip area compared to a baseline architecture with a conventional 6T SRAM array that supports row-by-row read only. Even better performance is obtained with the 8T-based architecture, which reaches ~10.79 TOPS/W and ~48,335 fps with 74 percent of the baseline's chip area.
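Why a transpose-readable bitcell matters for training can be shown with a small functional sketch (hypothetical code, not the paper's circuit): feedforward computes W·x along one array dimension, while backpropagation needs Wᵀ·δ along the other, and a transpose cell serves both directions from the same stored weights without keeping a second copy.

```python
# Functional view of the bi-directional vector-matrix multiplication a
# transpose SRAM array provides. W is stored once; the two read directions
# reuse it with rows and columns swapping roles.

def forward(W, x):
    """Feedforward VMM: one parallel column-wise sum per output neuron."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def backward(W, delta):
    """Backprop VMM on the transposed array: accumulate along the other axis
    to propagate the error delta back to the input neurons."""
    n_in = len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(n_in)]

W = [[1, 2],
     [3, 4]]
y = forward(W, [1, 1])        # W @ x   -> [3, 7]
e = backward(W, [1, 0])       # W.T @ d -> [1, 2]
```

A conventional 6T array can only read row by row in one direction, so BP would require either serialized reads or a duplicated transposed copy of W, which is the overhead the 7T/8T cells avoid.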
This paper presents an ADC-free compute-in-memory (CIM) RRAM-based macro that exploits fully analog intra-/inter-array computation. The main contributions include: 1) a lightweight input-encoding scheme based on pulse-width modulation (PWM), which improves compute throughput by ~7×; 2) fully analog data processing between sub-arrays without explicit ADCs, which introduces no quantization loss and reduces power by a factor of 11.6. The 40 nm prototype chip with TSMC RRAM achieves an energy efficiency of 421.53 TOPS/W and a compute efficiency of 360 GOPS/mm² (normalized to binary operations) at 100 MHz.
In the era of big data and artificial intelligence, hardware advancement in throughput and energy efficiency is essential for both cloud and edge computation. Because it merges data storage and computing units, compute-in-memory has become a desirable choice for data-centric applications to mitigate the memory wall bottleneck of the von Neumann architecture. In this chapter, recent architectural designs and the underlying circuit/device technologies for compute-in-memory are surveyed. The related design challenges and prospects are also discussed to provide an in-depth understanding of the interactions between algorithms/architectures and circuits/devices. The chapter is organized hierarchically: an overview of the field (Introduction section); the principle of compute-in-memory (section "DNN Basics and Corresponding CIM Principle"); the latest architecture and algorithm techniques, including network models, data flow, pipeline design, and quantization approaches (section "Architecture and Algorithm Techniques for CIM"); the related hardware support, including embedded memory technologies such as static random access memories and emerging nonvolatile memories, as well as peripheral circuit designs with a focus on analog-to-digital converters (section "Hardware Implementations for CIM Architecture"); and a summary and outlook for the compute-in-memory architecture (Conclusion section).
The aim of this research was to determine the feasibility of a newly developed process for the repair of cracked gas turbine casings made of ductile cast iron. This study investigated the microstructural characteristics, metallurgy, and mechanical properties of repair weldments produced using fibre laser cladding. Optical microscopy, scanning electron microscopy, and electron probe microanalysis were used to investigate the microstructure at the cladding weld interface. The mechanical properties of the cladded specimens were evaluated after laser cladding. Our results revealed that the weldability of ductile cast iron can be enhanced by performing laser surface pretreatment to sublimate graphite nodules. Microhardness at the interface of the laser-cladded weldments depended largely on the extent of the heat-affected zone and the degree of phase complexity. Under tensile loading, failures were limited to the base metal region of the weldments. Test results demonstrate that the impact toughness of the interface between the fusion zone and the base metal can be enhanced through post-cladding heat treatment.
Machine learning inference engines are of great interest for smart edge computing. The compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging non-volatile memory technologies offer great potential for instant on/off operation through dynamic power gating. An inference engine is typically pre-trained in the cloud and then deployed to the field, which exposes it to new attack models such as chip cloning and neural network model reverse engineering. In this paper, we propose countermeasures to weight cloning and input-output pair attacks. The first strategy is weight fine-tuning that compensates the analog-to-digital converter (ADC) offsets of a specific chip instance while inducing a significant accuracy drop on cloned chip instances. The second strategy is weight shuffling and fake-row insertion, which allows accurate propagation of the neural network's activations only with a key.
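The shuffle-and-fake-rows countermeasure can be sketched in software terms (hypothetical names and a toy PRNG-based scheme; the paper realizes this in the weight mapping of the CIM array, not as host code): a key-seeded permutation scrambles real and fake rows, and only the key holder can invert it to recover usable weights.

```python
# Toy model of the weight-shuffle + fake-row countermeasure. An attacker who
# clones the array contents without the key reads a permuted mix of real and
# decoy rows; the legitimate controller re-derives the permutation from the key.
import random

def shuffle_rows(weights, key, n_fake=2):
    """Append `n_fake` decoy rows, then permute all rows with a key-seeded
    PRNG. Returns the scrambled array and the permutation used."""
    rng = random.Random(key)
    rows = [list(r) for r in weights]
    rows += [[rng.random() for _ in weights[0]] for _ in range(n_fake)]
    perm = list(range(len(rows)))
    rng.shuffle(perm)
    return [rows[i] for i in perm], perm

def unshuffle_rows(stored, perm, n_real):
    """Invert the key-derived permutation and drop the fake rows."""
    rows = [None] * len(stored)
    for new_pos, old_pos in enumerate(perm):
        rows[old_pos] = stored[new_pos]
    return rows[:n_real]

W = [[1, 2], [3, 4], [5, 6]]
stored, perm = shuffle_rows(W, key=42)     # what the array physically holds
recovered = unshuffle_rows(stored, perm, n_real=3)   # equals W with the key
```

Without the key the row order (and which rows are decoys) is unknown, so activations propagate through a scrambled matrix and accuracy collapses, which is the protection the abstract claims.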