This paper presents an overview of low-power successive approximation register (SAR) analog-to-digital converters (ADCs). It covers the operation principle, error analysis, and practical design issues. Furthermore, this paper provides a comprehensive survey of state-of-the-art low-power design techniques for every circuit block in the SAR ADC, including the comparator, the capacitive digital-to-analog converter (DAC), and the SAR logic. The goal of this paper is to provide a useful overview for SAR ADC designers who want to improve energy efficiency in low-to-medium speed applications.
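As a brief illustration of the SAR operation principle mentioned above, the following minimal sketch models an ideal, noiseless, unipolar SAR conversion as a binary search over DAC codes. The function name `sar_convert`, the reference `vref`, and the default 10-bit resolution are assumptions chosen for illustration; they are not taken from the paper.

```python
def sar_convert(vin, vref=1.0, n_bits=10):
    """Ideal unipolar SAR conversion (illustrative model, not from the paper).

    Each cycle the SAR logic tentatively sets one bit (MSB first), the DAC
    produces the matching voltage, and the comparator decides whether the
    bit is kept.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # SAR logic: tentatively set this bit
        v_dac = vref * trial / (1 << n_bits)   # ideal DAC output for the trial code
        if vin >= v_dac:                       # comparator decision
            code = trial                       # keep the bit
    return code


mid_scale = sar_convert(0.5)    # input at half of vref
near_full = sar_convert(0.999)  # input just below vref
```

In this idealized model one comparison is made per bit, so an N-bit conversion takes N comparator decisions; real designs add sampling, settling, noise, and mismatch effects that the sketch ignores.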
This article presents an energy-efficient comparator design. The pre-amplifier adopts an inverter-based input pair powered by a floating reservoir capacitor; it realizes both current reuse and dynamic bias, thereby significantly boosting gm/ID and reducing noise. Moreover, it greatly reduces the influence of the process corner and the input common-mode voltage on the comparator performance, including noise, offset, and delay. A prototype comparator in 180 nm achieves 46-μV input-referred noise while consuming only 1 pJ per comparison under a 1.2-V supply. This represents a more than sevenfold energy-efficiency improvement over a strong-arm (SA) latch. To the best of our knowledge, it achieves the highest reported comparator energy efficiency.
Despite recent developments in automated analog sizing and analog layout generation, there is doubt whether analog design automation techniques can scale to system-level designs. On the other hand, analog designs are considered major roadblocks for open-source hardware, given the limited design automation tools available. In this work, we present OpenSAR, the first open-source automated end-to-end successive approximation register (SAR) analog-to-digital converter (ADC) compiler. OpenSAR requires only system performance specifications as input and outputs DRC- and LVS-clean layouts. Compared with prior work, we leverage automated placement and routing to generate analog building blocks, removing the need to design layout templates or libraries. We optimize the redundant non-binary capacitor digital-to-analog converter (CDAC) array design for yield considerations with a template-based layout generator that interleaves capacitor rows and columns to reduce process gradient mismatch. Post-layout simulations demonstrate that the generated prototype designs achieve state-of-the-art resolution, speed, and energy efficiency.
With the rapid advancement of edge AI, the complexity of tasks on edge devices is continually increasing, demanding better efficiency and precision from AI accelerators. Pre-aligned floating-point computing-in-memory (FP CIM) has been proposed to achieve high-precision neural network (NN) computations based on floating-point (FP) data precision. However, the complex digital circuitry required for integer (INT) mantissa multiply-accumulate (MAC) computation and exponent alignment severely limits the efficiency and throughput of FP CIM. This work proposes an energy- and area-efficient computing-in-memory (CIM) engine for one-shot FP NN inference and on-device fine-tuning. To improve the throughput of FP CIM, a one-shot compute scheme is proposed to perform an FP operation within one cycle. It adopts a multiply-less NN instead of a multiply-based NN to simplify the integer mantissa MAC to minimum selection. A customized 8-bit parallel minimum selector is also designed to further reduce the parallel computation cost. To simplify the FP/INT conversion process, an input–weight co-alignment workflow is proposed to eliminate maximum exponent selection and simplify the mantissa shifting logic. To minimize the inference accuracy loss caused by environmental changes, a lightweight on-device fine-tuning core (ODFC) is designed to support online weight updates. The fabricated 28-nm chip achieves an energy efficiency of 128 TFLOPS/W and a computational density of 7.02 TFLOPS/mm$^2$ at BF16, representing 4.1$\times$ and 3.4$\times$ improvements over previous state-of-the-art works, respectively.
This paper presents a power-efficient 13-tap finite impulse response (FIR) filter and an infinite impulse response (IIR) filter embedded in a 10-bit SAR ADC. Using the capacitive DAC array, the IIR filter is inherently realized without DC attenuation by periodically sharing the charge on the capacitors between the FIR filter and the SAR ADC. The IIR filtering performance can be further optimized by time interleaving and by introducing two additional phases. Compared with a 15-tap FIR filter, the attenuation at the cutoff frequency is theoretically enhanced by 9 dB. A prototype filter in 40-nm CMOS occupies an active area of 0.067 mm², consumes 38 μW with a 1-MHz bandwidth, and obtains >42.2 dB out-of-band suppression when operated at 40 MS/s.
This paper presents a time-interleaved (TI) SAR analog-to-digital converter (ADC) with a fast variance-based timing-skew calibration technique. It uses a single-comparator-based window detector (WD) to calibrate the timing skew. The WD suppresses variance estimation errors and allows precise variance estimation from a significantly smaller number of samples. The technique has a low hardware cost and converges orders of magnitude faster than prior variance-based timing-skew calibration techniques. It also brings the collateral benefit of offset mismatch calibration. After timing-skew calibration, a prototype 10-b 800-MS/s ADC in 40-nm CMOS achieves a Nyquist-rate SNDR of 48 dB and consumes 4.9 mW, leading to a Walden FoM of 29.8 fJ/conversion-step.
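The reported figure of merit can be cross-checked with a short calculation. The sketch below assumes the commonly used Walden definition, FoM = P / (2^ENOB · fs) with ENOB = (SNDR − 1.76) / 6.02; the abstract does not spell out the formula, so this definition and the function names are assumptions.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave formula)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, sndr_db, fs_hz):
    """Walden figure of merit in joules per conversion step (assumed definition)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)


# Values quoted in the abstract: 4.9 mW, 48 dB Nyquist-rate SNDR, 800 MS/s.
fom_j = walden_fom(4.9e-3, 48.0, 800e6)
fom_fj = fom_j * 1e15  # convert joules to femtojoules
```

With the abstract's numbers, `fom_fj` evaluates to roughly 29.8, which agrees with the stated Walden FoM.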
Smart edge nodes require efficient matrix-vector multiplications for local deep neural network (DNN) inference. Benefiting from its high density and CMOS compatibility, eDRAM-based computing-in-memory (CIM) [1]–[4], especially with multi-level cells (MLCs) [4], is attracting rising attention. However, the performance of prior MLC eDRAM CIM is severely limited by the inconsistency of weight representations between programming and computing: weights are programmed as fixed voltages, while transistor currents are used for computation. Thus, the programming of MLCs requires calibration due to the nonlinear transistor I-V, which can be extremely complicated in the presence of $V_{\mathrm{TH}}$ variations. Furthermore, the computing precision is severely degraded by $V_{\mathrm{TH}}$ variations when small computing currents are used for high parallelism. To fundamentally surmount this dilemma, we propose the first current-programming eDRAM CIM that unifies the weight programming and computing in the current domain. The enabling technique is a novel 3T1C eDRAM cell (Fig. 1, top right). It confers several key merits: 1) the cell is programmed by the weight current directly with the self-calibrated voltage generated on the storage capacitor; it essentially stores the weight current instead of a fixed voltage, thus mitigating $V_{\mathrm{TH}}$ variation and nonlinear transistor I-V impacts; 2) a dynamic-cascoded read structure is proposed to significantly reduce the V B sensitivity while not requiring any bias voltage; 3) thanks to the accurately programmed cell, it supports MLC operation (8 current levels) without any calibration, largely increasing density; 4) a voltage-current two-step programming scheme significantly boosts the sub-$\mu\mathrm{A}$ current-weight writing speed. Combining these merits, the proposed eDRAM cell is naturally immune to transistor-level nonidealities, thus allowing a small LSB weight current of only 100nA.
A 4b CIM cell composed of 2 MLCs is developed to support 4b signed weights. It contains 15 current levels ranging from -700nA to 700nA. Fabricated in 65nm CMOS, the prototype achieves the highest macro-level 4b-MAC energy efficiency of 233-305TOPS/W among eDRAM CIMs.
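The level counts quoted above are mutually consistent: two MLCs with 8 current levels each (0 to 700nA in 100nA LSB steps) yield 15 net levels from -700nA to 700nA if one cell carries the positive part of a weight and the other the negative part. The sketch below illustrates that arithmetic only. The differential encoding, and the names `encode_weight` and `net_current_na`, are hypothetical; the abstract does not state how the two MLCs are actually combined.

```python
LSB_NA = 100          # LSB weight current in nA (from the abstract)
LEVELS_PER_MLC = 8    # current levels per multi-level cell (from the abstract)


def encode_weight(w):
    """Map a signed integer weight to (positive-cell, negative-cell) currents in nA.

    Hypothetical differential encoding: only one of the two cells carries
    a nonzero current for any given weight.
    """
    max_mag = LEVELS_PER_MLC - 1
    if not -max_mag <= w <= max_mag:
        raise ValueError(f"weight {w} outside [-{max_mag}, {max_mag}]")
    i_pos = max(w, 0) * LSB_NA
    i_neg = max(-w, 0) * LSB_NA
    return i_pos, i_neg


def net_current_na(w):
    """Net signed cell current in nA for weight w under the assumed encoding."""
    i_pos, i_neg = encode_weight(w)
    return i_pos - i_neg


# Enumerate every representable net current level.
levels = [net_current_na(w) for w in range(-7, 8)]
```

Under this assumed encoding, `levels` holds 15 distinct values spanning -700nA to 700nA in 100nA steps, matching the range given in the abstract.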
This paper presents a SAR ADC with reduced front-end sampling kT/C noise. This is achieved by using an active sampling circuit with a specially designed 2-stage amplifier that decouples the tight relationship between the sampling noise power spectral density (PSD) and bandwidth (BW). A 12-bit 12-MS/s prototype ADC in 40nm CMOS reduces the sampling noise power by 3.5 times, which permits the use of a small sampling capacitor of only 132 fF. This relaxes the requirements on the ADC input driver and reference buffer, which can lead to significant savings in power, area, and complexity at the system level.
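To give a sense of scale, the sketch below evaluates the conventional passive-sampling noise, sqrt(kT/C), for the 132 fF capacitor and then applies the reported 3.5 times power reduction. The temperature of 300 K and the reading that the reduction is measured against the plain kT/C of the same capacitor are assumptions, as are the function and variable names.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def ktc_noise_vrms(cap_f, temp_k=300.0):
    """RMS sampling noise of an ideal passive switch-capacitor sampler, in volts."""
    return math.sqrt(K_BOLTZMANN * temp_k / cap_f)


C_SAMPLE = 132e-15  # sampling capacitor from the abstract, in farads
REDUCTION = 3.5     # reported noise power reduction factor

passive_uv = ktc_noise_vrms(C_SAMPLE) * 1e6      # passive kT/C noise in uV rms
reduced_uv = passive_uv / math.sqrt(REDUCTION)   # noise voltage after a 3.5x power cut
equivalent_cap_ff = C_SAMPLE * REDUCTION * 1e15  # passive capacitor with the same noise
```

Under these assumptions the passive noise is about 177 μV rms and the reduced noise about 95 μV rms, which a passive sampler would need roughly 462 fF to match.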