This invited paper surveys recent progress in compute-in-memory (CIM) prototype chip designs with emerging nonvolatile memories (eNVMs) such as resistive random access memory (RRAM) technology. CIM mixed-signal macros from 8 kb to 4 Mb (with analog computation within the memory array) have been demonstrated by academia and industry, showing promising energy efficiency and throughput for machine learning inference acceleration. However, grand challenges remain for large-scale system design, including: 1) substantial analog-to-digital converter (ADC) overhead; 2) limited scalability to advanced logic nodes due to the high write voltage of eNVMs; 3) process variations (e.g., ADC offset) that degrade inference accuracy. Mitigation strategies and possible future research directions are discussed.
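The ADC-offset challenge named above can be illustrated with a minimal numeric sketch (hypothetical code, not from any surveyed chip): an ideal bit-line MAC followed by a quantizer whose reference is shifted by process variation, which is enough to flip the output code.

```python
# Toy model of a CIM bit-line MAC and its ADC readout. All names and
# parameter values are illustrative assumptions, not a chip specification.

def bitline_mac(inputs, weights):
    """Ideal analog MAC along one bit line: sum of input*weight products."""
    return sum(i * w for i, w in zip(inputs, weights))

def quantize(partial_sum, levels, full_scale, offset=0.0):
    """Model the ADC readout: clamp and round the column current to a code.
    `offset` models a process-variation-induced reference shift."""
    step = full_scale / levels
    code = round((partial_sum + offset) / step)
    return max(0, min(levels, code))

inputs = [1, 0, 1, 1]           # binary input vector on the word lines
weights = [1, 1, 0, 1]          # binary weights stored in one column
ideal = bitline_mac(inputs, weights)                      # partial sum = 2
code_ok = quantize(ideal, levels=4, full_scale=4.0)       # correct code 2
code_bad = quantize(ideal, levels=4, full_scale=4.0, offset=0.6)  # flips to 3
```

Even a sub-LSB reference offset (0.6 of a step here) corrupts the partial sum, which is why offset compensation or offset-aware retraining is needed at scale.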
Resistive random access memory (RRAM) based compute-in-memory (CIM) has shown great potential for deep neural network (DNN) inference. Prior works generally used an off-chip write-verify scheme to tighten the RRAM resistance distribution and off-chip analog-to-digital converter (ADC) references to fine-tune partial-sum quantization edges. Though off-chip techniques are viable for testing purposes, they are unsuitable for practical applications. This work presents an RRAM-CIM macro that features 1) on-chip write-verify to speed up initial weight programming and to periodically refresh cells, compensating for resistance drift under stress, and 2) on-chip ADC reference generation with column-wise tunability to mitigate process-variation-induced offsets, guaranteeing CIFAR-10 accuracy above 90%. The design is taped out in the TSMC N40 RRAM process and achieves 36.4 TOPS/W for 1×1 b MAC operations on the VGG-8 network.
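The write-verify idea can be sketched as an incremental program-and-check loop (a toy model with hypothetical names; real RRAM programming applies voltage pulses and verify reads, not fractional numeric updates):

```python
# Illustrative write-verify loop: pulse the cell toward a target conductance,
# verify after each pulse, and stop once it lands inside the tolerance window.
# `cell` is a mutable dict holding a normalized conductance 'g'.

def write_verify(cell, target, tol, max_pulses=20):
    """Return the number of program pulses applied before the verify read
    passes, or -1 if the cell never converges within `max_pulses`."""
    for pulse in range(1, max_pulses + 1):
        error = target - cell["g"]
        if abs(error) <= tol:        # verify read: inside the target window
            return pulse - 1
        cell["g"] += 0.5 * error     # program pulse: partial step toward target
    return -1

cell = {"g": 0.0}
pulses = write_verify(cell, target=1.0, tol=0.05)   # converges in 5 pulses
```

The same loop, re-run periodically, models the refresh function: a drifted cell fails the verify read and receives corrective pulses, which is the drift-compensation role the macro's on-chip write-verify plays.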
Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to the extensive computation and memory bandwidth requirements. To address the memory wall bottleneck, the compute-in-memory (CIM) approach exploits analog computation along the bit lines of the memory array, significantly speeding up vector-matrix multiplications. So far, most CIM-based architectures target inference engines whose weights are trained offline. In this article, we propose CIMAT, a CIM Architecture for Training. At the bitcell level, we design two versions of transpose SRAM, 7T and 8T, to implement the bi-directional vector-matrix multiplication needed for feedforward (FF) and backpropagation (BP). Moreover, we design the peripheral circuitry, mapping strategy, and data flow for the BP process and weight update to support CIM-based on-chip training. To further improve training performance, we explore pipeline optimization of the proposed architecture. We use mature and advanced 7 nm CMOS technology to design the CIMAT architecture with a 7T/8T transpose SRAM array that supports bi-directional parallel read. We evaluate 8-bit training of ResNet-18 on ImageNet, showing that the 7T-based design achieves 3.38× higher energy efficiency (~6.02 TOPS/W), 4.34× the frame rate (~4,020 fps), and only 50 percent of the chip area compared to a baseline architecture with a conventional 6T SRAM array that supports row-by-row read only. Even better performance is obtained with the 8T-based architecture, which reaches ~10.79 TOPS/W and ~48,335 fps with 74 percent of the baseline's chip area.
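Why a transpose-readable bitcell matters for training can be shown with a small functional sketch (hypothetical code, not the paper's circuit): feedforward computes W·x along one array dimension, while backpropagation needs Wᵀ·δ along the other, and a transpose cell serves both directions from the same stored weights without keeping a second copy.

```python
# Functional view of the bi-directional vector-matrix multiplication a
# transpose SRAM array provides. W is stored once; the two read directions
# reuse it with rows and columns swapping roles.

def forward(W, x):
    """Feedforward VMM: one parallel column-wise sum per output neuron."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def backward(W, delta):
    """Backprop VMM on the transposed array: accumulate along the other axis
    to propagate the error delta back to the input neurons."""
    n_in = len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(n_in)]

W = [[1, 2],
     [3, 4]]
y = forward(W, [1, 1])        # W @ x   -> [3, 7]
e = backward(W, [1, 0])       # W.T @ d -> [1, 2]
```

A conventional 6T array can only read row by row in one direction, so BP would require either serialized reads or a duplicated transposed copy of W, which is the overhead the 7T/8T cells avoid.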
This paper presents an ADC-free compute-in-memory (CIM) RRAM-based macro that exploits fully analog intra-/inter-array computation. The main contributions include: 1) a lightweight input-encoding scheme based on pulse-width modulation (PWM), which improves compute throughput by ~7×; 2) fully analog data processing between sub-arrays without explicit ADCs, which introduces no quantization loss and reduces power by a factor of 11.6. The 40 nm prototype chip with TSMC RRAM achieves an energy efficiency of 421.53 TOPS/W and a compute efficiency of 360 GOPS/mm² (normalized to binary operations) at 100 MHz.
In the era of big data and artificial intelligence, hardware advancement in throughput and energy efficiency is essential for both cloud and edge computation. Because it merges data storage and computing units, compute-in-memory has become a desirable choice for data-centric applications to mitigate the memory wall bottleneck of the von Neumann architecture. In this chapter, recent architectural designs and the underlying circuit/device technologies for compute-in-memory are surveyed. The related design challenges and prospects are also discussed to provide an in-depth understanding of the interactions between algorithms/architectures and circuits/devices. The chapter is organized hierarchically: an overview of the field (Introduction section); the principle of compute-in-memory (section "DNN Basics and Corresponding CIM Principle"); the latest architecture and algorithm techniques, including network models, data flow, pipeline design, and quantization approaches (section "Architecture and Algorithm Techniques for CIM"); the related hardware support, including embedded memory technologies such as static random access memories and emerging nonvolatile memories, as well as peripheral circuit designs with a focus on analog-to-digital converters (section "Hardware Implementations for CIM Architecture"); and a summary and outlook for the compute-in-memory architecture (Conclusion section).
The aim of this research was to determine the feasibility of a newly developed process for the repair of cracked gas turbine casings made of ductile cast iron. This study investigated the microstructural characteristics, metallurgy, and mechanical properties of repair weldments produced using fibre laser cladding. Optical microscopy, scanning electron microscopy, and electron probe microanalysis were used to investigate the microstructure at the cladding weld interface. The mechanical properties of the cladded specimens were evaluated after laser cladding. Our results revealed that the weldability of ductile cast iron can be enhanced by performing laser surface pretreatment to sublimate graphite nodules. Microhardness at the interface of the laser-cladded weldments depended largely on the extent of the heat-affected zone and the degree of phase complexity. Under tensile loading, failures were limited to the base metal region of the weldments. Test results demonstrate that the impact toughness of the interface between the fusion zone and the base metal can be enhanced through post-cladding heat treatment.
Machine learning inference engines are of great interest for smart edge computing. The compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging non-volatile memory technologies offer great potential for instant on/off operation through dynamic power gating. An inference engine is typically pre-trained in the cloud and then deployed to the field, which exposes it to new attack models such as chip cloning and neural network model reverse engineering. In this paper, we propose countermeasures to weight cloning and input-output pair attacks. The first strategy is weight fine-tuning that compensates the analog-to-digital converter (ADC) offsets of a specific chip instance while inducing a significant accuracy drop on cloned chip instances. The second strategy is weight shuffling and fake-row insertion, which allows accurate propagation of the neural network's activations only with a key.
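The shuffle-and-fake-rows countermeasure can be sketched in software terms (hypothetical names and a toy PRNG-based scheme; the paper realizes this in the weight mapping of the CIM array, not as host code): a key-seeded permutation scrambles real and fake rows, and only the key holder can invert it to recover usable weights.

```python
# Toy model of the weight-shuffle + fake-row countermeasure. An attacker who
# clones the array contents without the key reads a permuted mix of real and
# decoy rows; the legitimate controller re-derives the permutation from the key.
import random

def shuffle_rows(weights, key, n_fake=2):
    """Append `n_fake` decoy rows, then permute all rows with a key-seeded
    PRNG. Returns the scrambled array and the permutation used."""
    rng = random.Random(key)
    rows = [list(r) for r in weights]
    rows += [[rng.random() for _ in weights[0]] for _ in range(n_fake)]
    perm = list(range(len(rows)))
    rng.shuffle(perm)
    return [rows[i] for i in perm], perm

def unshuffle_rows(stored, perm, n_real):
    """Invert the key-derived permutation and drop the fake rows."""
    rows = [None] * len(stored)
    for new_pos, old_pos in enumerate(perm):
        rows[old_pos] = stored[new_pos]
    return rows[:n_real]

W = [[1, 2], [3, 4], [5, 6]]
stored, perm = shuffle_rows(W, key=42)     # what the array physically holds
recovered = unshuffle_rows(stored, perm, n_real=3)   # equals W with the key
```

Without the key the row order (and which rows are decoys) is unknown, so activations propagate through a scrambled matrix and accuracy collapses, which is the protection the abstract claims.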