MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

arXiv (Cornell University) (2021)

Yuhang Li Mingzhu Shen Jian Ma Yan Ren Mingxin Zhao Qi Zhang Ruihao Gong Fengwei Yu Junjie Yan

Abstract:

Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting the algorithm and the hardware. We conduct a comprehensive analysis and report a number of intuitive and counter-intuitive findings. By aligning the training settings, we find that existing algorithms have about the same performance on the conventional academic track. For hardware-deployable quantization, however, a large accuracy gap remains unresolved. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.

Keywords:

Benchmark (surveying)

Application-specific integrated circuit

Topics:

Advanced Neural Network Applications

Parallel Computing and Optimization Techniques

CCD and CMOS Imaging Sensors

10.48550/arxiv.2111.03759

Mixed Digital/Analog ASIC Design Technology (ASIC Application Guide: ASIC Development and Design Know-How for Creating Hit Products) -- (All About ASIC Design Know-How)

Denshi gijutsu (1989)

茂川田

Application-specific integrated circuit

Citations (0)

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

arXiv (Cornell University) (2021)

Yuhang Li Mingzhu Shen Jian Ma Yan Ren Mingxin Zhao

Benchmark (surveying)

Application-specific integrated circuit

10.48550/arxiv.2111.03759

Citations (18)

Low-bit Quantization of Neural Networks for Efficient Inference

arXiv (Cornell University) (2019)

Yoni Choukroun Eli Kravchik Fan Yang Pavel Kisilev

Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements. This makes real-time implementations on limited resources hardware a challenging task. One popular approach to address this challenge is to perform low-bit precision computations via neural network quantization. However, aggressive quantization generally entails a severe penalty in terms of accuracy, and often requires retraining of the network, or resorting to higher bit precision quantization. In this paper, we formalize the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining. The main contributions of our approach are the optimizations of the constrained MSE problem at each layer of the network, the hardware aware partitioning of the network parameters, and the use of multiple low precision quantized tensors for poorly approximated layers. The proposed approach allows 4 bits integer (INT4) quantization for deployment of pretrained models on limited hardware resources. Multiple experiments on various network architectures show that the suggested method yields state of the art results with minimal loss of tasks accuracy.
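The MMSE formulation in the abstract above can be illustrated with a minimal sketch. It is not the authors' layer-wise constrained optimization: it assumes symmetric uniform quantization of a single tensor and replaces the optimizer with a plain grid search over candidate scales. The helper names (`quantize`, `mse`, `mmse_scale`) and the Gaussian test data are hypothetical and chosen only for illustration.

```python
import random

def quantize(values, scale, bits=4):
    """Symmetric uniform quantization: round to an integer grid, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1   # 7 for INT4
    qmin = -qmax - 1             # -8 for INT4
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mmse_scale(values, bits=4, num_candidates=100):
    """Grid-search the scale that minimizes quantization MSE (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 1.0, 0.0          # all-zero tensor: any scale is exact
    best_scale, best_err = None, float("inf")
    for i in range(1, num_candidates + 1):
        # Candidate scale: shrink the max-magnitude scale by a fraction i/N,
        # which clips outliers in exchange for a finer grid.
        scale = (max_abs / qmax) * (i / num_candidates)
        err = mse(values, quantize(values, scale, bits))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

# Assumed test data: Gaussian "weights" standing in for a pretrained layer.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]

naive_scale = max(abs(w) for w in weights) / 7   # scale that covers the full range
naive_err = mse(weights, quantize(weights, naive_scale))
best_scale, best_err = mmse_scale(weights)
print(f"naive MSE: {naive_err:.5f}, MMSE-selected MSE: {best_err:.5f}")
```

Because the full-range scale is one of the candidates, the selected scale can never do worse than it in MSE; for bell-shaped weight distributions it is typically smaller, since clipping a few outliers buys a finer grid for the bulk of the values.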

Retraining

Citations (15)

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Mingzhu Shen Feng Liang Ruihao Gong Yuhang Li Chuming Li

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results under the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. Therefore, we propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. However, a naive combination inevitably faces unacceptable time consumption or unstable training problem. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and meanwhile improves the quantization accuracy. Equipped with this overall framework, dubbed as Once Quantization-Aware Training (OQAT), our searched model family, OQATNets, achieves a new state-of-the-art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming 2-bit counterpart MobileNetV3 by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures are identified easily and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Codes and models are released at https://github.com/LaVieEnRoseSMZ/OQA

Linde–Buzo–Gray algorithm

10.1109/iccv48922.2021.00529

Citations (23)

ASIC Design

Springer eBooks (2021)

Vaibbhav Taraate

Application-specific integrated circuit

Design flow

10.1007/978-981-16-3199-3_18

Citations (1)

Structured ASIC, evolution or revolution?

Kun-Cheng Wu Yu-Wen Tsai

This paper describes structured ASIC technology and its impact on the implementation flow. With an optimized and programmable structure, structured ASIC technology dramatically reduces ASIC cost and manufacturing turn-around time. However, the structured ASIC implementation flow is more complex than the conventional cell-based flow. There are also slight impacts on structured ASIC implementation problems. Finally, the structured ASIC solutions provided by Faraday are presented. There are three structured ASIC solutions for customers' different applications: MPCA (Metal Programmable Cell Array), MPIO (Metal Programmable I/O), and the structured ASIC platform. With the most competitive architecture, our customers can implement their ASICs at a lower cost with a faster turn-around time.

Application-specific integrated circuit

10.1145/981066.981088

Citations (79)

Testing of application specific integrated circuit in LabVIEW environment

Proceedings of SPIE, the International Society for Optical Engineering (2006)

P. Maj

The process of implementing a new Application Specific Integrated Circuit (ASIC) consists of several stages, such as design, production, and testing. To verify that the ASIC fulfills all project specifications, sets of special tests are necessary, especially in the case of new scientific experiments in physics, materials science, or biology. Nowadays, PCs with dedicated hardware are mostly used to communicate with the ASIC and perform these tests. Every new ASIC requires its own dedicated software that implements a specific communication protocol and contains many characteristic functions to test the parameters of the ASIC. The time for testing a new ASIC can be significantly shortened if the program blocks are properly written, i.e. if the functionality of those blocks can be changed instantly to run additional tests. In this article, we show an example of testing a 64-channel ASIC called DEDIX, which is used for fast digital X-ray imaging applications. The ASIC has been fabricated in 0.35 μm CMOS technology. Using special software developed in the LabVIEW environment, we tested the analogue parameters of the front-end channels (gain, noise, offsets), the parameters of the digital-to-analogue converters, and the functionality and speed of the digital blocks. We concentrate on the programming techniques that considerably speed up the development of dedicated software for testing a new ASIC.

10.1117/12.714534

Citations (0)

Physical design tradeoffs for ASIC technologies

J. Banker Anil Shanbhag Naveed A. Sherwani

Application Specific Integrated Circuit (ASIC) is a very broad term that excludes only general-purpose processing chips, memory chips, and circuits built using standard building-block chips. ASIC technology is a cost-effective way to meet the challenges of today's complex designs. The physical design tradeoffs for ASIC technologies are discussed. The tradeoffs for ASICs include tradeoffs between ASIC technologies and the physical design tradeoffs. The focus is on the physical design tradeoffs for ASICs, but, as technology tradeoffs and physical design tradeoffs go hand-in-hand, the tradeoffs between ASIC technologies are briefly discussed.

Application-specific integrated circuit

Integrated circuit design

10.1109/asic.1993.410811

Citations (4)

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

arXiv (Cornell University) (2020)

Mingzhu Shen Feng Liang Ruihao Gong Yuhang Li Chuming Li

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results under the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. Therefore, we propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. However, a naive combination inevitably faces unacceptable time consumption or unstable training problem. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and meanwhile improves the quantization accuracy. Equipped with this overall framework, dubbed as Once Quantization-Aware Training (OQAT), our searched model family, OQATNets, achieves a new state-of-the-art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming 2-bit counterpart MobileNetV3 by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures are identified easily and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Codes and models are released at https://github.com/LaVieEnRoseSMZ/OQA

10.48550/arxiv.2010.04354

Citations (0)