logo
    MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
    18
    Citation
    0
    Reference
    10
    Related Paper
    Citation Trend
    Abstract:
    Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability for model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights. By aligning the training settings, we find existing algorithms have about the same performance on the conventional academic track. While for the hardware-deployable quantization, there is a huge accuracy gap which remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.
    Keywords:
    Benchmark (surveying)
    Application-specific integrated circuit
    Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability for model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights. By aligning the training settings, we find existing algorithms have about the same performance on the conventional academic track. While for the hardware-deployable quantization, there is a huge accuracy gap which remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.
    Benchmark (surveying)
    Application-specific integrated circuit
    Citations (0)
    Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability for model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights. By aligning the training settings, we find existing algorithms have about the same performance on the conventional academic track. While for the hardware-deployable quantization, there is a huge accuracy gap which remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.
    Benchmark (surveying)
    Application-specific integrated circuit
    Citations (18)
    Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements. This makes real-time implementations on limited resources hardware a challenging task. One popular approach to address this challenge is to perform low-bit precision computations via neural network quantization. However, aggressive quantization generally entails a severe penalty in terms of accuracy, and often requires retraining of the network, or resorting to higher bit precision quantization. In this paper, we formalize the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining. The main contributions of our approach are the optimizations of the constrained MSE problem at each layer of the network, the hardware aware partitioning of the network parameters, and the use of multiple low precision quantized tensors for poorly approximated layers. The proposed approach allows 4 bits integer (INT4) quantization for deployment of pretrained models on limited hardware resources. Multiple experiments on various network architectures show that the suggested method yields state of the art results with minimal loss of tasks accuracy.
    Retraining
    Citations (15)
    Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results under the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. Therefore, we propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. However, a naive combination inevitably faces unacceptable time consumption or unstable training problem. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and meanwhile improves the quantization accuracy. Equipped with this overall framework, dubbed as Once Quantization-Aware Training (OQAT), our searched model family, OQATNets, achieves a new state-of-the-art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming 2-bit counterpart MobileNetV3 by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures are identified easily and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Codes and models are released at https://github.com/LaVieEnRoseSMZ/OQA
    Linde–Buzo–Gray algorithm
    Application-specific integrated circuit
    Design flow
    This paper describes the structured ASIC technology and impacts to the implementation flow. With an optimized and programmable structure, the structured ASIC technology indeed introduces a dramatically reduce ASIC cost and manufacturing turn-around time. While, the structured ASIC implementation flow is more complex than the conventional cell-based flow. There would be slightly impacts to structured ASIC implementation problems. Finally, the structured ASIC solutions provided by Faraday would be given. There are 3 structured ASIC solutions for customers' different applications. The three solutions are MPCA (Metal programmable Cell Array), MPIO (Metal Programmable I/O), and the structured ASIC platform. With the most competitive architecture, our customers can implement their ASIC at a lower cost with a faster turn-around-time.
    Application-specific integrated circuit
    Citations (79)
    The process of implementation of a new Application Specific Integrated Circuit (ASIC) consists of several stages like designing, production and testing. To verify, if the ASIC fulfills all project specifications, sets of special tests are necessary, especially in the case of new scientific experiments in physics, material science or biology. Nowadays mostly the PC computers with dedicated hardware can communicate with the ASIC and perform those tests. Every new ASIC requires it's own dedicated software that performs specific communication protocol and contains many characteristic functions to test the parameters of the ASIC. The time for testing the new ASIC can be significantly shorten if the program blocks are properly written, i.e. there is a possibility to change the functionality of those blocks instantly to make some additional tests. In this article, we show an example of testing a 64-channel ASIC called DEDIX, which is used for fast digital X-ray imaging applications. The ASIC has been fabricated in 0.35 μm CMOS technology. Using a special software developed in LabVIEW environment, we tested analogue parameters of the front-end channels (gain, noise, offsets), the parameters of digital to analogue converters as well as functionality and speed of digital blocks. We concentrate on these programming techniques, which considerably speed up the developing of dedicated software for testing the new ASIC.
    Citations (0)
    Application Specific Integrated Circuit (ASIC) is a very broad definition and excludes only general-purpose processing, memory chips and circuits built using the standard building block chips. ASIC technology is cost effective that meets the challenges of today's complex designs. The physical design tradeoffs for ASIC technologies are discussed. The tradeoffs for ASICs include tradeoffs between ASIC technologies and the physical design tradeoffs. The focus is on the physical design tradeoffs for ASICs, but, as technology tradeoffs and physical design tradeoffs go hand-in-hand, the tradeoffs between ASIC technologies are briefly discussed.< >
    Application-specific integrated circuit
    Integrated circuit design
    Citations (4)
    Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results under the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. Therefore, we propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. However, a naive combination inevitably faces unacceptable time consumption or unstable training problem. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and meanwhile improves the quantization accuracy. Equipped with this overall framework, dubbed as Once Quantization-Aware Training~(OQAT), our searched model family, OQATNets, achieves a new state-of-the-art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming 2-bit counterpart MobileNetV3 by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures are identified easily and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Codes and models are released at https://github.com/LaVieEnRoseSMZ/OQA
    Citations (0)