A Flexible In-Memory Computing Architecture for Heterogeneously Quantized CNNs

2021 
Inference using Convolutional Neural Networks (CNNs) is resource- and energy-intensive. Therefore, execution on highly constrained edge devices demands the careful co-optimization of algorithms and hardware. Addressing this challenge, in this paper we present a flexible In-Memory Computing (IMC) architecture and circuit, able to scale data representations to varying bitwidths at run-time, while ensuring a high level of parallelism and requiring low area. Moreover, we introduce a novel optimization heuristic, which tailors the quantization level of each CNN layer according to workload and robustness considerations. We investigate the performance, accuracy, and energy requirements of our co-design approach on CNNs of varying sizes, obtaining up to 76.2% increases in efficiency and up to 75.6% reductions in run-time with respect to fixed-bitwidth alternatives, with negligible accuracy degradation.
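The abstract only summarizes the per-layer quantization heuristic, so the sketch below shows, in rough terms, one way a workload- and robustness-driven bitwidth selection loop could be structured. This is a minimal illustrative sketch, not the paper's actual algorithm: the `Layer` class, the `sensitivity` robustness proxy, the `energy_cost` workload model, and the greedy budget loop are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of per-layer bitwidth selection for a
# heterogeneously quantized CNN. All models below are assumed
# placeholders, not the heuristic proposed in the paper.

from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate count, used as a workload proxy

def sensitivity(layer: Layer, bits: int) -> float:
    """Assumed robustness proxy: accuracy loss shrinks with bitwidth
    and grows with the layer's workload."""
    return layer.macs / (2 ** bits)

def energy_cost(layer: Layer, bits: int) -> float:
    """Assumed IMC cost model: energy scales with workload x bitwidth."""
    return layer.macs * bits

def assign_bitwidths(layers: List[Layer],
                     candidate_bits: Iterable[int] = (2, 4, 8),
                     budget: float = 0.0) -> Dict[str, int]:
    """Start every layer at the highest precision, then greedily lower
    the bitwidth of the layer whose reduction costs the least estimated
    accuracy, until the energy budget is met."""
    bits_sorted = sorted(candidate_bits)
    config = {l.name: bits_sorted[-1] for l in layers}

    def total_energy(cfg: Dict[str, int]) -> float:
        return sum(energy_cost(l, cfg[l.name]) for l in layers)

    while total_energy(config) > budget:
        best: Tuple[Layer, int, float] | None = None
        for l in layers:
            lower = [b for b in bits_sorted if b < config[l.name]]
            if not lower:
                continue  # layer already at minimum precision
            b = lower[-1]
            penalty = sensitivity(l, b) - sensitivity(l, config[l.name])
            if best is None or penalty < best[2]:
                best = (l, b, penalty)
        if best is None:
            break  # nothing left to lower; budget unreachable
        layer, b, _ = best
        config[layer.name] = b
    return config

if __name__ == "__main__":
    net = [Layer("conv1", 10_000), Layer("conv2", 40_000), Layer("fc", 5_000)]
    print(assign_bitwidths(net, budget=250_000))
    # e.g. {'conv1': 2, 'conv2': 4, 'fc': 2}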