Efficient pipelined execution of CNNs based on in-memory computing and graph homomorphism verification

Martino Dazzi, Abu Sebastian, Thomas Parnell, Pier Andrea Francese, Luca Benini, Evangelos Eleftheriou

2021

In-memory computing is an emerging computing paradigm that enables deep-learning inference at significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights of each layer to one or more in-memory computing (IMC) cores. During inference, these cores perform the associated matrix-vector multiplications in place with O(1) time complexity, obviating the need to move the synaptic weights to additional processing units. Moreover, this architecture enables the execution of these networks in a highly pipelined fashion. However, a key challenge is designing an efficient communication fabric for the IMC cores. In this work, we present one such communication fabric based on a graph topology that is well suited to the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between the graph representations of these networks and the graph corresponding to the proposed communication fabric. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirements per communication channel. Finally, we present one hardware implementation and show a concrete example of mapping ResNet-32 onto an array of IMC cores interconnected via the proposed communication fabric.
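
As a rough illustration of what verifying a graph homomorphism involves, the sketch below searches by brute force for an edge-preserving map from a small network graph to a small fabric graph. It is not the authors' procedure or topology: the function name find_homomorphism, the three-layer network with a skip connection, and both toy fabrics are hypothetical and chosen only to make the idea concrete. The check covers plain edge preservation only; constraints such as mapping distinct layers to distinct cores are not enforced.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def find_homomorphism(
    src_nodes: Iterable[Hashable],
    src_edges: Iterable[Edge],
    dst_nodes: Iterable[Hashable],
    dst_edges: Iterable[Edge],
) -> Optional[Dict[Hashable, Hashable]]:
    """Return a map f such that (f(u), f(v)) is a destination edge for
    every source edge (u, v), or None if no such map exists.
    Plain backtracking search, exponential in the worst case."""
    src_list: List[Hashable] = list(src_nodes)
    dst_list: List[Hashable] = list(dst_nodes)
    src_edge_list: List[Edge] = list(src_edges)
    dst_edge_set = set(dst_edges)
    mapping: Dict[Hashable, Hashable] = {}

    def consistent() -> bool:
        # Check only the source edges whose endpoints are both assigned.
        return all(
            (mapping[u], mapping[v]) in dst_edge_set
            for u, v in src_edge_list
            if u in mapping and v in mapping
        )

    def extend(i: int) -> bool:
        if i == len(src_list):
            return True
        node = src_list[i]
        for candidate in dst_list:
            mapping[node] = candidate
            if consistent() and extend(i + 1):
                return True
            del mapping[node]  # undo and try the next candidate
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical toy network: three layers with one skip connection a -> c.
NET_NODES = ["a", "b", "c"]
NET_EDGES = [("a", "b"), ("b", "c"), ("a", "c")]

# Hypothetical toy fabrics over four cores (assumptions, not the paper's design).
CORES = [0, 1, 2, 3]
CHAIN_FABRIC = [(0, 1), (1, 2), (2, 3)]           # nearest-neighbour links only
SKIP_FABRIC = CHAIN_FABRIC + [(0, 2), (1, 3)]     # adds links that skip one core

if __name__ == "__main__":
    print(find_homomorphism(NET_NODES, NET_EDGES, CORES, SKIP_FABRIC))
    print(find_homomorphism(NET_NODES, NET_EDGES, CORES, CHAIN_FABRIC))
```

In this toy setting the skip connection cannot be carried by the chain-only fabric, so no homomorphism exists there, while the fabric with one-core skip links admits one. Exhaustive search of this kind is only practical for very small graphs.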
