CoIn: Accelerated CNN Co-Inference through data partitioning on heterogeneous devices

2020 
In Convolutional Neural Networks (CNNs), low inference time per batch is crucial for real-time applications. To reduce inference time, we present CoIn, a method that benefits from multiple devices executing simultaneously. CoIn achieves low inference time by partitioning the images of a batch across devices with diverse micro-architectures. The partitioning strategy is based on offline profiling of the target devices. We have validated our partitioning technique on CPUs, GPUs, and FPGAs, including memory-constrained devices, for which a re-partitioning technique is applied. Average speedups of 1.39x and 1.5x are observed with CPU-GPU and CPU-GPU-FPGA co-execution, respectively. Compared with the state-of-the-art approach, CoIn achieves an average speedup of 1.62x across all networks.
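The core idea, splitting a batch across heterogeneous devices in proportion to their profiled throughput and then re-partitioning when a device's memory is exceeded, can be illustrated with a minimal Python sketch. All names, profiling numbers, and the overflow heuristic below are assumptions for illustration, not the authors' implementation.

```python
def partition_batch(batch_size, throughputs):
    """Split a batch across devices in proportion to profiled throughput.

    throughputs: dict mapping device name -> profiled images/sec
                 (assumed to come from an offline profiling pass).
    Returns a dict mapping device name -> number of images assigned.
    """
    total = sum(throughputs.values())
    # Initial proportional split, rounded down.
    shares = {d: int(batch_size * t / total) for d, t in throughputs.items()}
    # Hand leftover images to the fastest devices first.
    remainder = batch_size - sum(shares.values())
    for d in sorted(throughputs, key=throughputs.get, reverse=True):
        if remainder == 0:
            break
        shares[d] += 1
        remainder -= 1
    return shares


def repartition_for_memory(shares, capacities):
    """Cap each device's share at the maximum number of images its memory
    can hold, pushing the overflow onto devices with spare capacity.
    A simple stand-in for the paper's re-partitioning step."""
    overflow = 0
    for d, cap in capacities.items():
        if shares[d] > cap:
            overflow += shares[d] - cap
            shares[d] = cap
    for d, cap in capacities.items():
        if overflow == 0:
            break
        take = min(cap - shares[d], overflow)
        shares[d] += take
        overflow -= take
    return shares


if __name__ == "__main__":
    # Hypothetical profiled throughputs (images/sec) for three devices.
    profiled = {"cpu": 40.0, "gpu": 220.0, "fpga": 90.0}
    shares = partition_batch(64, profiled)
    print(shares)  # {'cpu': 7, 'gpu': 41, 'fpga': 16}
    # Re-partition when the FPGA can only hold 10 images in memory.
    shares = repartition_for_memory(shares, {"cpu": 64, "gpu": 64, "fpga": 10})
    print(shares)  # {'cpu': 13, 'gpu': 41, 'fpga': 10}
```

With such a split, all devices finish their partitions at roughly the same time, so the batch latency approaches that of the slowest balanced partition rather than that of a single device.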