Characterizing the I/O Pipeline in the Deployment of CNNs on Commercial Accelerators

2020 
Commercial AI accelerators are gaining popularity because of their high energy efficiency for deep neural network (DNN) inference. How to benchmark them remains a hot research topic. Existing characterization studies mainly focus on hardware execution latency, without considering the pre- and post-processing or the copy-in and copy-out overhead, which matter during end-to-end inference. Motivated by this, we model the end-to-end inference phase of commercial DNN accelerators as a five-stage I/O pipeline consisting of preprocessing, copy-in, hardware execution, copy-out, and postprocessing stages. We further investigate the factors involved in each stage and implement the whole I/O pipeline on three different hardware platforms, covering emerging DNN accelerators as well as traditional CPUs and GPUs. With six DNNs from real-world computer vision applications, we take a deep dive into the DNN inference pipeline and quantify the influence of each stage on throughput. Our experimental results demonstrate the effect of this I/O pipeline and highlight the necessity of end-to-end evaluation. Moreover, the I/O pipeline we implemented serves as a flexible benchmarking tool that can help users conduct an in-depth end-to-end evaluation of the DNN inference phase.
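To illustrate the five-stage decomposition described in the abstract, the following is a minimal sketch of a per-stage timing harness. The stage implementations (preprocess, copy_in, execute, copy_out, postprocess) are hypothetical stand-ins, not the paper's actual benchmark code; a real harness would replace the copy and execute stages with calls into the accelerator vendor's SDK.

```python
import time
import numpy as np


def time_stage(fn, *args):
    """Run one pipeline stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


# --- Stand-in stage implementations (assumptions for illustration only) ---
def preprocess(image):
    # Host-side resize/normalization; here modeled as a cast and scale.
    return image.astype(np.float32) / 255.0


def copy_in(tensor):
    # Host -> device transfer, modeled as a buffer copy.
    return np.copy(tensor)


def execute(tensor):
    # Accelerator inference, modeled as a dummy reduction producing logits.
    return tensor.reshape(tensor.shape[0], -1).mean(axis=1, keepdims=True)


def copy_out(tensor):
    # Device -> host transfer.
    return np.copy(tensor)


def postprocess(logits):
    # Host-side decoding, e.g. argmax or NMS.
    return logits.argmax(axis=1)


def run_pipeline(batch):
    """Measure each of the five I/O pipeline stages for one batch."""
    timings = {}
    x, timings["preprocess"] = time_stage(preprocess, batch)
    x, timings["copy_in"] = time_stage(copy_in, x)
    x, timings["execute"] = time_stage(execute, x)
    x, timings["copy_out"] = time_stage(copy_out, x)
    _, timings["postprocess"] = time_stage(postprocess, x)
    return timings


if __name__ == "__main__":
    batch = np.random.randint(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)
    for stage, seconds in run_pipeline(batch).items():
        print(f"{stage:12s} {seconds * 1e3:8.3f} ms")
```

Summing the per-stage times (rather than reporting hardware execution alone) is what the end-to-end throughput numbers in the paper are meant to capture.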