Embedded Deep Inference in Practice: Case for Model Partitioning

2019 
With the increased focus on in situ analytics, artificial intelligence (AI) algorithms are being deployed on embedded devices at the network edge. The growing popularity of Deep Learning (DL) inference, largely due to the minimization of feature engineering and the availability of pre-trained models and fine-tunable datasets, especially in image and video analytics, has made these models a de facto standard. However, the embedded systems employing these models are often resource constrained and fail to handle scenarios where the arrival rate and input data volume increase over a given time period. This has a direct effect on the storage and network usage of such devices, rendering the traditional strategies of input buffering and network offloading ineffective. This paper investigates the use of dynamic layer-wise partitioning and partial execution of the DL inference phase to enable inelastic embedded systems to support varying sensing rates and large data volumes. The proposed partial execution scheme and partitioning algorithm perform better than standard frame-wise inference methods when evaluated on workloads of a few popular CNNs used in standard object detection models.
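To make the idea of layer-wise partitioning and partial execution concrete, the following is a minimal sketch only, not the authors' implementation or partitioning algorithm: a pre-trained CNN is split into sequential layer partitions, and when the sensing rate is high only the first k partitions are run per frame, with the intermediate activation buffered so the remaining layers can be completed later. The choice of AlexNet as the backbone, the partition boundaries, and the helper names (partial_infer, drain_pending) are illustrative assumptions.

    # Sketch of layer-wise partitioning and partial execution (illustrative only).
    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a CNN and split it into coarse layer partitions.
    backbone = models.alexnet(weights=None)  # random weights keep the sketch offline
    partitions = [
        nn.Sequential(backbone.features[:6]),   # early convolutional layers
        nn.Sequential(backbone.features[6:]),   # remaining convolutional layers
        nn.Sequential(backbone.avgpool, nn.Flatten(), backbone.classifier),
    ]

    pending = []  # buffered intermediate activations awaiting completion

    def partial_infer(frame: torch.Tensor, k: int):
        """Run only the first k partitions now; defer the rest."""
        x = frame
        with torch.no_grad():
            for part in partitions[:k]:
                x = part(x)
        if k < len(partitions):
            pending.append((x, k))   # resume from partition k later
            return None              # result not yet available
        return x

    def drain_pending():
        """Finish deferred inferences when the device has spare cycles."""
        results = []
        with torch.no_grad():
            while pending:
                x, k = pending.pop(0)
                for part in partitions[k:]:
                    x = part(x)
                results.append(x)
        return results

    # Example: under a high arrival rate, execute only the first partition per frame,
    # then complete the buffered activations during an idle period.
    out = partial_infer(torch.randn(1, 3, 224, 224), k=1)
    finished = drain_pending()

In this sketch, buffering compact intermediate activations instead of raw frames stands in for the paper's motivation that plain input buffering and network offloading become ineffective as data volume grows; how k is chosen dynamically is exactly what the paper's partitioning algorithm addresses.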