Inference replication at edges via combinatorial multi-armed bandit

2022 
Inference tasks easily overload computation at the network edge, since they are typically implemented with deep neural networks (DNNs) and consume substantial resources. The traditional approach of offloading inference tasks to a remote cloud is unsuitable because of the round-trip time. Offloading to nearby idle edges is therefore promising: it trades task replication overhead for faster edge inference. Unfortunately, because both edge networks and edge inference change stochastically, it is hard to determine the most suitable replication targets, especially when the DNNs consist of multiple kernels, the replication decision involves multiple candidate edges as destinations, and the edges are heterogeneous. In this paper, we optimize inference replication at edges under such stochastic changes. We formulate the problem and design an online algorithm based on the combinatorial multi-armed bandit that minimizes inference response time; it selects multiple replication destinations simultaneously, using both the feedback revealed after deployment and an offline profile. We rigorously prove that the algorithm achieves sublinear regret, where regret measures the gap between our online decisions and the offline optimum. Extensive trace-driven experiments with Huawei Atlas and NVIDIA Jetson confirm the improvement gained from inference replication over alternative approaches.
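The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a combinatorial bandit for replica selection, not the paper's method. It assumes a generic CUCB-style rule: each round, choose the k edges with the lowest optimistic (lower-confidence-bound) latency estimate, then update the estimates from the latency observed on every chosen edge. The names `ReplicaSelector` and `simulate`, the exploration constant 1.5, the Gaussian noise model, and the normalization of latencies to [0, 1] are all illustrative assumptions.

```python
import math
import random


class ReplicaSelector:
    """CUCB-style sketch: pick k edges per round by optimistic latency estimate."""

    def __init__(self, n_edges, k):
        self.n_edges = n_edges
        self.k = k
        self.counts = [0] * n_edges    # times each edge was chosen
        self.means = [0.0] * n_edges   # running mean latency per edge
        self.t = 0                     # rounds elapsed

    def _lcb(self, i):
        # Untried edges get -inf so they are explored first.
        if self.counts[i] == 0:
            return float("-inf")
        bonus = math.sqrt(1.5 * math.log(self.t) / self.counts[i])
        return self.means[i] - bonus

    def select(self):
        """Return the k edges with the smallest lower confidence bound."""
        self.t += 1
        return sorted(range(self.n_edges), key=self._lcb)[: self.k]

    def update(self, chosen, latencies):
        """Semi-bandit feedback: one observed latency per chosen edge."""
        for i, x in zip(chosen, latencies):
            self.counts[i] += 1
            self.means[i] += (x - self.means[i]) / self.counts[i]


def simulate(mean_latencies, k, rounds, seed=0):
    """Run the selector against synthetic latencies (assumed noise model)."""
    rng = random.Random(seed)
    sel = ReplicaSelector(len(mean_latencies), k)
    for _ in range(rounds):
        chosen = sel.select()
        # Latencies are assumed normalized to [0, 1]; noise is clipped Gaussian.
        obs = [min(1.0, max(0.0, rng.gauss(mean_latencies[i], 0.05)))
               for i in chosen]
        sel.update(chosen, obs)
    return sel


if __name__ == "__main__":
    sel = simulate([0.9, 0.2, 0.8, 0.3, 0.7], k=2, rounds=2000)
    print("times each edge was chosen:", sel.counts)
```

In this toy setting the selection counts concentrate on the edges with the lowest mean latency, which is the behavior a sublinear-regret guarantee describes. Ranking edges by individual mean latency is a simplification: the response time of a replicated task is governed by its fastest replica, and the sketch ignores replication overhead, multi-kernel DNNs, and the offline profile mentioned in the abstract.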