Adaptive and Robust Routing with Lyapunov-Based Deep RL in MEC Networks Enabled by Blockchains

2020 
Recent developments in the Internet of Things bring massive, time-sensitive, and bursty data flows. In addition, multiaccess edge computing frameworks need joint optimization of storage, computation, and communication. Adaptive network control has been explored using deep reinforcement learning (RL), but it is not sufficient for bursty network traffic flows, especially when the network traffic pattern may change over time. We formulate routing control in an environment with time-variant link delays as a Lyapunov optimization problem. We identify a tradeoff between optimization performance and modeling accuracy when propagation delays are included. We propose a novel deep RL (DRL)-based adaptive network routing method to tackle these issues. A Lyapunov optimization technique is used to reduce the upper bound of the Lyapunov drift, improving queuing stability in networked systems. By modeling the network traffic pattern with the Markovian arrival process, we show that network routing problems can be modeled as Markov decision processes and that value-iteration-based RL methods can be used to solve them. We design a blockchain-based protocol using the proof of elapsed time consensus mechanism to ensure a trustworthy exchange of network statistics for the routing framework. Experimental results show that the proposed method can learn a routing policy and adapt to a changing environment. The proposed method outperforms the baseline backpressure method in multiple settings and converges faster than existing methods. Moreover, the DRL module can effectively learn a better estimate of the long-term Lyapunov drift and penalty functions, providing superior results in terms of backlog size, end-to-end latency, age of information, and throughput. Furthermore, the blockchain-based network statistics exchange can protect the routing framework against malicious nodes. In addition, the proposed model performs well under various topologies and can therefore be used in general cases.
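For readers unfamiliar with the baseline named above, the following is a minimal sketch of textbook backpressure routing together with a quadratic Lyapunov function, assuming a single commodity, one sink whose backlog is always zero, and a fixed per-slot link capacity. It is not the paper's formulation: the function names (`lyapunov`, `backpressure_next_hop`, `step`), the toy topology, and the capacity value are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def lyapunov(queues: Dict[str, float]) -> float:
    """Quadratic Lyapunov function L(Q) = 1/2 * sum of squared backlogs."""
    return 0.5 * sum(q * q for q in queues.values())


def backpressure_next_hop(
    node: str, queues: Dict[str, float], neighbors: Dict[str, List[str]]
) -> Optional[str]:
    """Return the neighbor with the largest positive backlog differential.

    Returns None when no neighbor has a smaller backlog than `node`,
    in which case the node holds its packets for this slot.
    """
    best, best_diff = None, 0.0
    for nb in neighbors[node]:
        diff = queues[node] - queues[nb]
        if diff > best_diff:
            best, best_diff = nb, diff
    return best


def step(
    queues: Dict[str, float],
    neighbors: Dict[str, List[str]],
    arrivals: Dict[str, float],
    sink: str,
    capacity: float = 1.0,  # assumed packets per link per slot
) -> Dict[str, float]:
    """Advance one time slot: forward by backpressure, then add new arrivals."""
    # Decisions are made from the backlogs observed at the start of the slot.
    decisions = {
        n: backpressure_next_hop(n, queues, neighbors)
        for n in queues
        if n != sink
    }
    new_queues = dict(queues)
    for n, nb in decisions.items():
        if nb is None:
            continue
        moved = min(capacity, queues[n])
        new_queues[n] -= moved
        if nb != sink:
            new_queues[nb] += moved
    for n, a in arrivals.items():
        new_queues[n] += a
    new_queues[sink] = 0.0  # packets that reach the sink leave the network
    return new_queues


# Toy three-node example (hypothetical): "c" is the sink.
queues = {"a": 5.0, "b": 2.0, "c": 0.0}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
after = step(queues, neighbors, arrivals={}, sink="c")
drift = lyapunov(after) - lyapunov(queues)  # one-slot Lyapunov drift
```

In this toy slot both sources forward to the sink, so the drift is negative, which is the stability property a drift-reducing routing rule aims for. A learned policy, as described in the abstract, would replace the greedy `backpressure_next_hop` choice with one that accounts for long-term drift and penalty.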