Reinforcement Learning Based VNF Scheduling with End-to-End Delay Guarantee

2019 
Network slicing has been recognized as a promising technology to achieve service customization for supporting various applications in fifth-generation (5G) networks. As one of its key enablers, network function virtualization (NFV) holds great potential to reduce service provisioning cost and improve resource utilization. With NFV, a service can be implemented by chaining the required virtual network functions (VNFs). In this paper, we study the scheduling of the VNFs to minimize the makespan (i.e., overall completion time) of all services, while satisfying their diverse end-to-end (E2E) delay requirements. The problem is formulated as a mixed integer linear program (MILP), which is NP-hard. To solve the MILP both efficiently and accurately despite its NP-hardness, we model the problem as a Markov decision process (MDP) with variable action sets and leverage a reinforcement learning (RL) algorithm to find its optimal scheduling policy. A Q-learning based algorithm is developed to address the challenges posed by the MDP's variable action sets and varying action execution times. A dedicated reward function is designed to realize delay-guaranteed VNF scheduling. Simulation results show that the proposed approach outperforms benchmark heuristic algorithms and achieves near-optimal performance in terms of the makespan.
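To make the abstract's approach concrete, the following is a minimal sketch in Python, not the paper's implementation, of tabular Q-learning over an MDP with variable action sets on a toy VNF-scheduling instance. Everything instance-specific is an illustrative assumption: the services, processing times, deadlines, number of servers, the state encoding, and the deadline-violation penalty of 50; the paper's actual MILP, state/action design, and reward function may differ.

```python
import random
from collections import defaultdict

# Toy instance (assumed): each service is (E2E deadline, [VNF processing times]).
SERVICES = [(10.0, [3.0, 2.0]), (8.0, [2.0, 4.0]), (12.0, [1.0, 3.0, 2.0])]
N_SERVERS = 2

def initial_state():
    # State: next unscheduled VNF index per service, server free times,
    # and each service's chain-completion time (hashable tuples).
    return (tuple([0] * len(SERVICES)),
            tuple([0.0] * N_SERVERS),
            tuple([0.0] * len(SERVICES)))

def actions(state):
    # Variable action set: action (s, k) schedules the next VNF of
    # service s on server k; only services with VNFs left are eligible.
    nxt = state[0]
    return [(s, k) for s in range(len(SERVICES))
            if nxt[s] < len(SERVICES[s][1]) for k in range(N_SERVERS)]

def step(state, action):
    # The VNF starts once server k is free AND the service's previous
    # VNF has finished (chaining); its duration is the varying action
    # execution time of the MDP.
    nxt, free, fin = (list(t) for t in state)
    s, k = action
    deadline, chain = SERVICES[s]
    end = max(free[k], fin[s]) + chain[nxt[s]]
    old_makespan = max(free)
    nxt[s], free[k], fin[s] = nxt[s] + 1, end, end
    reward = -(max(free) - old_makespan)          # charge makespan growth
    if nxt[s] == len(chain) and end > deadline:   # E2E deadline missed
        reward -= 50.0                            # illustrative penalty
    done = all(nxt[i] == len(SERVICES[i][1]) for i in range(len(SERVICES)))
    return (tuple(nxt), tuple(free), tuple(fin)), reward, done

Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.99, 0.2
for _ in range(5000):                             # training episodes
    state, done = initial_state(), False
    while not done:
        acts = actions(state)
        a = (random.choice(acts) if random.random() < eps
             else max(acts, key=lambda x: Q[(state, x)]))
        nstate, r, done = step(state, a)
        # Bootstrap over the NEXT state's (variable) action set.
        nq = max((Q[(nstate, x)] for x in actions(nstate)), default=0.0)
        Q[(state, a)] += alpha * (r + gamma * nq - Q[(state, a)])
        state = nstate

# Greedy rollout with the learned Q-table; report the achieved makespan.
state, done = initial_state(), False
while not done:
    a = max(actions(state), key=lambda x: Q[(state, x)])
    state, _, done = step(state, a)
print("makespan:", max(state[1]))
```

In this sketch, recomputing the eligible action list per state mirrors the MDP's variable action sets, the negative makespan increment rewards shorter overall completion time, and the large terminal penalty for a missed deadline plays the role of the abstract's delay-guaranteeing reward design.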