Deep Deterministic Policy Gradient (DDPG)-Based Resource Allocation Scheme for NOMA Vehicular Communications

2020 
This paper investigates the resource allocation problem in vehicular communications based on multi-agent Deep Deterministic Policy Gradient (DDPG), in which each Vehicle-to-Vehicle (V2V) communication link acts as an agent and adopts Non-Orthogonal Multiple Access (NOMA) technology to share the frequency spectrum pre-allocated to Vehicle-to-Infrastructure (V2I) communications. Unlike in conventional Device-to-Device (D2D) communications, the fast-varying channel conditions caused by high mobility in the vehicular environment make it difficult to collect instantaneous Channel State Information (CSI) at the base station. Meanwhile, a major challenge for vehicular communications is how to maximize the sum-rate of V2I communications while simultaneously guaranteeing the latency and reliability requirements for the transmission of safety-critical information in V2V communications. In response, we formulate the resource allocation problem as a decentralized Discrete-time and Finite-state Markov Decision Process (DFMDP), in which allocation decisions are made by multiple agents that do not have complete, global network information. Due to the complexity of the problem, we propose a DDPG algorithm, which is capable of handling continuous, high-dimensional action spaces, to find the optimal allocation strategy. Numerical results verify that each agent can effectively learn from the environment by means of the proposed DDPG algorithm to maximize the sum-rate of V2I communications while satisfying the stringent latency and reliability constraints of V2V communications.
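The abstract does not give the network architecture, state, action, or reward design, so the following is only a minimal sketch of two update rules that generic DDPG relies on: the critic's temporal-difference target and the soft (Polyak) update of the target networks. The function names, the flat-list parameter representation, and the default values of `gamma` and `tau` are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def td_target(reward: float, next_q: float,
              gamma: float = 0.99, done: bool = False) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    `next_q` stands for the target critic's value of the next state under
    the target actor's action; here it is passed in as a plain number.
    """
    if done:
        return reward
    return reward + gamma * next_q


def soft_update(target: List[float], online: List[float],
                tau: float = 0.01) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the two functions.
    print(td_target(reward=1.0, next_q=2.0, gamma=0.5))
    print(soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.5))
```

In a multi-agent setting such as the one described, each V2V agent would presumably hold its own actor and critic and apply these updates to its own parameters; how observations and rewards are shared among agents is not specified in the abstract.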