Distributed Deep Reinforcement Learning with Wideband Sensing for Dynamic Spectrum Access

2020 
Dynamic Spectrum Access (DSA) improves spectrum utilization by allowing secondary users (SUs) to opportunistically access temporary idle periods in primary user (PU) channels. Previous studies on utility-maximizing spectrum access strategies mostly require complete network state information and therefore may not be practical. Model-free reinforcement learning (RL) methods, such as Q-learning, on the other hand, are promising adaptive solutions that do not require complete network information. In this paper, we address this dilemma and propose deep Q-learning originated spectrum access (DQLS) based decentralized and centralized channel selection methods for network utility maximization, namely DEcentralized Spectrum Allocation (DESA) and Centralized Spectrum Allocation (CSA), respectively. Actions generated by a centralized deep Q-network (DQN) are utilized in CSA, whereas DESA adopts a non-cooperative approach to spectrum decisions. We use extensive simulations to investigate the spectrum utilization of the proposed methods for varying primary and secondary network sizes. Our findings demonstrate that the proposed schemes outperform model-based RL and traditional approaches, including slotted Aloha and the Whittle index policy, while achieving 87% of optimal channel access.
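The core idea behind DQN-based channel selection is a network that maps recent wideband sensing observations to per-channel action values, picks a channel epsilon-greedily, and learns from idle/collision feedback. The following is a minimal illustrative sketch of that loop in PyTorch; it is not the paper's DQLS architecture, and the network size, the sensing-history window (HISTORY), the toy PU activity model (pu_activity), and all constants are assumptions for illustration only.

```python
import random
import numpy as np
import torch
import torch.nn as nn

NUM_CHANNELS = 4   # hypothetical number of PU channels
HISTORY = 8        # hypothetical length of the sensing-history window

class QNet(nn.Module):
    """Maps a flattened window of wideband sensing results to per-channel Q-values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_CHANNELS * HISTORY, 64), nn.ReLU(),
            nn.Linear(64, NUM_CHANNELS),
        )

    def forward(self, x):
        return self.net(x)

def pu_activity():
    # Toy PU model: each channel is busy (1) independently with probability 0.5.
    return (np.random.rand(NUM_CHANNELS) < 0.5).astype(np.float32)

qnet = QNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma, eps, replay = 0.95, 0.1, []

state = np.zeros((HISTORY, NUM_CHANNELS), dtype=np.float32)
for t in range(2000):
    # Epsilon-greedy channel selection from the DQN's Q-values.
    if random.random() < eps:
        a = random.randrange(NUM_CHANNELS)
    else:
        with torch.no_grad():
            a = int(qnet(torch.from_numpy(state.flatten())).argmax())

    busy = pu_activity()
    r = 1.0 if busy[a] == 0 else -1.0  # +1 for an idle channel, -1 for a PU collision
    next_state = np.vstack([state[1:], busy[None]])
    replay.append((state.flatten(), a, r, next_state.flatten()))
    state = next_state

    # One-step temporal-difference update on a random minibatch from replay.
    if len(replay) >= 64:
        batch = random.sample(replay, 32)
        ss, aa, rr, ns = zip(*batch)
        ss = torch.from_numpy(np.stack(ss))
        ns = torch.from_numpy(np.stack(ns))
        aa = torch.as_tensor(aa)
        rr = torch.as_tensor(rr, dtype=torch.float32)
        q = qnet(ss)[torch.arange(32), aa]
        with torch.no_grad():
            target = rr + gamma * qnet(ns).max(1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In a decentralized setting such as DESA, each SU would maintain its own copy of such a network and act only on its local sensing history, whereas a centralized scheme like CSA would run a single DQN over the aggregated network state and distribute the resulting actions.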