Chase or Wait: Dynamic UAV Deployment to Learn and Catch Time-Varying User Activities

2021 
Unmanned aerial vehicle (UAV) technology is a promising solution for rapidly providing wireless communication services to ground users. When the users' demands change dynamically over time, the key challenge is how to adapt the UAV deployment strategy to partial and even outdated observations of the users' activities, given the UAV's limited flying speed. In this paper, we study dynamic UAV deployment to learn and adapt to time-varying user activities, where the activity pattern of a user (if outside the UAV's service coverage) is hidden from the UAV and follows a time-slotted Markov chain that switches between active and idle states. We formulate the learning-and-adaptation based UAV deployment problem as a partially observable Markov decision process (POMDP) to maximize the total discounted hit rate of active users. We show that there is a fundamental delay-reward tradeoff, and prove that the UAV optimally follows a threshold-based policy: it waits at an idle user up to a time threshold before moving to another user. Furthermore, we extend the analysis to a more general scenario in which the UAV does not even know the parameters of each user's temporal activity distribution, and apply Q-learning to develop another threshold-based deployment policy for the multi-user scenario.
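
As a rough illustration of the model in the abstract, the Python sketch below simulates one UAV serving several users whose two-state (idle/active) Markov chains are hidden unless a user is under coverage, and applies a simple wait-or-chase rule with a fixed waiting threshold. All parameter values (the transition probabilities P01 and P11, the flight delay FLY_SLOTS, the threshold WAIT_THRESHOLD) and the helper functions are illustrative assumptions, not quantities or algorithms taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not values from the paper):
P01, P11 = 0.2, 0.7    # hidden chain: P(idle -> active), P(active -> active)
GAMMA = 0.9            # discount factor in the hit-rate objective
FLY_SLOTS = 3          # slots spent in transit between two users
WAIT_THRESHOLD = 4     # consecutive idle slots tolerated before moving

def belief_step(b):
    """One-slot belief propagation for an unobserved user:
    b' = b * P(active->active) + (1 - b) * P(idle->active)."""
    return b * P11 + (1.0 - b) * P01

def step_user(state):
    """Advance a user's true (hidden) two-state activity chain one slot."""
    p = P11 if state == 1 else P01
    return 1 if rng.random() < p else 0

def simulate(num_users=3, horizon=200):
    """Run the wait-or-chase rule and return the discounted hit rate
    (discounted count of slots in which the served user is active)."""
    states = [int(rng.integers(0, 2)) for _ in range(num_users)]
    beliefs = [0.5] * num_users           # prior belief each user is active
    at, idle_run, in_transit = 0, 0, 0    # served user, idle streak, travel timer
    reward = 0.0
    for t in range(horizon):
        states = [step_user(s) for s in states]
        if in_transit > 0:                # flying: no observation, no hit
            in_transit -= 1
            beliefs = [belief_step(b) for b in beliefs]
            continue
        # The covered user is observed exactly; the others stay hidden.
        beliefs = [float(states[i]) if i == at else belief_step(b)
                   for i, b in enumerate(beliefs)]
        if states[at] == 1:
            reward += GAMMA ** t          # hit: the served user is active
            idle_run = 0
        else:
            idle_run += 1
            if idle_run > WAIT_THRESHOLD:  # threshold exceeded: chase
                others = [i for i in range(num_users) if i != at]
                at = max(others, key=lambda i: beliefs[i])
                idle_run, in_transit = 0, FLY_SLOTS
    return reward

print(f"discounted hit rate: {simulate():.3f}")
```

The fixed WAIT_THRESHOLD here merely stands in for the optimal threshold derived in the paper; the delay-reward tradeoff shows up directly in the rule, since moving forfeits FLY_SLOTS discounted slots, so waiting pays off whenever the covered user is likely enough to turn active again soon.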