Learning Algorithms for Minimizing Queue Length Regret

2018 
We consider a system consisting of a single transmitter and $N$ channels. Packets randomly arrive to the transmitter's queue, and at each time slot a controller can schedule one of the $N$ channels for transmission. The channel's rates are time-varying with unknown statistics and must be learned through observation. Our objective is to minimize the number of packets in the system's queue over $T$ time slots. We define the regret of the system to be the expected difference between the total queue length of a controller that must learn the channels' average rates and a controller that knows the rates, a priori. One approach to solving this problem would be to apply algorithms from the literature that were developed to solve the closely-related stochastic multi-armed bandit problem. However, we show that these methods have $\Omega(\log(T))$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that are able to obtain order optimal, $O(1)$ , regret.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    6
    References
    11
    Citations
    NaN
    KQI
    []