Learning Algorithms for Minimizing Queue Length Regret

Thomas Stahlbuhk,Brooke Shrader,Eytan Modiano

Learning Algorithms for Minimizing Queue Length Regret

2018

We consider a system consisting of a single transmitter and $N$ channels. Packets randomly arrive to the transmitter's queue, and at each time slot a controller can schedule one of the $N$ channels for transmission. The channel's rates are time-varying with unknown statistics and must be learned through observation. Our objective is to minimize the number of packets in the system's queue over $T$ time slots. We define the regret of the system to be the expected difference between the total queue length of a controller that must learn the channels' average rates and a controller that knows the rates, a priori. One approach to solving this problem would be to apply algorithms from the literature that were developed to solve the closely-related stochastic multi-armed bandit problem. However, we show that these methods have $\Omega(\log(T))$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that are able to obtain order optimal, $O(1)$ , regret.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations