Contextual Bandits with Linear Payoff Functions

Wei Chu,Lihong Li,Lev Reyzin,Robert Schapire

Contextual Bandits with Linear Payoff Functions

2011

Wei Chu
Lihong Li
Lev Reyzin
Robert Schapire

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf]

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations