On interruptible pure exploration in multi-armed bandits

Alexander Shleyfman, Antonín Komenda, Carmel Domshlak

2015

Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs that allows recent advances in non-interruptible strategies to be adapted to the interruptible setting, while guaranteeing an exponential-rate performance improvement over time. Our experimental evaluation demonstrates that the corresponding instances of DB compete favorably both with the currently popular strategies UCB1 and ε-greedy and with conservative uniform sampling.
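To make the interruptible setting concrete, here is a minimal sketch of the three baseline strategies the abstract names (uniform sampling, UCB1, and ε-greedy). It is not an implementation of Discriminative Bucketing, which the abstract does not specify. Several details are assumptions made for illustration: Bernoulli arm rewards, a recommendation that is simply the empirically best arm after every pull (so the process can be interrupted at any time), simple regret of that recommendation as the performance measure, ε = 0.5 as a default, and the hypothetical helper names `pull`, `run`, and `simple_regret`.

```python
import math
import random


def pull(means, arm, rng):
    # Assumed reward model: arm `arm` pays 1 with probability means[arm], else 0.
    return 1.0 if rng.random() < means[arm] else 0.0


def run(strategy, means, budget, rng, epsilon=0.5):
    """Sample arms for `budget` rounds with a baseline strategy.

    After every pull, record the arm that would be recommended if the
    process were interrupted at that moment (the empirically best arm).
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    def mean(i):
        # Empirical mean; unpulled arms are never preferred.
        return sums[i] / counts[i] if counts[i] else float("-inf")

    recommendations = []
    for t in range(budget):  # t = number of pulls made so far
        if t < k:
            arm = t  # initialization: pull every arm once
        elif strategy == "uniform":
            arm = t % k  # round-robin sampling
        elif strategy == "ucb1":
            # UCB1 index: empirical mean plus sqrt(2 ln n / n_i)
            arm = max(range(k), key=lambda i: mean(i)
                      + math.sqrt(2 * math.log(t) / counts[i]))
        elif strategy == "egreedy":
            # With probability epsilon explore uniformly, otherwise exploit
            if rng.random() < epsilon:
                arm = rng.randrange(k)
            else:
                arm = max(range(k), key=mean)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        sums[arm] += pull(means, arm, rng)
        counts[arm] += 1
        recommendations.append(max(range(k), key=mean))
    return recommendations


def simple_regret(means, arm):
    # Gap between the best arm's true mean and the recommended arm's mean.
    return max(means) - means[arm]


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]
    for s in ("uniform", "ucb1", "egreedy"):
        recs = run(s, means, 2000, rng)
        print(s, simple_regret(means, recs[-1]))
```

Because a recommendation is recorded after every pull, the list returned by `run` shows how the quality of the answer evolves if sampling is cut off at any point. This any-time behavior is what separates the interruptible setting from fixed-budget pure exploration.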

Keywords:

Computer science
Multi-armed bandit
Mathematical optimization
Artificial intelligence
Machine learning
Discriminative model
Decision problem
Performance improvement
Search algorithm
Monte Carlo method
Sampling (statistics)
Sequential decision
