Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling.

Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin

2021

We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the former two, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling.
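As a rough illustration of the procedure under analysis, the sketch below computes exploration sampling assignment shares as described by Kasy and Sautmann (2021): each arm d receives a share proportional to p_d(1 - p_d), where p_d is the posterior probability that arm d is the best. The binary-outcome model with uniform Beta(1, 1) priors, the Monte Carlo estimate of p_d, the function names, and the fallback for the degenerate case are assumptions made for this example, not details taken from the abstract.

```python
import random


def optimal_arm_probabilities(successes, failures, n_draws=20000, seed=0):
    """Monte Carlo estimate of p_d, the posterior probability that arm d is best.

    Assumes Bernoulli outcomes with independent Beta(1, 1) priors, so the
    posterior of arm d is Beta(1 + successes[d], 1 + failures[d]).
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(n_draws):
        # One joint draw of all arm means from their posteriors.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]


def exploration_sampling_shares(p):
    """Assignment shares q_d proportional to p_d * (1 - p_d)."""
    weights = [p_d * (1.0 - p_d) for p_d in p]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case (some p_d equals 1): fall back to p itself.
        # This fallback is an assumption of the sketch.
        return list(p)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical data: successes and failures observed so far for three arms.
    p = optimal_arm_probabilities([8, 5, 2], [2, 5, 8])
    print(exploration_sampling_shares(p))
```

With only two arms the weights p(1 - p) and (1 - p)p coincide, so this rule splits the sample equally regardless of the posterior. Relative to assigning in proportion to p_d itself, as Thompson sampling does, it shifts assignment away from an arm that already looks very likely to be best.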

Keywords:

Counterexample
Calculus
Mathematical proof
Identification (information)
Statement
Asymptotic analysis
Computer science
Regret
Sampling (statistics)
