Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling under Posterior Weighted Policy Regret

Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin

2021

We consider the "policy choice" problem, otherwise known as best arm identification in the bandit literature, proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling, the algorithm developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues and that both the statement and the proof of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the first two results, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive its asymptotic optimality.

Keywords:

Identification (information)
Statement
Sampling (statistics)
Regret
Mathematics
Counterexample
Calculus
Asymptotic analysis
Mathematical proof
