Optimal distributions of rewards for a two-armed slot machine

2023 
In this paper we consider the continuous-time two-armed bandit (TAB) problem in which the slot machine has two different arms, in the sense that the arms have different expected rewards and variances. We explore the optimal distribution of rewards for two-armed bandit problems and obtain the explicit distribution function as well as the search rules for the optimal strategy. As a by-product, we find two new counter-intuitive phenomena in the nonlinear probability framework (the optimal strategy framework). The first is that combining a losing arm with a winning arm can give the winning arm a greater coverage probability of attaining its expected reward, which we summarize as "good + bad = better". This discovery implies that the traditional advice of always pursuing the arm with the larger expected reward (i.e., the stay-on-a-winner rule) is not optimal in the probability framework. The second is that the combined sequence drawn from two independent, normally distributed arms is not normally distributed whenever the two arms differ, which can be summarized as "mutually independent normal + normal = non-normal". Furthermore, we provide the optimal sequential strategy for constructing the "combination" arm and numerically examine the underlying mechanism.
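
Below is a minimal Monte Carlo sketch of the setting described above: it compares the probability that the total reward reaches the winning arm's expected total under a stay-on-winner policy versus a simple alternating combination of the two arms. The arm parameters (mu_a, sigma_a, mu_b, sigma_b), the horizon, the coverage threshold, and the deterministic alternation are illustrative assumptions; the alternation is not the paper's optimal sequential strategy, which is history-dependent.

```python
# Monte Carlo sketch of a two-armed bandit with normally distributed rewards.
# All parameters below are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

mu_a, sigma_a = 1.0, 1.0    # "winning" arm: larger expected reward (assumed)
mu_b, sigma_b = 0.5, 2.0    # "losing" arm: smaller mean, larger variance (assumed)
horizon = 100               # number of pulls
trials = 200_000            # Monte Carlo repetitions
threshold = horizon * mu_a  # target: reach the winning arm's expected total reward

def stay_on_winner():
    """Pull only the winning arm for the whole horizon."""
    return rng.normal(mu_a, sigma_a, size=(trials, horizon)).sum(axis=1)

def naive_combination():
    """Alternate deterministically between the two arms
    (an illustrative mix, standing in for a history-dependent combination)."""
    pulls_a = rng.normal(mu_a, sigma_a, size=(trials, horizon // 2)).sum(axis=1)
    pulls_b = rng.normal(mu_b, sigma_b, size=(trials, horizon // 2)).sum(axis=1)
    return pulls_a + pulls_b

for name, totals in [("stay on winner", stay_on_winner()),
                     ("alternating combination", naive_combination())]:
    coverage = np.mean(totals >= threshold)
    print(f"{name:25s} P(total >= {threshold:.0f}) = {coverage:.4f}")
```

The estimated coverage probabilities depend entirely on the assumed parameters and the chosen combination rule; the sketch only illustrates how such probabilities can be compared numerically for the two kinds of policies the abstract contrasts.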