Dithering for Learning: Computationally Efficient Policies for Dynamic Pricing in High Dimensions

2019 
We consider a dynamic pricing and learning problem where a seller prices multiple products and learns from sales data about unknown demand. We propose dithering policies---namely, policies under which prices are probabilistically selected in a neighborhood surrounding the myopic optimal price---that achieve good theoretical performance and are computationally efficient. In particular, by analyzing structural properties of price-dithering, we establish a regret bound of order √T logT. Moreover, we prove under an additional assumption that our policy can be modified to achieve a constant regret bound. With regard to computation, we show that our dithering policies are as efficient as the widely-used myopic policy, yet avoid the problem of incomplete learning that can be encountered under the myopic policy. Dithering, which can be viewed as a "soft" or probabilistic avoidance of incomplete learning, is also shown to be substantially more computationally efficient than existing "discriminative" policies which impose (typically non-convex) hard constraints on the price space and can often be prohibitively time consuming to solve at the optimization stage, particularly in high dimensions.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []