Dithering for Learning: Computationally Efficient Policies for Dynamic Pricing in High Dimensions
2019
We consider a dynamic pricing and learning problem where a seller prices multiple products and learns from sales data about unknown demand. We propose dithering policies---namely, policies under which prices are probabilistically selected in a neighborhood surrounding the myopic optimal price---that achieve good theoretical performance and are computationally efficient. In particular, by analyzing structural properties of price-dithering, we establish a regret bound of order √T logT. Moreover, we prove under an additional assumption that our policy can be modified to achieve a constant regret bound. With regard to computation, we show that our dithering policies are as efficient as the widely-used myopic policy, yet avoid the problem of incomplete learning that can be encountered under the myopic policy. Dithering, which can be viewed as a "soft" or probabilistic avoidance of incomplete learning, is also shown to be substantially more computationally efficient than existing "discriminative" policies which impose (typically non-convex) hard constraints on the price space and can often be prohibitively time consuming to solve at the optimization stage, particularly in high dimensions.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
0
References
0
Citations
NaN
KQI