Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization

2022 
In recent years, machine learning has become the dominant approach to a variety of information retrieval tasks. The performance of machine learning algorithms depends heavily on their hyperparameters, so it is critical to identify the optimal hyperparameter configuration when applying them. Most existing hyperparameter optimization methods assume a static relationship between hyperparameter configuration and algorithmic performance, and are thus unsuitable for many information retrieval applications with non-stationary environments, such as e-commerce recommendation and online advertising. To address this limitation, we study online hyperparameter optimization, where the hyperparameter configuration is optimized on the fly. We formulate online hyperparameter optimization as a non-stationary continuum-armed bandit problem in which each arm corresponds to a hyperparameter configuration and the algorithmic performance is viewed as reward. For this problem, we develop principled methods with strong theoretical guarantees in terms of dynamic regret. The key idea is to adaptively discretize the continuous arm set and estimate the mean reward of each arm via weighted averaging. As a case application, we show how our methods can be applied to optimize the hyperparameter of a vector-based candidate generation algorithm, and empirically demonstrate the effectiveness and efficiency of our methods on a public advertising dataset and in online A/B testing. Furthermore, to the best of our knowledge, our methods are the first to achieve sub-linear dynamic regret bounds for continuum-armed bandits, which may be of independent interest.
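To illustrate the key idea described above, here is a minimal sketch of the general recipe: discretize the continuous arm set (here, a hyperparameter on [0, 1] split into uniform bins) and track each arm's reward with a discounted weighted average so that stale observations fade as the environment drifts. The arm selection rule, the exponential discount factor `gamma`, and the exploration constant `c` are illustrative assumptions, not the paper's actual algorithm (which discretizes adaptively).

```python
import math

def run_discounted_bandit(reward_fn, horizon, n_bins=10, gamma=0.95, c=0.3):
    """Sketch only: uniform (not adaptive) discretization of [0, 1] into
    n_bins arms, with discounted reward sums/counts so recent rewards
    dominate the estimate of each arm's mean. Names and constants are
    assumptions for illustration."""
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]  # arm = bin center
    disc_sum = [0.0] * n_bins   # discounted cumulative reward per arm
    disc_cnt = [0.0] * n_bins   # discounted pull count per arm
    choices = []
    for t in range(horizon):
        total = sum(disc_cnt)
        scores = []
        for i in range(n_bins):
            if disc_cnt[i] < 1e-9:
                scores.append(float("inf"))  # force one initial pull per arm
            else:
                mean = disc_sum[i] / disc_cnt[i]
                bonus = c * math.sqrt(math.log(total + 1.0) / disc_cnt[i])
                scores.append(mean + bonus)  # UCB-style optimism
        arm = max(range(n_bins), key=lambda i: scores[i])
        r = reward_fn(centers[arm], t)
        # Discount all statistics, then record the new observation:
        # older rewards are down-weighted geometrically, so the mean
        # estimate tracks a drifting reward function.
        for i in range(n_bins):
            disc_sum[i] *= gamma
            disc_cnt[i] *= gamma
        disc_sum[arm] += r
        disc_cnt[arm] += 1.0
        choices.append(centers[arm])
    return choices
```

Because the pull counts also decay, an arm left idle sees its exploration bonus grow again, so the policy keeps re-checking arms whose reward may have changed, which is what yields dynamic (rather than static) regret control.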