Enabling long-range exploration in minimization of multimodal functions

2021 
We consider the problem of minimizing multi-modal loss functions with a large number of local optima. Since the local gradient points in the direction of the steepest slope within an infinitesimal neighborhood, an optimizer guided by the local gradient is often trapped in local optima. To address this issue, we develop a nonlocal gradient that skips small local optima and captures the major structures of the loss landscape in black-box optimization. The nonlocal gradient is defined by a directional Gaussian smoothing (DGS) approach, so we refer to it as the DGS gradient. The key idea of DGS is to conduct 1D nonlocal exploration with a large smoothing radius along $d$ orthogonal directions in $\mathbb{R}^d$, each of which defines a nonlocal directional derivative as a 1D integral. Such long-range exploration enables the DGS gradient to skip small local optima. The $d$ directional derivatives are then assembled to form the nonlocal gradient. We use the Gauss-Hermite (GH) quadrature rule to approximate the $d$ 1D integrals, which yields an accurate estimator. The superior performance of our method is demonstrated on three sets of examples, including benchmark functions for global optimization and two real-world scientific problems.
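A minimal sketch of this construction is given below, assuming the standard Gaussian-smoothing identity for the 1D directional derivative and NumPy's Gauss-Hermite quadrature nodes; the function name `dgs_gradient`, the default of 7 GH nodes, and the use of the coordinate axes as the orthogonal directions are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def dgs_gradient(f, x, sigma=1.0, num_nodes=7, directions=None):
    """Hypothetical sketch of a DGS-style nonlocal gradient estimator.

    For each orthogonal direction xi_i, the Gaussian-smoothed 1D slice
    v -> f(x + v * xi_i), smoothed with N(0, sigma^2), has derivative at
    v = 0 approximated by Gauss-Hermite quadrature (nodes t_m, weights w_m
    for the weight exp(-t^2)):
        D_i ~= sqrt(2)/(sigma*sqrt(pi)) * sum_m w_m * t_m * f(x + sqrt(2)*sigma*t_m*xi_i)
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    if directions is None:
        directions = np.eye(d)  # rows are orthonormal directions (coordinate axes here)
    t, w = np.polynomial.hermite.hermgauss(num_nodes)  # GH nodes/weights for exp(-t^2)
    deriv = np.zeros(d)
    for i in range(d):
        xi = directions[i]
        # evaluate f along the i-th direction at the quadrature nodes
        vals = np.array([f(x + np.sqrt(2.0) * sigma * tm * xi) for tm in t])
        deriv[i] = np.sqrt(2.0) * np.sum(w * t * vals) / (sigma * np.sqrt(np.pi))
    # assemble the d nonlocal directional derivatives into the DGS gradient
    return directions.T @ deriv
```

As a usage note, plugging this estimator into a plain descent update `x -= lr * dgs_gradient(f, x, sigma)` with a large `sigma` (on the order of the search domain) illustrates the long-range exploration described above, while shrinking `sigma` recovers an estimate of the local gradient.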