Computational Assessment of the Anderson and Nesterov acceleration methods for large scale proximal gradient problems

2021 
Proximal gradient (PG) algorithms target optimization problems composed of the sum of two convex functions, i.e. $F=f+g$, such that $\nabla f$ is $L$-Lipschitz continuous and $g$ is possibly nonsmooth. Accelerated PG methods, which use past information to speed up PG's original rate of convergence (RoC), are of particular practical interest since they are guaranteed to achieve at least $\mathcal{O}(k^{-2})$. While several alternatives exist, Nesterov's acceleration is arguably the de facto method. In recent years, however, Anderson acceleration, a well-established technique that has recently been adapted to PG, has attracted considerable attention due to its simplicity and its practical speed-up over Nesterov's method for small- to medium-scale (number of variables) problems. In this paper we focus on a computational (Python-based) assessment of the Anderson and Nesterov acceleration methods for large-scale optimization problems. The computational evidence from our practical experiments, which particularly target Convolutional Sparse Representations, agrees with our theoretical analysis: the extra burden (both in memory and computation) associated with Anderson acceleration imposes a practical limit, giving Nesterov's method a clear edge for large-scale problems.
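To make the comparison concrete, below is a minimal Python sketch (not the paper's code) of both accelerated PG variants on an illustrative lasso instance, assuming $f(x)=\tfrac{1}{2}\|Ax-b\|_2^2$ and $g(x)=\lambda\|x\|_1$; the function names `fista`, `anderson_pg`, the history depth `m`, and the unguarded Anderson update are assumptions for exposition. The side-by-side form shows where the extra cost arises: Nesterov's scheme keeps only one additional vector, whereas Anderson stores $m$ past iterates and residuals ($\mathcal{O}(mn)$ memory) and solves a small least-squares problem every iteration.

```python
# Minimal sketch of Nesterov- vs. Anderson-accelerated proximal gradient
# for F = f + g with f(x) = 0.5*||Ax - b||^2 and g(x) = lam*||x||_1.
# Illustrative only; practical Anderson-PG implementations add safeguards.
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=200):
    """Nesterov-accelerated PG: only one extra vector per iteration."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = prox_l1(y - grad / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x

def anderson_pg(A, b, lam, n_iter=200, m=5):
    """Anderson-accelerated PG: stores m past iterates/residuals and
    solves a small least-squares problem at every iteration."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    X_hist, R_hist = [], []                   # O(m*n) extra memory
    for _ in range(n_iter):
        gx = prox_l1(x - A.T @ (A @ x - b) / L, lam / L)  # fixed-point map
        r = gx - x                            # residual
        X_hist.append(gx); R_hist.append(r)
        if len(X_hist) > m:
            X_hist.pop(0); R_hist.pop(0)
        Rmat = np.column_stack(R_hist)
        ones = np.ones(Rmat.shape[1])
        # Mixing weights: minimize ||R alpha|| subject to sum(alpha) = 1,
        # via a (regularized) m x m normal-equations solve.
        try:
            z = np.linalg.solve(Rmat.T @ Rmat + 1e-10 * np.eye(len(ones)), ones)
            alpha = z / z.sum()
        except np.linalg.LinAlgError:
            alpha = ones / len(ones)
        x = np.column_stack(X_hist) @ alpha   # Anderson extrapolation
    return x
```

For small $n$ the per-iteration least-squares solve is negligible and Anderson's extrapolation can pay off; at the scales targeted here (e.g. convolutional sparse representations), the $\mathcal{O}(mn)$ history and the extra linear algebra become the limiting factor highlighted in the abstract.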