Gradient Preconditioned Mini-batch SGD for Ridge Regression

2020 
Abstract Data preconditioning, which reduces the condition number of a problem by applying a linear transformation to the data matrix, is commonly used to accelerate the convergence of first-order optimization methods for regularized loss minimization. An obvious limitation of the technique is its high computational cost on large-scale problems, especially those with a very large number of samples. In this paper, we propose a gradient preconditioning trick and combine it with mini-batch SGD. For ridge regression, the resulting gradient preconditioned mini-batch SGD algorithm accelerates convergence at a lower computational cost than data preconditioning. Concretely, we use recent random projection and linear sketching methods to compute a randomized low-rank approximation of the data matrix, from which we obtain an appropriate preconditioner through numerical linear algebra. Finally, we apply the obtained preconditioner to the gradient, rather than to the data, to reduce the computational cost. Experimental results on both synthetic and real data sets validate the feasibility and effectiveness of the trick and the algorithm.
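The idea in the abstract can be illustrated with a short NumPy sketch. It is not the paper's algorithm, whose exact construction is not given here. It assumes the ridge objective f(w) = (1/2n)‖Xw − y‖² + (λ/2)‖w‖², a dense Gaussian sketch as the random projection, and a preconditioner taken as the inverse of the sketched Hessian. The function names, the sketch size, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_objective(X, y, lam, w):
    """Assumed objective: (1/2n)||Xw - y||^2 + (lam/2)||w||^2."""
    r = X @ w - y
    return 0.5 * (r @ r) / X.shape[0] + 0.5 * lam * (w @ w)


def sketched_hessian_cholesky(X, lam, sketch_size, rng):
    """Cholesky factor of a sketched approximation to the ridge Hessian.

    Assumption: a dense Gaussian sketch S (sketch_size x n) stands in for
    the random projection; other sketching schemes could be swapped in.
    """
    n, d = X.shape
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    SX = S @ X                                  # small (sketch_size x d) matrix
    H_approx = SX.T @ SX / n + lam * np.eye(d)  # approximates X^T X / n + lam I
    return np.linalg.cholesky(H_approx)         # lower-triangular L, L L^T = H_approx


def preconditioned_minibatch_sgd(X, y, lam, sketch_size, batch_size=64,
                                 step=0.3, epochs=20, seed=0):
    """Mini-batch SGD where each stochastic gradient is preconditioned."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = sketched_hessian_cholesky(X, lam, sketch_size, rng)
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Stochastic gradient of the ridge objective on this batch.
            g = Xb.T @ (Xb @ w - yb) / len(idx) + lam * w
            # Apply the preconditioner to the gradient only: solve
            # H_approx p = g through the two Cholesky factors. Generic solves
            # are used here; a triangular solver would be cheaper.
            p = np.linalg.solve(L.T, np.linalg.solve(L, g))
            w -= step * p
    return w
```

The data matrix itself is never transformed: the only extra work per step is a pair of d × d solves, while the one-time cost is forming the sketch. With `batch_size` set to the number of samples, the loop reduces to deterministic preconditioned gradient descent, which is a convenient way to check the preconditioner against the closed-form ridge solution.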