Memory-free Stochastic Weight Averaging by One-way Variational Pruning

2021 
Recent works on convolutional neural networks (CNNs) have sought better local optima with ensemble-based approaches. Fast Geometric Ensembling (FGE) showed that the weight points captured near the end of training circle around local optima. This observation led to Stochastic Weight Averaging (SWA), which averages multiple weight snapshots to reach a better local optimum. However, both methods output fully parameterized models that still contain needless parameters after training. To solve this problem, we propose a novel training procedure: Stochastic Weight Averaging by One-way Variational Pruning (SWA-OVP). SWA-OVP reduces the number of model parameters by variationally updating a pruning mask over the weights. It generates the mask for pruned weights variationally at every iteration, whereas recent pruning approaches produce the mask only at the end of training. In addition, SWA-OVP prunes the model within a single one-way training pass, while other recent approaches prune the weights through iterative retraining or require additional computation. Our experiments show that SWA-OVP, using only 0.5x $\sim$ 0.7x of the parameter size, achieves even higher accuracy than SWA and FGE on several networks, such as Pre-ResNet110, Pre-ResNet164, and WideResNet28x10, on the CIFAR10 and CIFAR100 datasets. SWA-OVP also outperforms state-of-the-art pruning approaches.
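
The sketch below illustrates the two ingredients the abstract describes: an SWA-style running average of weight snapshots and a pruning mask that is re-sampled every iteration and only ever becomes sparser (one-way). It is a minimal toy example under stated assumptions: a quadratic objective, NumPy SGD, and a simple magnitude-based rule for lowering keep-probabilities; the paper's actual variational mask update is not specified in the abstract, so that part is purely illustrative.

```python
# Toy sketch: SWA weight averaging + per-iteration, one-way pruning mask.
# Assumptions (not from the paper): quadratic objective, NumPy SGD,
# and an illustrative magnitude-based rule that monotonically lowers
# each weight's keep-probability in a single training pass.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
target = rng.normal(size=dim)        # optimum of the toy objective
w = rng.normal(size=dim)             # model weights
keep_prob = np.ones(dim)             # per-weight keep-probability (mask parameters)
swa_w, swa_n = np.zeros(dim), 0      # running SWA average and snapshot count

lr, iters, swa_start = 0.05, 500, 250
for t in range(iters):
    # Sample a binary mask from the current keep-probabilities each iteration.
    mask = (rng.uniform(size=dim) < keep_prob).astype(float)

    # Gradient of the masked objective ||mask * w - target||^2 w.r.t. w.
    grad = 2.0 * mask * (mask * w - target)
    w -= lr * grad

    # One-way update: keep-probabilities only decrease, so pruning happens
    # inside a single training pass (no separate retraining rounds).
    small = np.abs(w) < np.quantile(np.abs(w), 0.3)
    keep_prob[small] *= 0.99

    # SWA: average weight snapshots collected in the tail of training.
    if t >= swa_start:
        swa_n += 1
        swa_w += (w - swa_w) / swa_n

final_mask = (keep_prob > 0.5).astype(float)
print("kept parameters:", int(final_mask.sum()), "/", dim)
print("loss of masked SWA weights:",
      float(np.sum((final_mask * swa_w - target) ** 2)))
```

The design choice to re-sample the mask at every step (rather than pruning once after training) is what lets the averaged weights adapt to the sparsity pattern during the single one-way pass.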