Memory-free Stochastic Weight Averaging by One-way Variational Pruning

2021 
Recent works on convolutional neural networks (CNNs) have sought better local optima with ensemble-based approaches. Fast Geometric Ensembling (FGE) showed that the weight points captured near the end of training circle around local optima. This observation led to Stochastic Weight Averaging (SWA), which averages multiple weight snapshots to reach a better local optimum. However, both methods output fully parameterized models that still contain needless parameters after training. To solve this problem, we propose a novel training procedure: Stochastic Weight Averaging by One-way Variational Pruning (SWA-OVP). SWA-OVP reduces the number of model parameters by variationally updating a pruning mask over the weights. It generates the mask for pruned weights variationally at every iteration, whereas recent pruning approaches produce the mask only at the end of training. In addition, SWA-OVP prunes the model within a single one-way training pass, while other recent approaches prune the weights through iterative retraining or require additional computation. Our experiments show that SWA-OVP, using only 0.5x $\sim$ 0.7x of the parameter size, achieves even higher accuracy than SWA and FGE on several networks, such as Pre-ResNet110, Pre-ResNet164, and WideResNet28x10, on the CIFAR10 and CIFAR100 datasets. SWA-OVP also outperforms state-of-the-art pruning approaches.
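
The sketch below illustrates the two ingredients the abstract describes: an SWA-style running average of weight snapshots and a pruning mask that is re-sampled every iteration and only ever becomes sparser (one-way). It is a minimal toy example under stated assumptions: a quadratic objective, NumPy SGD, and a simple magnitude-based rule for lowering keep-probabilities; the paper's actual variational mask update is not specified in the abstract, so that part is purely illustrative.

```python
# Toy sketch: SWA weight averaging + per-iteration, one-way pruning mask.
# Assumptions (not from the paper): quadratic objective, NumPy SGD,
# and an illustrative magnitude-based rule that monotonically lowers
# each weight's keep-probability in a single training pass.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
target = rng.normal(size=dim)        # optimum of the toy objective
w = rng.normal(size=dim)             # model weights
keep_prob = np.ones(dim)             # per-weight keep-probability (mask parameters)
swa_w, swa_n = np.zeros(dim), 0      # running SWA average and snapshot count

lr, iters, swa_start = 0.05, 500, 250
for t in range(iters):
    # Sample a binary mask from the current keep-probabilities each iteration.
    mask = (rng.uniform(size=dim) < keep_prob).astype(float)

    # Gradient of the masked objective ||mask * w - target||^2 w.r.t. w.
    grad = 2.0 * mask * (mask * w - target)
    w -= lr * grad

    # One-way update: keep-probabilities only decrease, so pruning happens
    # inside a single training pass (no separate retraining rounds).
    small = np.abs(w) < np.quantile(np.abs(w), 0.3)
    keep_prob[small] *= 0.99

    # SWA: average weight snapshots collected in the tail of training.
    if t >= swa_start:
        swa_n += 1
        swa_w += (w - swa_w) / swa_n

final_mask = (keep_prob > 0.5).astype(float)
print("kept parameters:", int(final_mask.sum()), "/", dim)
print("loss of masked SWA weights:",
      float(np.sum((final_mask * swa_w - target) ** 2)))
```

The design choice to re-sample the mask at every step (rather than pruning once after training) is what lets the averaged weights adapt to the sparsity pattern during the single one-way pass.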