Submodular Importance Sampling for Neural Network Training
2018
Stochastic Gradient Descent (SGD) algorithms are the workhorse on which Deep Learning systems are built. The standard approach of uniform sampling in the SGD algorithm leads to high variance between the calculated
gradient and the true gradient, consequently resulting in longer training times.
Importance sampling methods are used to sample mini-batches in a way that reduces this variance. Provable
importance sampling techniques for variance reduction exist, but they generally do not fare well on Deep
Learning models.
Our work proposes sampling strategies that create diverse mini-batches, which consequently reduces the
variance of the SGD algorithm. We pose the creation of such mini-batches as the maximization of a
submodular objective function. The proposed submodular objective function samples mini-batches such that
more uncertain and diverse sets of samples are selected with high probability.
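As a hedged illustration of this kind of objective (the paper's exact function is not given in the abstract), a mini-batch score can combine a per-sample uncertainty term with a facility-location diversity term; both terms are monotone submodular, so their sum is too. All names here (`objective`, `uncertainty`, `similarity`) are illustrative, not the authors' API.

```python
def objective(batch, uncertainty, similarity, ground):
    """Illustrative submodular mini-batch score: uncertainty + diversity.

    batch:       list of selected sample indices
    uncertainty: dict index -> nonnegative uncertainty (e.g. current loss)
    similarity:  dict (i, j) -> nonnegative similarity between samples
    ground:      iterable of all candidate sample indices
    """
    if not batch:
        return 0.0
    # Modular uncertainty term: prefer samples the model is unsure about.
    unc = sum(uncertainty[i] for i in batch)
    # Facility-location term: every point should be close to some selected
    # point, which rewards batches that cover diverse regions of the data.
    fl = sum(max(similarity[(i, j)] for j in batch) for i in ground)
    return unc + fl
```

With equal uncertainties, a batch spread across the data scores higher than a redundant one, which is exactly the "diverse mini-batch" behavior described above.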
Submodular functions can be optimized easily using the GREEDY [1] algorithm, but even its newer variants
suffer from performance issues when the dataset is large. We propose a new, faster submodular optimization
method inspired by [2].
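To make the cost concrete, classic GREEDY and a common faster alternative can be sketched as follows. This is a generic sketch, not the paper's optimizer: the stochastic variant scans only a random subsample of candidates per step (the subsampling idea behind stochastic/lazier greedy methods), trading a small approximation loss for a large speedup on big datasets.

```python
import random

def greedy(ground, k, gain):
    """Classic GREEDY: k passes, each scanning the whole ground set.

    gain(S, e) is the marginal gain F(S + {e}) - F(S) of adding e to S.
    """
    selected = []
    remaining = set(ground)
    for _ in range(k):
        best = max(remaining, key=lambda e: gain(selected, e))
        selected.append(best)
        remaining.remove(best)
    return selected

def stochastic_greedy(ground, k, gain, sample_size):
    """Faster variant: each step scans only a random candidate pool
    instead of the full ground set, so cost per step is O(sample_size)."""
    selected = []
    remaining = set(ground)
    for _ in range(k):
        pool = random.sample(sorted(remaining),
                             min(sample_size, len(remaining)))
        best = max(pool, key=lambda e: gain(selected, e))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For monotone submodular F, GREEDY gives the well-known (1 - 1/e) approximation; the stochastic variant keeps a comparable guarantee in expectation while touching far fewer elements per iteration.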
We prove theoretically that our sampling scheme reduces the variance of the SGD algorithm. We also show
that Determinantal Point Process (DPP) sampling can be seen as a special case of our algorithm.
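The DPP connection rests on a standard fact, stated here only as background for the claim above: for a positive semidefinite kernel, DPP probabilities are principal minors of that kernel, and their logarithm is a submodular set function, so DPP-style diverse sampling fits naturally into a submodular-maximization framework.

```latex
% For a PSD kernel L and subset S of the ground set,
%   P(S) \propto \det(L_S),
% where L_S is the principal submatrix indexed by S.
% The set function
%   f(S) = \log \det(L_S)
% is submodular, so maximizing f selects diverse subsets,
% matching the submodular objective used for mini-batch sampling.
```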
We showcase the generalization of our method by testing it on several deep learning datasets, such as MNIST,
FMNIST, and CIFAR-10. We study the effect of learning rate, network architecture, and other factors on our
proposed method, as well as how different features affect its performance. We also study the case of transfer
learning with our algorithm used for dataset selection. In all experiments, we compare our algorithm against
loss-based sampling and random sampling.