Submodular Importance Sampling for Neural Network Training
2018
Stochastic Gradient Descent (SGD) algorithms are the workhorse on which Deep Learning systems are built. The standard approach of uniform sampling in the SGD algorithm leads to high variance between the calculated
gradient and the true gradient, consequently resulting in longer training times.
Importance sampling methods are used to sample mini-batches in a way that reduces this variance. Provable
importance sampling techniques for variance reduction exist, but they generally do not fare well on Deep
Learning models.
Our work proposes sampling strategies that create diverse mini-batches, which consequently reduces the
variance of the SGD algorithm. We pose the creation of such mini-batches as the maximization of a
submodular objective function. The proposed submodular objective function samples mini-batches such that
more uncertain and diverse sets of samples are selected with high probability.
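As a hedged illustration of this kind of objective (the paper's exact function is not given in the abstract), a mini-batch score can combine a per-sample uncertainty term with a facility-location diversity term; both terms are monotone submodular, so their sum is too. All names here (`objective`, `uncertainty`, `similarity`) are illustrative, not the authors' API.

```python
def objective(batch, uncertainty, similarity, ground):
    """Illustrative submodular mini-batch score: uncertainty + diversity.

    batch:       list of selected sample indices
    uncertainty: dict index -> nonnegative uncertainty (e.g. current loss)
    similarity:  dict (i, j) -> nonnegative similarity between samples
    ground:      iterable of all candidate sample indices
    """
    if not batch:
        return 0.0
    # Modular uncertainty term: prefer samples the model is unsure about.
    unc = sum(uncertainty[i] for i in batch)
    # Facility-location term: every point should be close to some selected
    # point, which rewards batches that cover diverse regions of the data.
    fl = sum(max(similarity[(i, j)] for j in batch) for i in ground)
    return unc + fl
```

With equal uncertainties, a batch spread across the data scores higher than a redundant one, which is exactly the "diverse mini-batch" behavior described above.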
Submodular functions can be optimized easily using the GREEDY [1] algorithm, but even its newer variants
suffer from performance issues when the dataset is large. We propose a new, faster submodular optimization
method inspired by [2].
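To make the cost concrete, classic GREEDY and a common faster alternative can be sketched as follows. This is a generic sketch, not the paper's optimizer: the stochastic variant scans only a random subsample of candidates per step (the subsampling idea behind stochastic/lazier greedy methods), trading a small approximation loss for a large speedup on big datasets.

```python
import random

def greedy(ground, k, gain):
    """Classic GREEDY: k passes, each scanning the whole ground set.

    gain(S, e) is the marginal gain F(S + {e}) - F(S) of adding e to S.
    """
    selected = []
    remaining = set(ground)
    for _ in range(k):
        best = max(remaining, key=lambda e: gain(selected, e))
        selected.append(best)
        remaining.remove(best)
    return selected

def stochastic_greedy(ground, k, gain, sample_size):
    """Faster variant: each step scans only a random candidate pool
    instead of the full ground set, so cost per step is O(sample_size)."""
    selected = []
    remaining = set(ground)
    for _ in range(k):
        pool = random.sample(sorted(remaining),
                             min(sample_size, len(remaining)))
        best = max(pool, key=lambda e: gain(selected, e))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For monotone submodular F, GREEDY gives the well-known (1 - 1/e) approximation; the stochastic variant keeps a comparable guarantee in expectation while touching far fewer elements per iteration.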
We prove theoretically that our sampling scheme reduces the variance of the SGD algorithm. We also show
that Determinantal Point Process (DPP) sampling can be seen as a special case of our algorithm.
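The DPP connection rests on a standard fact, stated here only as background for the claim above: for a positive semidefinite kernel, DPP probabilities are principal minors of that kernel, and their logarithm is a submodular set function, so DPP-style diverse sampling fits naturally into a submodular-maximization framework.

```latex
% For a PSD kernel L and subset S of the ground set,
%   P(S) \propto \det(L_S),
% where L_S is the principal submatrix indexed by S.
% The set function
%   f(S) = \log \det(L_S)
% is submodular, so maximizing f selects diverse subsets,
% matching the submodular objective used for mini-batch sampling.
```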
We showcase the generalization of our method by testing it on several deep learning datasets, such as MNIST,
FMNIST, and CIFAR-10. We study the effect of learning rate, network architecture, and other factors on our
proposed method, as well as how different features affect its performance. We also study the case of transfer
learning with our algorithm used for dataset selection. In all experiments, we compare our algorithm against
loss-based sampling and random sampling.