Probability Guided Maxout

2021 
In this paper, we propose an original CNN training strategy that brings together ideas from both dropout-like regularization methods and solutions that learn discriminative features. We propose a dropping criterion that, unlike dropout and its variants, is deterministic rather than random. It is grounded in the empirical evidence that feature descriptors with a larger L2-norm and highly active nodes are strongly correlated with confident class predictions. Our criterion therefore drops a percentage of the most active nodes of each descriptor, in proportion to the estimated class probability. We simultaneously train a per-sample scaling factor to balance the expected output across training and inference. This further allows us to keep the descriptor's L2-norm high, which we show enforces confident predictions. The combination of these two strategies results in our “Probability Guided Maxout” solution, which acts as a training regularizer. We demonstrate the above behaviors by reporting extensive image classification results on the CIFAR10, CIFAR100, and Caltech256 datasets. Code is available at https://github.com/clferrari/probability-guided-maxout.
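
As a rough illustration of the dropping criterion described in the abstract, the sketch below (PyTorch-style) zeroes the most active dimensions of each feature descriptor, with the dropped fraction growing with the predicted class probability. The function name, the max_drop_ratio parameter, and the fixed inverted-dropout-style rescaling (standing in for the paper's trained per-sample scaling factor) are illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

    import torch

    def probability_guided_drop(features, class_prob, max_drop_ratio=0.5):
        # features: (batch, dim) descriptors; class_prob: (batch,) estimated
        # probability of the predicted class for each sample.
        batch_size, dim = features.shape

        # Deterministic criterion: the number of units dropped per sample
        # grows with the predicted class probability (assumed linear here).
        num_drop = (max_drop_ratio * class_prob * dim).long()

        # Rank activations per sample, most active first.
        order = features.abs().argsort(dim=1, descending=True)

        mask = torch.ones_like(features)
        for i in range(batch_size):
            mask[i, order[i, :num_drop[i]]] = 0.0

        # Simple per-sample rescaling to keep the expected output magnitude
        # comparable between training (with dropping) and inference (without);
        # the paper instead trains this scaling factor.
        kept = mask.sum(dim=1, keepdim=True).clamp(min=1.0)
        scale = dim / kept
        return features * mask * scale
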