Stochastic activation pruning for robust adversarial defense

2018 
Neural networks are known to be vulnerable to adversarial examples: carefully chosen perturbations to real images, while imperceptible to humans, induce misclassification, threatening the reliability of deep learning in the wild. To guard against adversarial examples, we take inspiration from game theory and cast the problem as a minimax zero-sum game between the adversary and the model. In general, optimal policies in such games are stochastic. We propose stochastic activation pruning (SAP), an algorithm that prunes a random subset of activations and scales up the survivors to compensate, preferentially keeping activations with larger magnitudes. SAP can be applied to pretrained neural networks, including adversarially trained models, without fine-tuning, providing robustness against adversarial examples. Experiments demonstrate that in the adversarial setting, SAP confers robustness, increasing accuracy and preserving calibration.
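A minimal NumPy sketch of the pruning step described above, applied to a single layer's activation vector at inference time. The abstract specifies sampling proportional to activation magnitude and rescaling survivors to compensate; the particular inverse-keep-probability rescaling and the function name `sap` below are illustrative assumptions, not necessarily the authors' exact implementation.

```python
import numpy as np

def sap(h, num_samples, rng=None):
    """Stochastic activation pruning (sketch) of an activation vector h.

    Draws `num_samples` indices with replacement, with probability
    proportional to |h_i|; units never drawn are pruned to zero, and
    survivors are rescaled so the layer output is unbiased in
    expectation (assumed rescaling, see lead-in).
    """
    rng = rng or np.random.default_rng()
    h = np.asarray(h, dtype=float)
    mag = np.abs(h)
    total = mag.sum()
    if total == 0.0:  # all-zero activations: nothing to prune
        return h.copy()
    p = mag / total  # sampling distribution, favoring large magnitudes
    idx = rng.choice(h.size, size=num_samples, replace=True, p=p)
    keep = np.zeros(h.size, dtype=bool)
    keep[idx] = True
    # P(unit i survives at least one of the r draws) = 1 - (1 - p_i)^r
    keep_prob = 1.0 - (1.0 - p) ** num_samples
    out = np.zeros_like(h)
    out[keep] = h[keep] / keep_prob[keep]  # scale up the survivors
    return out

# Example: prune a post-ReLU activation vector, keeping roughly half the units.
activations = np.maximum(np.random.default_rng(0).normal(size=128), 0.0)
pruned = sap(activations, num_samples=64)
```

Because the rescaled output matches the original layer in expectation, this step can be dropped into a pretrained network at each layer without fine-tuning, consistent with the claim above.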