Stochastic activation pruning for robust adversarial defense

2018 
Neural networks are known to be vulnerable to adversarial examples: carefully chosen perturbations to real images, while imperceptible to humans, induce misclassification, threatening the reliability of deep learning in the wild. To guard against adversarial examples, we take inspiration from game theory and cast the problem as a minimax zero-sum game between the adversary and the model. In general, optimal policies in such games are stochastic. We propose stochastic activation pruning (SAP), an algorithm that prunes a random subset of activations and scales up the survivors to compensate, preferentially keeping activations with larger magnitudes. SAP can be applied to pretrained neural networks, including adversarially trained models, without fine-tuning, providing robustness against adversarial examples. Experiments demonstrate that in the adversarial setting, SAP confers robustness, increasing accuracy and preserving calibration.
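A minimal NumPy sketch of the pruning step described above, applied to a single layer's activation vector at inference time. The abstract specifies sampling proportional to activation magnitude and rescaling survivors to compensate; the particular inverse-keep-probability rescaling and the function name `sap` below are illustrative assumptions, not necessarily the authors' exact implementation.

```python
import numpy as np

def sap(h, num_samples, rng=None):
    """Stochastic activation pruning (sketch) of an activation vector h.

    Draws `num_samples` indices with replacement, with probability
    proportional to |h_i|; units never drawn are pruned to zero, and
    survivors are rescaled so the layer output is unbiased in
    expectation (assumed rescaling, see lead-in).
    """
    rng = rng or np.random.default_rng()
    h = np.asarray(h, dtype=float)
    mag = np.abs(h)
    total = mag.sum()
    if total == 0.0:  # all-zero activations: nothing to prune
        return h.copy()
    p = mag / total  # sampling distribution, favoring large magnitudes
    idx = rng.choice(h.size, size=num_samples, replace=True, p=p)
    keep = np.zeros(h.size, dtype=bool)
    keep[idx] = True
    # P(unit i survives at least one of the r draws) = 1 - (1 - p_i)^r
    keep_prob = 1.0 - (1.0 - p) ** num_samples
    out = np.zeros_like(h)
    out[keep] = h[keep] / keep_prob[keep]  # scale up the survivors
    return out

# Example: prune a post-ReLU activation vector, keeping roughly half the units.
activations = np.maximum(np.random.default_rng(0).normal(size=128), 0.0)
pruned = sap(activations, num_samples=64)
```

Because the rescaled output matches the original layer in expectation, this step can be dropped into a pretrained network at each layer without fine-tuning, consistent with the claim above.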