Towards Adversarial Robustness Via Compact Feature Representations

2021 
Deep Neural Networks (DNNs), while providing state-of-the-art performance in a wide variety of tasks, have been shown to be vulnerable to adversarial attacks. Recent studies have posited that this vulnerability arises because DNNs operate over a grossly overspecified input space with very sparse human supervision, which leads them to learn spurious features that humans would ignore. These spurious features provide an attack vector for the adversary because perturbing them would not alter the human's decision but may alter the model's prediction. In this paper we explore the hypothesis that reducing the size of the model's feature representation, while maintaining its generalizability, discards spurious features while retaining perceptually relevant ones. We find that, after the size of the feature representation has been reduced, models exhibit increased adversarial robustness while suffering only a minimal loss in accuracy. In addition to being more robust, models with compact feature representations have the benefit of being more resource efficient.
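A minimal sketch of the idea, assuming a simple fully connected classifier: the width of the penultimate "feature" layer (`feature_dim` below, a hypothetical parameter not taken from the paper) controls the size of the learned feature representation, and shrinking it is one way to obtain a compact representation of the kind the abstract describes. The paper's actual architectures and training procedure are not specified here.

```python
import torch
import torch.nn as nn


class CompactFeatureClassifier(nn.Module):
    """Classifier whose penultimate (feature) layer width is configurable.

    Reducing `feature_dim` yields a more compact feature representation;
    this is an illustrative architecture, not the one used in the paper.
    """

    def __init__(self, in_dim: int = 784, feature_dim: int = 32, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, feature_dim),  # compact feature layer
            nn.ReLU(),
        )
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))


# Compare a wide-feature model with a compact-feature one on dummy input.
wide = CompactFeatureClassifier(feature_dim=256)
compact = CompactFeatureClassifier(feature_dim=16)
x = torch.randn(8, 784)
print(wide(x).shape, compact(x).shape)  # both: torch.Size([8, 10])
```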