Gradient Masking of Label Smoothing in Adversarial Robustness

2021 
Deep neural networks (DNNs) have achieved impressive results in several image classification tasks. However, these architectures are vulnerable to adversarial examples (AEs), i.e., inputs crafted with a hardly perceptible perturbation intended to cause the network to make errors. AEs must be considered to prevent accidents in areas such as unmanned driving that relies on visual object detection in Internet of Things (IoT) networks. Adding Gaussian noise with label smoothing or logit squeezing during the training of DNNs can increase robustness against AEs. However, from a model interpretability perspective, Gaussian noise with label smoothing does not increase the adversarial robustness of the model. To address this problem, we analyze the AEs themselves instead of merely measuring the accuracy of the model against them. Considering that a robust model exhibits a small curvature of the loss surface, we propose a metric to measure the strength of AEs and the robustness of the model. Furthermore, we introduce a method to verify the existence of obfuscated gradients in the model based on a black-box attack sanity check. The proposed method enables us to identify a gradient masking problem wherein the model does not provide useful gradients and thus exploits a false defense. We evaluate our technique against representative adversarially trained models on the CIFAR10, CIFAR100, SVHN, and Restricted ImageNet datasets. Our results show that the measured performance of some false defense models decreases by up to 32% compared with previous evaluation metrics. Moreover, our metric reveals that traditional metrics used to measure model robustness may produce misleading results.
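
The abstract ties robustness to the curvature of the loss surface around an input. The exact metric is not specified here, so the following is only a minimal illustrative sketch: it estimates the second-order directional curvature of the loss along an (assumed, hypothetical) adversarial perturbation `delta` with a finite difference, using a generic PyTorch classifier. A flatter loss surface (smaller value) is commonly read as a sign of a more robust model.

```python
import torch
import torch.nn.functional as F

def directional_curvature(model, x, y, delta, h=1e-2):
    """
    Illustrative finite-difference estimate of the loss-surface curvature
    along the adversarial direction `delta`:
        (L(x + h*d) - 2*L(x) + L(x - h*d)) / h^2
    where d is the unit vector of `delta`. Smaller values indicate a
    locally flatter (heuristically more robust) loss surface.
    """
    d = delta / (delta.norm() + 1e-12)  # normalize the perturbation direction

    def loss_at(point):
        with torch.no_grad():
            return F.cross_entropy(model(point), y).item()

    l_plus = loss_at(x + h * d)
    l_zero = loss_at(x)
    l_minus = loss_at(x - h * d)
    return (l_plus - 2.0 * l_zero + l_minus) / (h ** 2)
```

The function names and the finite-difference formulation are assumptions for illustration, not the paper's metric; the paper's actual measure of AE strength and model robustness may be defined differently.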
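
The black-box attack sanity check mentioned in the abstract follows a well-known heuristic: a gradient-free (black-box) attack should not outperform a gradient-based (white-box) attack on the same model; if it does, the gradients are likely masked. The sketch below is a hedged illustration of that comparison with assumed attack callables (`white_box_attack`, `black_box_attack`), not the paper's verification procedure.

```python
import torch

def gradient_masking_check(model, loader, white_box_attack, black_box_attack, device="cpu"):
    """
    Compare robust accuracy under a white-box attack (e.g., PGD) with robust
    accuracy under a black-box / gradient-free attack. If the black-box attack
    is *stronger* (lower accuracy), gradient masking is suspected.
    """
    def robust_accuracy(attack):
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack(model, x, y)              # attack returns perturbed inputs
            with torch.no_grad():
                pred = model(x_adv).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
        return correct / total

    acc_white = robust_accuracy(white_box_attack)
    acc_black = robust_accuracy(black_box_attack)
    suspicious = acc_black < acc_white               # black-box should not beat white-box
    return acc_white, acc_black, suspicious
```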