Adversarial sample detection framework based on autoencoder

2020 
Despite the great success of deep neural networks (DNNs) on many tasks, they are often fooled by adversarial examples created by adding small, purposeful distortions to natural examples. Previous research has mainly focused on hardening the DNN model itself, but the results are either limited or computationally expensive. This paper studies an ensemble feature-denoising method: Gaussian filtering, mean filtering, and median filtering are combined with an autoencoder network to prevent the generation of adaptive adversarial samples, and adversarial samples are detected by comparing the autoencoder's reconstruction error against a threshold. These simple strategies are not only low-cost but also complementary to other defensive measures. A new autoencoder is introduced as a rectifier, and the two components are combined into a joint adversarial-sample detection framework that achieves a high detection rate against recent attacks, including several with high success rates: FGSM, BIM, PGD, and CW. For black-box attacks with larger perturbations on the MNIST dataset, the undetected rate of the untargeted CW attack is 23%, while all other attacks are detected 100% of the time. For black-box attacks on CIFAR-10, the undetected rate of the untargeted CW attack is 25%, and the undetected rate of every other attack is below 5%. Under black-box attacks with small perturbations, the classification accuracy of the protected network exceeds 90% on MNIST and exceeds 80% on CIFAR-10 for all attacks except CW, where accuracy also reaches a relatively high level.
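The detection idea described above can be sketched in a few lines: denoise the input with each filter in the ensemble, run each variant through the autoencoder, and flag the input as adversarial if any reconstruction error exceeds a threshold. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the 3x3 filters are hand-rolled in NumPy, the autoencoder is passed in as a plain callable `ae` (a trained model in practice), and the aggregation rule (max over filter variants) and threshold are placeholders.

```python
import numpy as np

def _neighborhoods(img):
    """Gather 3x3 neighborhoods with edge padding -> shape (H, W, 9)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return np.stack([p[i:i + h, j:j + w]
                     for i in range(3) for j in range(3)], axis=-1)

def mean_filter(img):
    return _neighborhoods(img).mean(axis=-1)

def median_filter(img):
    return np.median(_neighborhoods(img), axis=-1)

# Fixed 3x3 Gaussian kernel (sigma ~ 0.85), flattened to match _neighborhoods
_GAUSS = np.array([1, 2, 1, 2, 4, 2, 1, 2, 1], dtype=float) / 16.0

def gaussian_filter(img):
    return _neighborhoods(img) @ _GAUSS

def is_adversarial(ae, img, threshold):
    """Flag img as adversarial when the autoencoder reconstruction error
    of any denoised variant exceeds the threshold (a hypothetical rule
    standing in for the paper's joint detection criterion)."""
    errors = [np.mean((ae(f(img)) - f(img)) ** 2)
              for f in (mean_filter, median_filter, gaussian_filter)]
    return max(errors) > threshold
```

A natural input reconstructs well (low error, not flagged), while an input the autoencoder cannot reproduce is flagged; the threshold would be calibrated on clean validation data.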