A Novel Misclassification Attack Against Black Box Neural Network Classifiers

2018 
It is generally believed that neural network classifiers are vulnerable to misclassification attacks: an adversary generates adversarial samples by adding small perturbations to original samples, and these adversarial samples mislead classifiers even though they are nearly indistinguishable from the originals to a human observer. However, existing misclassification attacks require either the internal details of the classifier or a substitute classifier to craft adversarial samples, so one might assume that a black box classifier is robust against such attacks. We demonstrate that a black box classifier is still vulnerable to our proposed misclassification attack. The classifier's details are concealed, and the only thing an adversary can do is query the classification results of samples. We propose a particle swarm optimization based misclassification attack with which an adversary can make black box classifiers yield erroneous results. Our experiments show that LeNet and GoogLeNet are vulnerable to the proposed attack: the misclassification rates on the MNIST and ImageNet ILSVRC 2012 datasets are 99.1% and 98.4%, respectively. Finally, we give some defense strategies against misclassification attacks.
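The abstract does not give implementation details, but the attack setting it describes (query-only access to classification labels, particle swarm optimization over small perturbations) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function `query_label`, the fitness definition, and all hyperparameters are assumptions introduced here.

```python
# Sketch of a PSO-based black-box misclassification attack.
# Assumptions: samples are flattened arrays in [0, 1]; the adversary can only
# call query_label(x) to obtain the classifier's predicted label.
import numpy as np

def pso_attack(x_orig, true_label, query_label, n_particles=30,
               n_iters=100, eps=0.1, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a small perturbation that changes the black-box label."""
    rng = np.random.default_rng(seed)
    dim = x_orig.size
    # Each particle is a candidate perturbation inside an L_inf ball of radius eps.
    pos = rng.uniform(-eps, eps, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    def fitness(delta):
        # Prefer perturbations that flip the label; among those, prefer smaller norms.
        adv = np.clip(x_orig + delta, 0.0, 1.0)
        flipped = query_label(adv) != true_label
        return (0.0 if flipped else 1e6) + np.linalg.norm(delta)

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    gbest_fit = pbest_fit.min()

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard PSO velocity update toward personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -eps, eps)
        fits = np.array([fitness(p) for p in pos])
        improved = fits < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
        if fits.min() < gbest_fit:
            gbest, gbest_fit = pos[fits.argmin()].copy(), fits.min()

    adv = np.clip(x_orig + gbest, 0.0, 1.0)
    return adv, query_label(adv) != true_label
```

In this setting the only coupling to the target model is the `query_label` callable, which matches the paper's threat model of an adversary restricted to querying samples' classification results.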