Combating Word-level Adversarial Text with Robust Adversarial Training

2021 
NLP models perform well on many tasks, but they are easily fooled by adversarial examples: a small perturbation can change the output of a deep neural network, and such perturbations are hard for humans to perceive, especially in adversarial examples generated by word-level attacks. Character-level adversarial attacks can be defended against with grammar detection and word recognition, whereas existing word-level textual attacks are based on synonym replacement, so the resulting adversarial texts usually have correct grammar and semantics; defending against word-level attacks is therefore more challenging. In this paper, we propose a framework called Robust Adversarial Training (RAT) to defend against word-level adversarial attacks. RAT enhances the model by combining adversarial training and data perturbation during training. Our experiments on two datasets show that a model trained under our framework can effectively defend against word-level adversarial attacks. Compared with existing defense methods, the model trained under RAT achieves a higher defense success rate on 1000 adversarial examples. In addition, its accuracy on the standard test set is better than that of existing defense methods and is very close to, or even higher than, that of the standard model.
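To make the training idea in the abstract concrete, below is a minimal sketch of one way to combine adversarial training with data perturbation in a single training step. It assumes gradient-based perturbation of word embeddings (FGM-style) for the adversarial-training part and random synonym replacement for the data-perturbation part; the abstract does not specify RAT's actual implementation, so all names here (`SYNONYMS`, `TextClassifier`, `perturb_tokens`, `epsilon`) are illustrative assumptions rather than the authors' method.

```python
# Sketch: combine adversarial training (embedding-space perturbation) with
# data perturbation (synonym replacement). Illustrative only, not the
# paper's actual RAT implementation.
import random
import torch
import torch.nn as nn

# Hypothetical synonym table; in practice this might come from WordNet or a
# counter-fitted embedding space.
SYNONYMS = {3: [7, 9], 5: [11]}  # token id -> candidate replacement ids

def perturb_tokens(token_ids, swap_prob=0.2):
    """Data perturbation: randomly replace tokens with synonym candidates."""
    out = token_ids.clone()
    for i, tok in enumerate(token_ids.tolist()):
        if tok in SYNONYMS and random.random() < swap_prob:
            out[i] = random.choice(SYNONYMS[tok])
    return out

class TextClassifier(nn.Module):
    """Toy bag-of-embeddings classifier standing in for the real model."""
    def __init__(self, vocab_size=50, dim=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, token_ids, embed_noise=None):
        emb = self.embed(token_ids)              # (batch, seq, dim)
        if embed_noise is not None:
            emb = emb + embed_noise              # adversarial perturbation
        return self.fc(emb.mean(dim=1))          # mean-pool then classify

model = TextClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epsilon = 0.1  # assumed perturbation radius

# One illustrative training step on a fake batch.
tokens = torch.randint(0, 50, (4, 8))
labels = torch.randint(0, 2, (4,))

# 1) Data perturbation: train on synonym-swapped inputs.
tokens = torch.stack([perturb_tokens(t) for t in tokens])

# 2) Adversarial training: perturb embeddings along the loss gradient.
emb = model.embed(tokens).detach().requires_grad_(True)
loss = loss_fn(model.fc(emb.mean(dim=1)), labels)
grad = torch.autograd.grad(loss, emb)[0]
noise = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)

# 3) Optimize on the clean loss plus the adversarial loss.
optimizer.zero_grad()
total = loss_fn(model(tokens), labels) + \
        loss_fn(model(tokens, embed_noise=noise.detach()), labels)
total.backward()
optimizer.step()
```

The key design point this sketch illustrates is that the two defenses act at different levels: synonym replacement perturbs the discrete input tokens, while the gradient-based noise perturbs the continuous embedding space, so the model sees both kinds of variation during training.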