An Effective and Efficient Method for Word-Level Textual Adversarial Attack

2021 
Adversarial examples are used to reveal the vulnerability of deep neural networks (DNNs) and to improve their robustness. Word-level attacks are a well-studied class of textual adversarial attack methods. However, the success rates of existing word-level attacks vary considerably across application scenarios. Moreover, attacks in the black-box setting are inefficient because they must query the target DNN model a large number of times. In this paper, we present SynonymPSO, a word-level attack method for generating adversarial texts. Specifically, we use several methods to find and filter synonyms in order to construct a comprehensive candidate pool. In addition, we design a modification record strategy to improve the efficiency of the particle swarm optimization algorithm. Compared with prior work, SynonymPSO has the following features: (1) effective: it outperforms state-of-the-art attacks in terms of attack success rate in most cases; (2) efficient: it generates adversarial examples with fewer queries and in less time. We evaluate SynonymPSO on five datasets covering different text classification tasks, including sentiment analysis, natural language inference, and spam detection. The experimental results demonstrate its effectiveness and efficiency. For instance, when attacking BiLSTM on the Enron dataset, the attack success rate of our method is 20% higher than that of the baseline, while the number of queries is reduced by 94%.
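To make the black-box word-level setting concrete, the following is a minimal illustrative sketch and not the SynonymPSO algorithm itself. It swaps words for candidates from a synonym pool using a simple greedy search (standing in for particle swarm optimization) and counts how many times the target model is queried. The synonym table, the toy scoring function, the `CountingModel` wrapper, and the caching of already-evaluated texts are all assumptions introduced for illustration; the abstract does not describe how the modification record strategy works.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy synonym pool; a real candidate pool would be built
# from several lexical resources and then filtered.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "love": ["like", "enjoy"],
}


class CountingModel:
    """Wraps a black-box scoring function, counting distinct queries.

    Caching repeated inputs is an assumption made here to show one way
    of saving queries; it is not the paper's modification record strategy.
    """

    def __init__(self, score_fn: Callable[[List[str]], float]) -> None:
        self.score_fn = score_fn
        self.queries = 0
        self.cache: Dict[Tuple[str, ...], float] = {}

    def score(self, words: List[str]) -> float:
        key = tuple(words)
        if key not in self.cache:
            self.queries += 1  # only unseen texts cost a query
            self.cache[key] = self.score_fn(words)
        return self.cache[key]


def toy_positive_score(words: List[str]) -> float:
    """Stand-in for the target DNN: a score for the 'positive' class."""
    strong = {"great": 0.4, "love": 0.4}
    return min(1.0, 0.1 + sum(strong.get(w, 0.0) for w in words))


def greedy_attack(
    words: List[str], model: CountingModel, threshold: float = 0.3
) -> Optional[List[str]]:
    """Greedy synonym substitution (a simple stand-in for PSO).

    Returns an adversarial word list once the positive score drops
    below `threshold`, or None if no such text is found.
    """
    current = list(words)
    best = model.score(current)
    for i, word in enumerate(words):
        for cand in SYNONYMS.get(word, []):
            trial = current[:i] + [cand] + current[i + 1:]
            s = model.score(trial)
            if s < best:  # keep the substitution that lowers the score
                current, best = trial, s
        if best < threshold:
            return current
    return None


model = CountingModel(toy_positive_score)
adv = greedy_attack("i love this great movie".split(), model)
```

Here `model.queries` plays the role of the query count reported in the abstract: the fewer distinct texts the search sends to the target model, the more efficient the attack.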