Towards an Efficient and Robust Adversarial Attack Against Neural Text Classifier

2022 
Adversarial attacks are a serious threat to neural network-based natural language processing applications: they use small, well-crafted perturbations to mislead neural networks. While existing adversarial text attacks achieve strong attack performance, they do not guarantee efficiency or robustness. An adversarial text attack is more efficient if it reaches a higher attack success rate with less perturbation, and more robust if it maintains a high success rate when defense strategies are applied. To improve both properties, we propose SMAL: Saliency Map Attack with Levenshtein-similarity. The proposed attack consists of two parts. (1) A saliency map measures the perturbation priority of each word; it considers not only each word's influence on the classification result but also how the misled classification result can be maintained, which improves the robustness of the attack. (2) A Levenshtein-similarity network embeds words into an edit-distance space; when perturbing a sentence, words are replaced by substitutions with small edit distance, reducing the amount of modification and improving the efficiency of the attack. Because words are embedded in edit-distance space rather than semantic space, semantics-based defenses are less effective against this attack, which further improves its robustness. Experiments show that SMAL achieves a higher attack success rate with fewer perturbations, and that the proposed attack remains more effective when attacking a classifier defended by adversarial training.
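The abstract describes two components: word-level saliency to decide which words to perturb first, and edit-distance-guided substitution to keep perturbations small. The Python sketch below only illustrates that general idea; the classifier probability function, the mask token, and the candidate generator are hypothetical placeholders, and SMAL itself learns a Levenshtein-similarity embedding rather than computing raw edit distance as done here.

from typing import Callable, List, Sequence

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def word_saliency(tokens: Sequence[str],
                  target_prob: Callable[[List[str]], float]) -> List[float]:
    """Leave-one-out saliency: drop in the classifier's confidence for the
    original label when a word is masked out (one common way to rank words)."""
    base = target_prob(list(tokens))
    scores = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = "[MASK]"          # hypothetical mask token
        scores.append(base - target_prob(masked))
    return scores

def greedy_attack(tokens: Sequence[str],
                  target_prob: Callable[[List[str]], float],
                  candidates: Callable[[str], List[str]],
                  threshold: float = 0.5) -> List[str]:
    """Greedy sketch: perturb the most salient words first and, among
    substitutions that lower the target-class probability, prefer the
    candidate with the smallest edit distance to the original word."""
    perturbed = list(tokens)
    saliency = word_saliency(perturbed, target_prob)
    order = sorted(range(len(perturbed)), key=lambda i: saliency[i], reverse=True)
    for i in order:
        current = target_prob(perturbed)
        if current < threshold:       # prediction already flipped, stop early
            break
        original = perturbed[i]
        best, best_dist = None, float("inf")
        for cand in candidates(original):
            trial = list(perturbed)
            trial[i] = cand
            if target_prob(trial) < current:
                dist = levenshtein(original, cand)
                if dist < best_dist:
                    best, best_dist = cand, dist
        if best is not None:
            perturbed[i] = best
    return perturbed

In this sketch, target_prob would wrap the victim classifier and return the probability of the originally predicted class, and candidates would return substitution words for a given word; both are assumptions for illustration only.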