Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal

2020 
In this paper, we investigate the fragility of deep image captioning models against adversarial attacks. Different from existing works that target common words and concepts, we focus on adversarial attacks for controllable image captioning, i.e., removing target words from captions by imposing adversarial noise on images while maintaining the captioning accuracy for the remaining visual content. We name this new task Masked Image Captioning (MIC), which is expected to be training- and labeling-free for end-to-end captioning models. We further propose a novel adversarial learning approach for this task, termed Show, Mask, and Tell (SMT), which crafts adversarial examples to mask the target concepts by minimizing an objective loss while training the noise generator. Concretely, three novel designs are introduced in this loss: word removal regularization, captioning accuracy regularization, and noise filtering regularization. For quantitative validation, we propose a benchmark dataset for MIC based on the MS COCO dataset, together with a new evaluation metric called Attack Quality. Experimental results show that the proposed approach achieves successful attacks by removing 93.8% and 91.9% of target words while maintaining 97.3% and 97.4% accuracy on two cutting-edge captioning models, respectively.
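The abstract names three loss terms but gives no formulas, so the following is only a minimal sketch of how such a composite attack objective might be assembled. The names `caption_model`, `target_word_id`, `ref_tokens`, and the `lambda_*` weights are hypothetical placeholders, not the paper's actual definitions.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: the exact SMT loss terms are not given in the
# abstract. caption_model is assumed to return per-step word logits of
# shape (T, vocab_size) under teacher forcing with the reference tokens.

def smt_attack_step(caption_model, image, target_word_id, ref_tokens,
                    delta, lambda_remove=1.0, lambda_acc=1.0, lambda_noise=0.01):
    """One optimization step over the adversarial noise `delta` (requires_grad=True)."""
    adv_image = (image + delta).clamp(0.0, 1.0)       # keep pixel values valid
    logits = caption_model(adv_image, ref_tokens)      # (T, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)

    # 1) Word removal regularization: suppress the target word's probability
    #    at every decoding step.
    removal_loss = log_probs[:, target_word_id].exp().sum()

    # 2) Captioning accuracy regularization: keep the remaining reference
    #    words likely, so the rest of the caption stays accurate.
    keep_mask = (ref_tokens != target_word_id).float()
    nll = F.nll_loss(log_probs, ref_tokens, reduction="none")
    accuracy_loss = (nll * keep_mask).sum() / keep_mask.sum().clamp(min=1)

    # 3) Noise filtering regularization: keep the perturbation small.
    noise_loss = delta.pow(2).mean()

    total = (lambda_remove * removal_loss
             + lambda_acc * accuracy_loss
             + lambda_noise * noise_loss)
    total.backward()
    return total.detach()
```

In this reading, the three weights trade off how aggressively the target word is suppressed against how faithfully the rest of the caption and the image are preserved; the actual SMT noise generator and loss forms may differ.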