Simple and Easy: Transfer Learning-Based Attacks to Text CAPTCHA

2020 
CAPTCHA, or Completely Automated Public Turing Tests to Tell Computers and Humans Apart, is a common mechanism used to protect commercial accounts from malicious computer bots, and the most widely used scheme is text-based CAPTCHA. In recent years, newly emerged deep learning techniques have achieved high accuracy and speed in attacking text-based CAPTCHAs. However, most existing attacks have various disadvantages: either the attack process is highly complex, or manually collecting and labeling a large number of samples to train a deep learning recognition model is time-consuming and expensive. In this paper, we propose a transfer learning-based approach that greatly reduces both the attack complexity and the cost of labeling samples; specifically, we pre-train the model with randomly generated samples and fine-tune the pre-trained model with a small number of real-world samples. To evaluate our attack, we tested it on 25 online CAPTCHA schemes, achieving success rates ranging from 36.3% to 96.9%. To further explore the effect of training-sample characteristics on attack accuracy, we carefully imitate real samples and apply a generative adversarial network to refine them, and then use these two kinds of generated samples to pre-train the models separately. The experimental results demonstrate that the similarity between randomly generated samples and elaborately imitated samples has a negligible impact on the attack accuracy. Instead, transfer learning is the key factor; it reduces the cost of data preparation while preserving the model's attack accuracy.
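The abstract's core idea, pre-training on cheap synthetic CAPTCHAs and fine-tuning on a handful of labeled real ones, can be sketched as below. This is a minimal illustration, assuming a fixed-length 4-character CAPTCHA over a 36-symbol alphabet and a simple per-position CNN classifier; the model architecture, data loaders, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal transfer-learning sketch: pre-train on synthetic CAPTCHAs,
# then fine-tune on a small real-world set (hypothetical setup).
import torch
import torch.nn as nn

NUM_CHARS, NUM_CLASSES = 4, 36  # assumed CAPTCHA length and alphabet size


class CaptchaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # shared convolutional feature extractor
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        # one classification head per character position
        self.heads = nn.ModuleList(
            nn.Linear(64 * 4 * 8, NUM_CLASSES) for _ in range(NUM_CHARS)
        )

    def forward(self, x):
        h = self.features(x).flatten(1)
        return [head(h) for head in self.heads]  # one logit tensor per position


def train(model, loader, epochs, lr):
    # only parameters left trainable (requires_grad=True) are updated
    opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # labels: (batch, NUM_CHARS) int tensor
            logits = model(images)
            loss = sum(loss_fn(logits[i], labels[:, i]) for i in range(NUM_CHARS))
            opt.zero_grad()
            loss.backward()
            opt.step()


model = CaptchaCNN()
# Stage 1: pre-train on a large set of randomly generated (synthetic) CAPTCHAs.
#   train(model, synthetic_loader, epochs=20, lr=1e-3)
# Stage 2: freeze the shared features and fine-tune only the heads on a
#          small labeled real-world set (synthetic_loader / real_loader are
#          hypothetical DataLoaders yielding (image, label) batches).
#   for p in model.features.parameters():
#       p.requires_grad = False
#   train(model, real_loader, epochs=5, lr=1e-4)
```

The design choice reflected here is the one the abstract argues for: the expensive supervision (stage 1) comes entirely from generated data, so only the small fine-tuning set in stage 2 requires manual labeling.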