Recaptured screen image identification based on vision transformer

2023 
Because recapturing LCD screen content often raises copyright issues, recaptured screen image identification has attracted considerable attention in image source forensics. This paper analyzes how convolutional neural networks (CNNs) and the vision transformer (ViT) extract features, and proposes a cascaded network structure that combines local-feature and global-feature extraction modules to distinguish recaptured screen images from original images, with or without a demoiréing operation. We first extract local features of the input images with five convolutional layers, then feed these local features into the ViT, which strengthens the local perception capability of the ViT module and further extracts global features of the input images. In thorough experiments, our method achieves a detection accuracy of 0.9691 on our generated dataset and 0.9940 on the existing mixture dataset, in both cases the best performance among the compared methods.
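To make the cascade concrete, the sketch below traces how a feature map produced by five convolutional layers could be flattened into the token sequence a ViT consumes. It is only shape bookkeeping under assumed settings: the abstract does not give kernel sizes, strides, padding, channel widths, or input resolution, so `ConvSpec`, `LOCAL_STEM`, `conv_out`, and `token_grid` are hypothetical names and values chosen for illustration, not the authors' configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSpec:
    """One convolutional layer of the local-feature stem (hypothetical values)."""
    out_channels: int
    kernel: int
    stride: int
    padding: int


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: five 3x3 convolutions; the first four halve the resolution.
# None of these numbers come from the paper.
LOCAL_STEM = [
    ConvSpec(32, 3, 2, 1),
    ConvSpec(64, 3, 2, 1),
    ConvSpec(128, 3, 2, 1),
    ConvSpec(256, 3, 2, 1),
    ConvSpec(384, 3, 1, 1),
]


def token_grid(image_size: int, stem=LOCAL_STEM) -> tuple[int, int]:
    """Return (number of tokens, token dimension) handed to the transformer.

    Each spatial position of the final feature map becomes one token whose
    dimension equals the channel count of the last convolutional layer.
    """
    size = image_size
    for layer in stem:
        size = conv_out(size, layer.kernel, layer.stride, layer.padding)
    return size * size, stem[-1].out_channels


if __name__ == "__main__":
    tokens, dim = token_grid(224)
    print(tokens, dim)  # 196 384 for the assumed 224x224 input
```

With these assumed settings, a 224x224 image shrinks to a 14x14 feature map, giving 196 tokens of dimension 384. The point of such a design is that every token already summarizes a local neighborhood computed by the convolutions before self-attention relates the tokens globally.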