Deepfake Video Detection Using Audio-Visual Consistency

2020 
Benefiting from significant advances in deep learning, widespread Deepfake videos with convincing manipulations have posed serious threats to public security, so the identification of fake videos has become an increasingly active research topic. However, most existing Deepfake detection methods concentrate on exposing facial defects through direct facial feature analysis, while largely ignoring authentic behavioral cues outside the facial region. Meanwhile, schemes built on meticulously designed neural networks rarely provide interpretable evidence for their final decisions. Therefore, to enrich the diversity of detection methods and improve the interpretability of detection evidence, this paper proposes a self-referential method that exploits audio-visual consistency by introducing the synchronous audio recording as a reference. In the preprocessing phase, we propose a phoneme-based audio-visual matching strategy to segment videos; controlled experiments show that this strategy outperforms common equal-length partitioning. An audio-visual coupling model (AVCM) is then employed to extract audio and visual feature representations from these segments, and similarity metrics are computed between mouth frames and their corresponding speech segments: synchronized pairs yield high similarity scores, while asynchronous pairs yield low ones. Evaluations on DeepfakeVidTIMIT indicate that our method achieves competitive results compared with current mainstream methods, especially on high-quality datasets.
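
The paper's AVCM implementation is not reproduced in this abstract, so the following is only a minimal sketch of the general scoring idea it describes, assuming hypothetical toy encoders (VisualEncoder, AudioEncoder), stand-in random inputs in place of real phoneme-aligned segments, and an arbitrary decision threshold: mouth-region clips and their speech segments are embedded into a shared space, and the mean cosine similarity over aligned pairs serves as the consistency score, expected to be high for synchronized (real) pairs and low for manipulated ones.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoder(nn.Module):
    """Toy encoder: clip of mouth-region frames -> unit-norm embedding (hypothetical)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # over (C, T, H, W)
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(16, emb_dim),
        )

    def forward(self, x):  # x: (B, 3, T, H, W)
        return F.normalize(self.net(x), dim=-1)

class AudioEncoder(nn.Module):
    """Toy encoder: log-mel spectrogram segment -> unit-norm embedding (hypothetical)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # over (n_mels, frames)
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, emb_dim),
        )

    def forward(self, x):  # x: (B, 1, n_mels, frames)
        return F.normalize(self.net(x), dim=-1)

def consistency_score(visual_clips, audio_segments, v_enc, a_enc):
    """Mean cosine similarity across aligned (mouth clip, speech segment) pairs."""
    v = v_enc(visual_clips)
    a = a_enc(audio_segments)
    return F.cosine_similarity(v, a, dim=-1).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    v_enc, a_enc = VisualEncoder(), AudioEncoder()
    # Stand-ins for 8 phoneme-aligned segments: 5 mouth frames each plus a mel segment.
    mouth = torch.randn(8, 3, 5, 64, 64)
    mel = torch.randn(8, 1, 80, 20)
    score = consistency_score(mouth, mel, v_enc, a_enc)
    THRESHOLD = 0.5  # illustrative only; in practice tuned on a validation set
    label = "real" if score.item() > THRESHOLD else "fake"
    print(f"consistency = {score.item():.3f} -> {label}")

In a real system the two encoders would be trained jointly (e.g., with a contrastive objective on genuine synchronized pairs) so that cosine similarity is meaningful, and the threshold would be calibrated on held-out data rather than fixed by hand.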