Semantic-Aware Deep Neural Attention Network for Machine Translation Detection

2021 
Web crawling is an important way to collect the massive training corpora needed to build a high-quality machine translation system. However, a large share of the collected data consists of machine-translated text rather than text written by native speakers or professional translators, which severely reduces the benefit of data scale. Traditional machine translation detection methods generally require hand-crafted feature engineering and struggle to distinguish the fine-grained semantic differences between real text and pseudo text produced by a modern neural machine translation system. To address this problem, we propose two semantic-aware models based on deep neural networks that automatically learn semantic features of text, one for monolingual scenarios and one for bilingual scenarios. Specifically, our models combine global semantics from BERT with local semantics from a convolutional neural network for monolingual detection, and further explore the semantic consistency relationship for bilingual detection. Experimental results on the Chinese-English machine translation detection task show that our models achieve 83.12% \(F_{1}\) in monolingual detection and 85.53% \(F_{1}\) in bilingual detection, outperforming strong BERT baselines by 2.2–3.2%.
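For readers unfamiliar with the reported metric, the following is a minimal sketch of how a binary \(F_{1}\) score is computed from detection labels. The function name `f1_score`, the choice of label 1 for "machine-translated", and the toy label lists are assumptions made for illustration. The abstract does not say which class is treated as positive or how scores are averaged, and none of the numbers below come from the paper.

```python
def f1_score(gold, pred, positive=1):
    """Binary F1 for the class `positive` (assumed here: 1 = machine-translated)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined,
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the paper.
gold = [1, 1, 1, 0, 0, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.2%}")
```

Because \(F_{1}\) is a harmonic mean, a detector cannot score well by flagging everything as machine-translated: high recall with low precision still yields a low score.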