Detecting automatically generated sentences with grammatical structure similarity

2018 
Automatically generated papers have been used to manipulate bibliographic indexes on numerous occasions. This paper examines several means of generating text, such as recurrent neural networks, Markov models, and probabilistic context-free grammars (PCFGs), and asks whether their output can be detected with current approaches. It then focuses on PCFGs, the most widely used of these. Although several approaches exist for detecting such papers, they all work at the document level and cannot detect a small amount of generated text inside a larger body of genuinely written text. We therefore present a grammatical structure similarity measure for detecting sentences or short fragments of text automatically generated by known PCFG generators. The proposed approach is tested against a pattern checker and various common machine learning methods. Its ability to detect text from a modified PCFG generator is also tested.