Hierarchical and Pairwise Document Embedding for Plagiarism Detection

Ruitong Zhang, Lianzhong Liu, Jiaofu Zhang, Zihang Huang, Caiwei Yang, Liangxuan Zhao, Tongge Xu

2020

The rapid development of the Internet, and in particular the spread of search engines and machine translation, has made it easier to copy text. Most existing plagiarism detection methods cannot cope with the growing number of plagiarism sources or with increasingly ambiguous plagiarized text. In this paper, we address the task of large-scale text deduplication and propose a multi-level distributed text computing model. The model improves checking speed through multi-level latent semantic analysis and uses BERT to judge plagiarized text more accurately. To further verify the model, we also applied recent fuzzy plagiarism techniques to construct a three-level data set. The experimental results show that our model continues to perform well as the volume of plagiarism data grows and as the plagiarized text becomes more ambiguous.
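
The coarse-to-fine idea in the abstract (a cheap pass that narrows the set of candidate sources, followed by a more accurate pairwise check) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain bag-of-words cosine similarity in place of latent semantic analysis, and a sentence-level overlap score as a stand-in for the BERT judgment. All function names, parameters, and the threshold are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (assumed stand-in for an LSA vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def coarse_filter(query, sources, top_k=2):
    """Stage 1: cheap document-level ranking that keeps only the top_k sources."""
    q = bow(query)
    ranked = sorted(
        sources, key=lambda name: cosine(q, bow(sources[name])), reverse=True
    )
    return ranked[:top_k]


def fine_score(query, source_text):
    """Stage 2 placeholder: best sentence-to-sentence similarity.

    A pairwise model such as BERT would replace this function.
    """
    q_sents = [s for s in query.split(".") if s.strip()]
    s_sents = [s for s in source_text.split(".") if s.strip()]
    return max(
        (cosine(bow(q), bow(s)) for q in q_sents for s in s_sents), default=0.0
    )


def detect(query, sources, top_k=2, threshold=0.8):
    """Run the coarse filter, then score only the surviving candidates."""
    candidates = coarse_filter(query, sources, top_k)
    scores = {name: fine_score(query, sources[name]) for name in candidates}
    return {name: s for name, s in scores.items() if s >= threshold}
```

The point of the two stages is that the expensive pairwise scorer runs only on the few sources that survive the cheap filter, so the cost of the accurate check does not grow with the full size of the source collection.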
