Research on Code Plagiarism Detection Model Based on Random Forest and Gradient Boosting Decision Tree

2019 
This paper studies Online Judge systems used for programming assignments. Code submitted by students sometimes contains plagiarism [1]. In addition to calculating the similarity degree between codes, we extract other features to determine whether a submitted code is suspected of plagiarism. By combining Random Forest and Gradient Boosting Decision Tree, we can also obtain its suspicion level. The model first calculates the similarity degree between the newly submitted code and all previously submitted codes, and identifies plagiarism suspects. For codes where it is difficult to confirm whether they are plagiarized, we extract the programming style similarity degree, the student's submission behavior pattern (such as the similar target concentration degree), and other features, and use them to build decision-tree models such as Random Forest and Gradient Boosting Decision Tree, which help determine the level of plagiarism suspicion. If the level is medium, the teacher decides whether to mark the code as plagiarized. Finally, the learning model is incrementally trained to improve its accuracy and its classification results. Experimental results show that the accuracy rate can reach 95.9%. As a result, the model can deter students from plagiarizing while minimizing the teacher's workload.
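The first stage described above (compare a new submission against all earlier ones, then assign a suspicion level) can be illustrated with a minimal sketch. This is not the paper's implementation: the token-based similarity measure, the function names, and the 0.5 and 0.9 thresholds are all assumptions chosen for illustration, and the Random Forest and Gradient Boosting stage that refines uncertain cases is not shown.

```python
import difflib
import re


def tokenize(code):
    # Crude lexer (assumption): identifiers, numbers, and single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def similarity(a, b):
    # Token-sequence similarity in [0, 1]; a stand-in for the paper's
    # similarity degree, whose exact definition the abstract does not give.
    matcher = difflib.SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False)
    return matcher.ratio()


def max_similarity(new_code, previous_codes):
    # Highest similarity between the new submission and any earlier one.
    return max((similarity(new_code, p) for p in previous_codes), default=0.0)


def suspicion_level(score, low=0.5, high=0.9):
    # Hypothetical thresholds: "medium" cases would go to the teacher.
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


previous = ["int main() { return 0; }"]
new_submission = "int main() { return 0; }"
level = suspicion_level(max_similarity(new_submission, previous))
```

In a fuller pipeline, submissions landing in the uncertain band would be passed, together with style and behavior features, to the tree-based classifiers rather than being decided by the similarity score alone.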