Ensemble Method for Sexual Predators Identification in Online Chats

2020 
Cyber grooming is a pressing problem worldwide, and many reports strongly suggest that it must be tackled urgently to protect children from sexual exploitation. In this study, we propose an effective method for sexual predator identification in online chats based on two-stage classification. The first stage distinguishes predatory conversations from normal ones, while the second stage distinguishes the predator from the victim within a single predatory conversation. Finally, the set of unique predators is derived from the second-stage results. For this task, we investigate several machine learning classifiers, including Naive Bayes, Support Vector Machine, Neural Network, Logistic Regression, Random Forest, K-Nearest Neighbors, and Decision Tree, with Bag of Words features under several different term weighting methods. We also propose two ensemble techniques to improve classification. Experimental results on the PAN12 dataset show that our best method, which uses a soft-voting ensemble for the first stage and a Naive Bayes classifier for the second stage, obtains an F0.5-score of 0.9348, which would rank first in the PAN12 competition.
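As a rough illustration of two terms used above, the sketch below shows soft voting as the averaging of per-class probabilities across classifiers, and the F-beta score with beta = 0.5, which weights precision more heavily than recall. It is a minimal sketch, not the authors' implementation: the function names `soft_vote` and `f_beta` and the probability values are hypothetical, and unweighted averaging is an assumption, since the abstract does not say how the votes are combined.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across classifiers (unweighted soft voting)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical outputs of three classifiers for one conversation,
# each given as [P(normal), P(predatory)]. These values are made up.
model_probs = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]

avg = soft_vote(model_probs)
# Pick the class with the highest averaged probability (1 = predatory).
label = max(range(len(avg)), key=avg.__getitem__)
```

Here the second classifier alone would label the conversation as normal, but the averaged probabilities favour the predatory class, which is how soft voting differs from relying on any single model.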