Vulgarity Classification in Comments Using SVM and LSTM
2020
Vast amounts of text appear online every day, and users, exercising their freedom of speech, often offend the sentiments of readers. Numerous accounts of online harassment, defamation, and bullying exist across social networking sites. Posting such content cannot be prevented, but machine learning and deep learning make it possible to identify such content and remove it. Jigsaw and Google have built tools to detect profanity appearing online, but these tools have not succeeded in identifying the type of toxicity a comment exhibits. Kaggle therefore put forth a challenge in which, besides identifying whether a comment is toxic, the comment is classified by kind of toxicity; categories such as threat, insult, identity hate, and obscenity are taken into consideration. To address this challenge, various machine learning and deep learning models are applied, such as SVM and RNN-LSTM. Our main aim in this challenge is to study the results of using RNN-LSTM for toxicity classification. The data is first vectorized using TF-IDF and bag of words. This paper also discusses the nature of the dataset. The results give a promising assurance of finding a solution to this problem.
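The pipeline the abstract describes (TF-IDF vectorization followed by an SVM, with one binary decision per toxicity category) can be sketched roughly as below. This is an illustrative sketch, not the authors' code: the toy comments, the 0/1 label matrix, and the six label names (taken from the Kaggle Toxic Comment Classification Challenge) are assumptions, and scikit-learn is used as a stand-in implementation.

```python
# Hypothetical sketch of a TF-IDF + linear SVM baseline for multi-label
# toxicity classification, in the spirit of the approach described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# The six categories used in the Kaggle challenge.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Tiny invented training set: one row of 0/1 flags per comment,
# one column per label in LABELS.
comments = [
    "you are a wonderful person, thanks for the helpful answer",
    "i will hurt you, watch out",
    "what an idiot, total garbage post, you moron",
    "go back to your country",
]
y = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]

# TF-IDF turns each comment into a sparse vector of weighted word counts;
# one-vs-rest trains a separate linear SVM per label, which handles the
# multi-label setting (a comment can be both a threat and an insult).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)

# Predict returns one 0/1 flag per toxicity category for each new comment.
pred = clf.predict(vectorizer.transform(["i will hurt you"]))
print(pred.shape)
```

The same TF-IDF features could be swapped for learned embeddings feeding an RNN-LSTM, which is the comparison the paper sets out to study.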