Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)

Dmitry Devyatkin, Ivan Smirnov, Ananyeva Margarita, Kobozeva Maria, Chepovskiy Andrey, Solovyev Fyodor

2017

In this paper we present the results of research on automatic extremist text detection. For this purpose, we created an experimental dataset of Russian-language texts; under Russian legislation, it cannot be made publicly available. We compared several classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated how much each group of differentiating features (lexical, semantic, and psycholinguistic) contributes to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.
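A comparison of the five classifier families named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn as the toolkit, uses only lexical (TF-IDF) features, and runs on a tiny hypothetical toy corpus (the real dataset is not public). The semantic and psycholinguistic features studied in the paper are not reproduced, and the hyperparameters, 3-fold stratified cross-validation, and macro-F1 scoring are illustrative assumptions.

```python
"""Sketch: compare five classifiers on lexical (TF-IDF) features.

Assumptions (not from the paper): scikit-learn, a hypothetical toy corpus,
illustrative hyperparameters, and macro-F1 under stratified k-fold CV.
"""
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: 1 = flagged (stand-in for "extremist"), 0 = neutral.
# It is NOT the authors' dataset.
TEXTS = [
    "rise up and destroy the enemies of our people",
    "take up arms against the traitors now",
    "our enemies must be wiped out by force",
    "join the fight and punish the traitors",
    "violence is the only answer to the traitors",
    "destroy them all, no mercy for the enemy",
    "the weather is sunny and warm today",
    "here is a simple recipe for apple pie",
    "the library opens at nine in the morning",
    "our team won the football match yesterday",
    "the new museum exhibition opens next week",
    "remember to water the garden in the evening",
]
LABELS = [1] * 6 + [0] * 6


def make_classifiers():
    """Return fresh instances of the five classifier families named in the abstract."""
    return {
        "multinomial_nb": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svm": LinearSVC(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }


def compare_classifiers(texts, labels, n_splits=3):
    """Return the mean macro-F1 of each classifier over stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, clf in make_classifiers().items():
        # The vectorizer sits inside the pipeline, so it is refit on each
        # training fold and no vocabulary leaks in from the test fold.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        fold_scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    ranked = sorted(compare_classifiers(TEXTS, LABELS).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name:20s} macro-F1 = {score:.3f}")
```

Running the script prints one macro-F1 line per classifier. On a corpus this small the numbers carry no meaning; the sketch only shows the mechanics of a like-for-like comparison in which every model sees the same folds and the same feature extraction.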

Keywords:

Semantics
Data mining
Logistic regression
Naive Bayes classifier
Computer science
Support vector machine
Gradient boosting
Pragmatics
Natural language processing
Feature extraction
Random forest
Artificial intelligence
Pattern recognition
Classification methods
Text detection
Linguistics