dictNN: A Dictionary-Enhanced CNN Approach for Classifying Hate Speech on Twitter.

Maximilian Kupi, Michael Bodnar, Nikolas Schmidt, Carlos Eduardo Posada

2021

Hate speech on social media is a growing concern, and automated methods have so far fallen short of detecting it reliably. A major challenge lies in the potentially evasive nature of hate speech, which stems from the ambiguity and fast evolution of natural language. To tackle this, we introduce a vectorisation based on a crowd-sourced and continuously updated dictionary of hate words and propose fusing it with standard word embeddings to improve the classification performance of a CNN model. To train and test our model, we use a merge of two established datasets (110,748 tweets in total). Adding the dictionary-enhanced input increases the CNN model's predictive power and raises the macro F1 score by seven percentage points.
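The abstract does not spell out how the dictionary-based vectorisation is fused with the word embeddings. As a minimal sketch of one plausible reading, the Python below appends a per-token dictionary score to each token's embedding, producing the kind of token-by-feature matrix a CNN could consume. Everything in it is an assumption made for illustration: the names `HATE_DICT`, `EMBEDDINGS`, and `fuse`, the placeholder tokens, the toy 3-dimensional vectors, and the choice of concatenation as the fusion step are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical hate-word dictionary: term -> weight in [0, 1].
# Placeholder tokens stand in for entries of a crowd-sourced lexicon.
HATE_DICT: Dict[str, float] = {"slur_a": 1.0, "slur_b": 0.6}

# Toy 3-dimensional embeddings; a real model would use pretrained vectors.
EMB_DIM = 3
EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.1, 0.0, 0.2],
    "are": [0.0, 0.3, 0.1],
    "slur_a": [0.5, 0.5, 0.0],
}


def fuse(tokens: List[str], max_len: int) -> List[List[float]]:
    """Build a (max_len x (EMB_DIM + 1)) matrix for one tweet.

    Each row is a token's embedding with its dictionary score appended
    as one extra feature (assumed fusion by concatenation).
    """
    rows: List[List[float]] = []
    for tok in tokens[:max_len]:
        emb = EMBEDDINGS.get(tok, [0.0] * EMB_DIM)  # zeros for unknown words
        score = HATE_DICT.get(tok, 0.0)             # 0.0 if not in dictionary
        rows.append(list(emb) + [score])
    # Zero-pad short tweets so every input has the same shape.
    while len(rows) < max_len:
        rows.append([0.0] * (EMB_DIM + 1))
    return rows


matrix = fuse("you are slur_a".split(), max_len=5)
for row in matrix:
    print(row)
```

In this sketch the dictionary signal travels alongside the embedding of each token, so a convolutional filter sliding over the rows sees both at once. Other fusion designs, such as a separate input branch for the dictionary vector, are equally compatible with the abstract.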

Keywords:

Ambiguity
Speech recognition
Social media
merge
Macro
Word embedding
Computer science
Predictive power
Natural language
