Zo Grof !: A Comprehensive Corpus for Offensive and Abusive Language in Dutch

Ward Ruitenbeek, Victor Zwart, Robin Van Der Noord, Zhenja Gnezdilov, Tommaso Caselli

2022

This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource in the Dutch language panorama. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multi-layer annotation scheme modelling the explicitness and the target(s) of the message. We have conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual Pre-Trained Language Models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.667 for distinguishing between abusive and offensive messages.
