Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing.

Manikandan Ravikiran, Amin Ekant Muljibhai, Toshinori Miyoshi, Hiroaki Ozaki, Yuta Koreeda, Sakata Masayuki

2020

In this paper, we present our participation in SemEval-2020 Task 12 Subtask A (English), which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system that combines a BERT classifier, trained on tweets selected with a Statistical Sampling Algorithm (SA), with Post-Processing (PP) based on an offensive wordlist. Our system ranked 34th with a macro-averaged F1-score (Macro-F1) of 0.90913 over the offensive and non-offensive classes. We further present comprehensive results and an error analysis to assist future research in offensive language identification with noisy labels.
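
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1-score over two classes can be computed: the F1-score is calculated separately for each class and the two values are averaged with equal weight. The label names `OFF` and `NOT`, the function names, and the toy data are assumptions made for illustration; this is not the task's official scorer.

```python
def f1_for_class(gold, pred, label):
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=("OFF", "NOT")):
    """Unweighted mean of the per-class F1-scores (label names are assumed)."""
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


# Toy example (hypothetical data, not from the paper).
gold = ["OFF", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
```

Because both classes contribute equally to the average, a system cannot reach a high Macro-F1 by favouring only the more frequent class.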

Keywords:

Offensive
Hybrid system
English language
SemEval
Classifier (linguistics)
Sampling (statistics)
Natural language processing
Computer science
Error analysis
Language identification
Artificial intelligence
