How to Label? Combining Experts’ Knowledge for German Text Classification
2020
A supervised machine learning classifier can only be as good as its labeled training data. For this reason, explicit human expert knowledge is needed within the workflow. Existing data collections often consist of classes that differ from those required for an individual application. Therefore, generating a new data set based on a predefined labeling guideline is necessary. The aim of this work is to increase the quality of labeled data sets during their creation. We present a workflow for the labeling of unsorted data by a group of experts, including subsequent classifier training and evaluation. Even when combined with standard methods for feature extraction and classification, the proposed labeling method achieved a performance improvement. Furthermore, we offer access to our data set (German newspaper articles), including the labeling guideline, as a contribution to the research community.