How to Label? Combining Experts’ Knowledge for German Text Classification

2020 
A supervised machine learning classifier can only be as good as its labeled training data. For this reason, explicit human expert knowledge needs to be part of the workflow. Existing data collections often consist of classes that differ from the ones required for a particular application. Therefore, generating a new data set based on a predefined labeling guideline is mandatory. The aim of this work is to increase the quality of labeled data sets during their creation. We present a workflow for the labeling of unsorted data by a group of experts, including subsequent classifier training and evaluation. Even when combined with standard methods for feature extraction and classification, the proposed labeling method achieved a performance improvement. Furthermore, we offer access to our data set (German newspaper articles), including the labeling guideline, as a contribution to the research community.
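The abstract does not specify how the labels of the individual experts are merged into one training label. As a minimal sketch, assuming a plain majority vote per document with ties flagged for discussion (an illustrative choice, not necessarily the authors' procedure), combining expert labels and measuring raw pairwise agreement between experts could look like this. The function names, expert names, and class labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def combine_labels(annotations):
    """Majority vote over the labels several experts gave one document.

    annotations: list of class labels, one per expert (hypothetical format).
    Returns the majority label, or None when the most frequent labels are
    tied, i.e. the document has no clear majority and could go back to the
    experts for discussion.
    """
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the top labels
    return counts[0][0]


def pairwise_agreement(labels_by_expert):
    """Mean fraction of documents on which each pair of experts agrees.

    labels_by_expert: dict mapping expert -> list of labels, where position i
    in every list refers to the same document.
    """
    scores = []
    for a, b in combinations(labels_by_expert, 2):
        la, lb = labels_by_expert[a], labels_by_expert[b]
        matches = sum(x == y for x, y in zip(la, lb))
        scores.append(matches / len(la))
    return sum(scores) / len(scores)


# Hypothetical labels from three experts for three newspaper articles.
labels_by_expert = {
    "expert_a": ["sports", "politics", "economy"],
    "expert_b": ["sports", "politics", "culture"],
    "expert_c": ["sports", "economy", "culture"],
}

# Regroup the labels per document and take the majority vote.
per_document = list(zip(*labels_by_expert.values()))
combined = [combine_labels(list(doc)) for doc in per_document]
agreement = pairwise_agreement(labels_by_expert)
print(combined, round(agreement, 3))
```

Raw agreement is used here only because it is easy to read; a chance-corrected measure such as Cohen's or Fleiss' kappa would usually be preferred when judging the quality of a labeling guideline.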