Quality Control for Hierarchical Classification with Incomplete Annotations

2021 
Hierarchical classification requires annotations that follow a hierarchical class structure. Crowdsourcing services are an inexpensive way to collect such annotations, but the results are often incomplete: workers with limited expertise may be unable to label every level of the hierarchy, and crowdsourcing platforms allow workers to suspend the labeling flow partway through. Existing quality control approaches for refining low-quality annotations discard these incomplete annotations, which limits how much the quality of the results can improve. We propose a quality control method for hierarchical classification that leverages incomplete annotations and the similarity between classes in the hierarchy to estimate the true leaf classes. Our method probabilistically models the labeling process and estimates the true leaf classes by considering the class-likelihood of samples and the workers' class-dependent expertise. It embeds the class hierarchy into a latent space and represents samples, as well as each worker's prototypical samples for the classes (prototypes), as vectors in this space. The similarities between these vectors are used to estimate the true leaf classes. Experimental results on both real-world and synthetic datasets demonstrate the effectiveness of our method and its superiority over the baseline methods.
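To make the idea of using incomplete annotations concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a toy three-leaf hierarchy, hand-picked latent vectors, and a single scalar `worker_acc` in place of class-dependent expertise. All names here (`LEAF_PATHS`, `leaf_posterior`, `worker_acc`) are hypothetical. An annotation that stops at an internal node such as "animal" is kept and treated as evidence for every leaf beneath that node, rather than being discarded.

```python
import numpy as np

# Hypothetical toy hierarchy: each leaf class maps to its path from the root.
LEAF_PATHS = {
    "cat": ("animal", "cat"),
    "dog": ("animal", "dog"),
    "car": ("vehicle", "car"),
}
LEAVES = list(LEAF_PATHS)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def leaf_posterior(sample_vec, class_vecs, annotations, worker_acc=0.8):
    """Estimate a distribution over leaf classes for one sample.

    sample_vec  : latent vector of the sample (assumed given).
    class_vecs  : dict mapping each leaf name to its latent vector (assumed given).
    annotations : list of labels from workers; a label may be a leaf or an
                  internal node (an incomplete annotation).
    worker_acc  : assumed probability that a label lies on the true path;
                  a scalar simplification of class-dependent expertise.
    """
    # Class-likelihood from similarity (dot product) in the latent space.
    scores = np.array([sample_vec @ class_vecs[c] for c in LEAVES])
    log_post = np.log(softmax(scores))

    # Each annotation supports every leaf whose root-to-leaf path contains it.
    for label in annotations:
        consistent = np.array([label in LEAF_PATHS[c] for c in LEAVES])
        log_post += np.log(np.where(consistent, worker_acc, 1.0 - worker_acc))

    return dict(zip(LEAVES, softmax(log_post)))


class_vecs = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
    "car": np.array([-1.0, 0.0]),
}
sample = np.array([1.0, 0.0])

# One incomplete annotation: the worker stopped at the internal node "animal".
posterior = leaf_posterior(sample, class_vecs, ["animal"])
best_leaf = max(posterior, key=posterior.get)
```

In this sketch the incomplete label "animal" lowers the weight of "car" while the latent similarity separates "cat" from "dog". A fuller treatment along the lines of the abstract would learn the embeddings and per-worker prototypes jointly instead of fixing them by hand.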