Crowd labeling latent Dirichlet allocation

2017 
Large, unlabeled datasets are abundant nowadays, but getting labels for those datasets can be expensive and time-consuming. Crowd labeling is a crowdsourcing approach for gathering such labels from workers whose suggestions are not always accurate. While a variety of algorithms exist for this purpose, we present crowd labeling latent Dirichlet allocation (CL-LDA), a generalization of latent Dirichlet allocation that can solve a more general set of crowd labeling problems. We show that it performs as well as, and at times better than, other methods on a variety of simulated and actual datasets while treating each label as compositional rather than as indicating a discrete class. In addition, prior knowledge of workers’ abilities can be incorporated into the model through a structured Bayesian framework. We then apply CL-LDA to the EEG independent component labeling dataset, using its generalizations to further explore the utility of the algorithm. We discuss prospects for creating classifiers from the generated labels.