Hierarchical Active Learning with Overlapping Regions

2020 
Learning classification models from real-world data often requires substantial human effort devoted to instance annotation. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost is critical for building such models. To address this problem, we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class proportion of that subpopulation. Using learning-from-label-proportions algorithms, one can learn instance-based classifiers from such labeled regions. The key challenge is that there are infinitely many regions one can define and query in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a hierarchical active learning solution that incrementally builds a concise hierarchy of regions. Furthermore, to avoid building a possibly class-irrelevant region hierarchy, we propose growing multiple different hierarchies in parallel and expanding the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn accurate classifiers from a small number of simple queries, and hence are highly effective in reducing the human annotation effort needed for building classification models.
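
To make the notion of a labeled region concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a region is given by per-dimension lower and upper bounds, simulates the human's proportion assessment from known labels, and refines a region by halving it along one dimension. The function names and the midpoint split are hypothetical choices made for illustration; the abstract does not specify how regions are split.

```python
import numpy as np

def region_mask(X, lower, upper):
    """Boolean mask of the rows of X that fall inside the hypercube [lower, upper]."""
    return np.all((X >= lower) & (X <= upper), axis=1)

def region_proportion(y, mask):
    """Fraction of positive instances in the region.

    Stand-in for the human annotator's class-proportion assessment (assumption).
    """
    return float(y[mask].mean())

def split_region(lower, upper, dim):
    """Split a hypercube at its midpoint along one dimension (hypothetical rule)."""
    mid = (lower[dim] + upper[dim]) / 2.0
    left_upper = upper.copy()
    left_upper[dim] = mid
    right_lower = lower.copy()
    right_lower[dim] = mid
    return (lower, left_upper), (right_lower, upper)

# Toy data: four 2-D instances with binary labels.
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.6, 0.3], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])

# Root region covers the whole unit square.
lower, upper = np.array([0.0, 0.0]), np.array([1.0, 1.0])
root_prop = region_proportion(y, region_mask(X, lower, upper))

# Refine the root into two child regions along the first dimension.
left, right = split_region(lower, upper, dim=0)
left_prop = region_proportion(y, region_mask(X, *left))
right_prop = region_proportion(y, region_mask(X, *right))

print(root_prop, left_prop, right_prop)
```

In this toy case the root region is mixed, while each child region is pure, which is the kind of refinement a region hierarchy aims for. The resulting (region, proportion) pairs would then be passed to a learning-from-label-proportions method to train an instance-level classifier.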