Efficient Version Space Algorithms for "Human-in-the-Loop" Model Development

2020 
When active learning (AL) is applied to help a user develop a model on a large dataset by interactively presenting data instances for labeling, existing AL techniques can suffer from two main drawbacks: first, they may require hundreds of labeled data instances to reach high accuracy; second, retrieving the next instance to label can be time-consuming, which makes these techniques incompatible with the interactive nature of the human exploration process. To address these issues, we introduce a novel version-space-based AL algorithm for kernel classifiers, which not only has strong theoretical guarantees on performance but also allows for an implementation that is efficient in both time and space. In addition, by leveraging insights obtained during the user labeling process, we are able to factorize the version space and perform active learning in a set of subspaces, which further reduces the user labeling effort. Evaluation results show that our algorithms significantly outperform state-of-the-art version space algorithms, as well as a recent factorization-aware algorithm, for model development over large datasets.
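To make the idea of version space based selection concrete, the following is a minimal illustrative sketch and not the algorithm introduced in this paper. It assumes linear classifiers through the origin instead of kernel classifiers, approximates the version space by rejection sampling, and queries the pool point on which the sampled hypotheses disagree most evenly (a generic bisection heuristic). All function names and the toy data are hypothetical.

```python
import random


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sample_version_space(labeled, dim, n_samples, rng, max_tries=100_000):
    """Rejection-sample linear hypotheses w that agree with every label.

    `labeled` is a list of (x, y) pairs with y in {-1, +1}. This is a
    simplifying assumption for illustration; it does not scale to high
    dimensions or many labels.
    """
    samples = []
    tries = 0
    while len(samples) < n_samples and tries < max_tries:
        tries += 1
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if all(y * dot(w, x) > 0 for x, y in labeled):
            samples.append(w)
    return samples


def pick_query(pool, hypotheses):
    """Return the index of the pool point that splits the hypotheses most evenly."""
    def imbalance(x):
        pos = sum(1 for w in hypotheses if dot(w, x) > 0)
        return abs(2 * pos - len(hypotheses))
    return min(range(len(pool)), key=lambda i: imbalance(pool[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (hypothetical): two labeled points and three unlabeled candidates.
    labeled = [((1.0, 0.0), 1), ((-1.0, 0.0), -1)]
    pool = [(2.0, 0.1), (0.0, 1.0), (-3.0, 0.2)]
    hypotheses = sample_version_space(labeled, dim=2, n_samples=200, rng=rng)
    print("next instance to label:", pool[pick_query(pool, hypotheses)])
```

In this toy setup the two labels only constrain the first weight to be positive, so the candidate lying along the second axis is the one the sampled hypotheses are least sure about. Labeling such a point discards roughly half of the remaining hypotheses, which is the intuition behind reducing labeling effort.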