A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition.

2021 
Named entity recognition (NER) is a basic task in constructing knowledge graphs. Model performance is limited when little labelled data is available. One solution is active learning, which uses a multi-round sampling strategy to select unlabelled data for labelling and can thereby achieve good results with fewer annotations. However, very little labelled data is available in the early rounds, so performance improves slowly. We therefore propose a framework that performs data augmentation during active learning. To validate our claims, we focus on the Chinese NER task and carry out extensive experiments on two public datasets. Experimental results show that our framework is effective for a series of classical query strategies. On Resume, we achieve 99% of the performance of the best deep model trained on the full data using only 22% of the data, a 63% reduction in labelled data compared to pure active learning (PAL).
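The abstract does not give implementation details, so the following is only a minimal sketch of how augmentation might be interleaved with multi-round active learning. Everything in it is an assumption made for illustration: the function names (`run_rounds`, `least_confidence`), the choice of least confidence as the query strategy, and the toy one-dimensional "model" that stands in for a NER tagger. The `augment` callable is a hypothetical placeholder and is not the augmentation method used in the paper.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def run_rounds(pool, oracle, train, predict_proba, augment, rounds, batch_size):
    """Hypothetical loop: query, label, augment the labelled set, retrain."""
    labelled = []                       # (sample, label) pairs annotated so far
    remaining = list(range(len(pool)))  # indices of still-unlabelled samples
    model = None
    for _ in range(rounds):
        if not remaining:
            break
        if model is not None:
            # Most uncertain samples first; before any model exists, keep pool order.
            remaining.sort(
                key=lambda i: least_confidence(predict_proba(model, pool[i])),
                reverse=True,
            )
        picked, remaining = remaining[:batch_size], remaining[batch_size:]
        labelled += [(pool[i], oracle(pool[i])) for i in picked]
        # Augmentation enlarges the small labelled set before each retraining.
        augmented = [augment(x, y) for x, y in labelled]
        model = train(labelled + augmented)
    return model, labelled


# Toy stand-ins (assumptions): samples are numbers, the true label is x >= 0.5.
pool = [i / 10 for i in range(10)]


def oracle(x):
    return int(x >= 0.5)


def train(data):
    """Toy 'model': midpoint between class means, or 0.5 if a class is missing."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:
        return 0.5
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_proba(model, x):
    """Toy class probabilities from the distance to the threshold."""
    p = min(1.0, max(0.0, 0.5 + (x - model)))
    return [1.0 - p, p]


def augment(x, y):
    """Placeholder augmentation: a slightly shifted copy with the same label."""
    return (x + 0.01, y)


model, labelled = run_rounds(
    pool, oracle, train, predict_proba, augment, rounds=3, batch_size=2
)
print(len(labelled), "samples labelled; toy threshold =", model)
```

In a real setting, `train` and `predict_proba` would wrap a sequence tagger, the uncertainty score would be aggregated over tokens, and `augment` would have to keep tokens and entity tags aligned.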