Research on Named Entity Recognition in Chinese EMR Based on Semi-Supervised Learning with Dual Selected Strategy

2020 
As electronic medical record (EMR) systems are deployed, medical record data accumulates, and extracting essential information from these resources has become a concern. Named entity recognition (NER) is the first step. With the help of doctors, we built a small annotated corpus of Chinese electronic medical records. Supervised NER methods, however, require a large manually labeled corpus. To reduce annotation cost and make better use of unlabeled data, this paper proposes CEMRNER, a semi-supervised NER model for Chinese electronic medical records based on ALBERT-BiLSTM-CRF. The model uses a Bidirectional Long Short-Term Memory network (BiLSTM) and a Conditional Random Field (CRF) for sequence labeling, and introduces the pre-trained language model ALBERT to address the problem of Chinese text representation. We also propose a dual selected strategy that selects high-confidence samples to expand the training set. The dual strategy helps ensure the accuracy of automatically labeled data and reduces error accumulation across iterations of semi-supervised learning. Experiments and analysis show that, compared with other models, this method is more accurate and comprehensive: precision, recall, and F1 score are 85.45%, 87.81%, and 86.61%, respectively. The results show that a semi-supervised method combined with pre-trained ALBERT can improve recognition accuracy when little labeled data is available.
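The idea of expanding the training set with high-confidence samples can be illustrated with a minimal sketch. The abstract does not say which two criteria make up the dual selected strategy, so the two filters below (a minimum per-token confidence and a minimum mean confidence per sentence) are assumptions chosen only for illustration, as are the names `Prediction`, `passes_dual_filter`, `select_pseudo_labeled`, and the threshold values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """A model-labeled sentence (hypothetical container, not from the paper)."""
    tokens: List[str]
    tags: List[str]
    token_confidences: List[float]  # probability of the predicted tag per token


def passes_dual_filter(pred: Prediction,
                       min_token_conf: float = 0.90,
                       min_mean_conf: float = 0.95) -> bool:
    """Accept a sentence only if it passes two confidence checks.

    Both checks and both thresholds are assumptions for illustration;
    the paper's actual selection criteria are not given in the abstract.
    """
    if not pred.token_confidences:
        return False
    mean_conf = sum(pred.token_confidences) / len(pred.token_confidences)
    weakest = min(pred.token_confidences)
    return weakest >= min_token_conf and mean_conf >= min_mean_conf


def select_pseudo_labeled(preds: List[Prediction]) -> List[Prediction]:
    """Keep only the automatically labeled sentences that pass both checks."""
    return [p for p in preds if passes_dual_filter(p)]


candidates = [
    Prediction(["头", "痛"], ["B-SYM", "I-SYM"], [0.99, 0.97]),  # passes both
    Prediction(["发", "热"], ["B-SYM", "I-SYM"], [0.99, 0.50]),  # one weak token
    Prediction(["咳", "嗽"], ["B-SYM", "I-SYM"], [0.91, 0.92]),  # low mean
]
selected = select_pseudo_labeled(candidates)
```

In a self-training loop, the selected sentences would be added to the labeled set before the model is retrained. Requiring two checks rather than one is meant to limit how many wrongly labeled sentences enter the training set at each iteration.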