BIBC: A Chinese Named Entity Recognition Model for Diabetes Research

2021 
In the medical field, extracting medical entities from text with Named Entity Recognition (NER) has become a research hotspot. This work takes chapter-level diabetes literature as its research object and uses deep learning to extract the medical entities it contains. Built on a deep bidirectional Transformer network, the pre-trained language model BERT can resolve the representation of polysemous words and supplements features learned from large-scale unlabeled data; combined with a BiLSTM-CRF model, it extracts long-distance features of sentences. Such a model, however, cannot focus on the local information of a sentence, which leads to insufficient feature extraction. To address this, and considering that Chinese text is organized primarily in words rather than single characters, this work proposes a Named Entity Recognition method based on BIBC. The method incorporates an Iterated Dilated CNN so that the model takes global and local features into account at the same time, and uses the BERT-WWM model, which is based on whole word masking, to further extract semantic information from Chinese data. In the diabetes entity recognition experiment on Ruijin Hospital data, precision, recall, and F1 score reach 79.58%, 80.21%, and 79.89%, respectively, which exceed the evaluation results of existing studies. This indicates that the method extracts the semantic information of diabetes text more accurately and obtains good entity recognition results, which can meet the requirements of practical applications.
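As a quick consistency check on the reported figures, the sketch below recomputes F1 as the harmonic mean of the first two percentages. It assumes the first figure (79.58%) is precision, which the abstract does not spell out beyond its wording; the function and variable names are illustrative and not taken from the work itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, expressed as fractions.
# Assumption: the first figure is precision.
reported_precision = 0.7958
reported_recall = 0.8021
reported_f1 = 0.7989

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {computed_f1:.4f}")  # F1 = 0.7989
```

The recomputed value agrees with the reported 79.89% to two decimal places, so the three figures are mutually consistent under that reading.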