Semi-supervised entity recognition of Chinese government document

2019 
Government documents contain a large amount of entity information. Identifying these entities is the foundation of intelligent document processing tasks such as word segmentation, semantic analysis and knowledge graph construction. For entity recognition, traditional machine learning algorithms have the advantage of requiring only a relatively small tagged corpus. However, this also means that they can hardly capture the implicit semantic information in sentences, which leads to low entity recognition accuracy. They also require a great deal of manual feature design. In contrast, deep learning algorithms need a large tagged corpus, but in return they automatically acquire semantic features from context, which greatly improves entity recognition accuracy. Combining the advantages of both approaches, this paper proposes a semi-supervised deep learning framework that first uses a Conditional Random Field (CRF) with pseudo-labeling to expand the corpus, and then uses a dilated Convolutional Neural Network (CNN) with Bi-directional Long Short-Term Memory (BiLSTM) plus CRF to extract entities from official documents. The experimental results show that, compared with other methods, the accuracy, recall and F1 score of entity recognition are improved by 5.02%, 5.85% and 5.44%, respectively. The proposed method can effectively extract entity information from documents.
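The corpus-expansion step can be illustrated with a minimal self-training sketch. It is not the paper's implementation: the abstract does not say how pseudo-labels are selected, so the confidence threshold, the number of rounds, and the names `expand_corpus` and `train_toy_tagger` are assumptions made for illustration. The toy lexicon tagger only stands in for a CRF so that the example runs without external libraries.

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]                      # one token (here: one character) per item
Tags = List[str]                          # BIO tags aligned with the sentence
Example = Tuple[Sentence, Tags]
Tagger = Callable[[Sentence], Tuple[Tags, float]]   # returns (tags, confidence)


def train_toy_tagger(corpus: List[Example]) -> Tagger:
    """Hypothetical stand-in for CRF training: memorise a token -> tag lexicon."""
    lexicon: Dict[str, str] = {}
    for sentence, tags in corpus:
        for token, tag in zip(sentence, tags):
            lexicon[token] = tag

    def tagger(sentence: Sentence) -> Tuple[Tags, float]:
        tags = [lexicon.get(token, "O") for token in sentence]
        known = sum(1 for token in sentence if token in lexicon)
        # Assumed confidence measure: fraction of tokens seen during training.
        confidence = known / len(sentence) if sentence else 0.0
        return tags, confidence

    return tagger


def expand_corpus(
    labeled: List[Example],
    unlabeled: List[Sentence],
    train: Callable[[List[Example]], Tagger],
    threshold: float = 0.9,               # assumed value, not from the paper
    rounds: int = 3,                      # assumed value, not from the paper
) -> Tuple[List[Example], List[Sentence]]:
    """Self-training: keep only confidently pseudo-labeled sentences."""
    corpus = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        tagger = train(corpus)
        accepted: List[Example] = []
        remaining: List[Sentence] = []
        for sentence in pool:
            tags, confidence = tagger(sentence)
            if confidence >= threshold:
                accepted.append((sentence, tags))
            else:
                remaining.append(sentence)
        if not accepted:                  # nothing new was learned; stop early
            break
        corpus.extend(accepted)
        pool = remaining
    return corpus, pool


labeled = [(["国", "务", "院", "发", "布"], ["B-ORG", "I-ORG", "I-ORG", "O", "O"])]
unlabeled = [["国", "务", "院"], ["教", "育", "部"]]
expanded, leftover = expand_corpus(labeled, unlabeled, train_toy_tagger)
print(len(expanded), len(leftover))
```

In a fuller pipeline, the expanded corpus would then be used to train the downstream dilated CNN + BiLSTM + CRF tagger. Sentences that never reach the threshold stay in the unlabeled pool rather than being added with unreliable tags.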