Chinese Data Extraction and Named Entity Recognition

Tingwei Yang, Daguang Jiang, Shenghui Shi, Siyan Zhan, Lin Zhuo, Yukang Yin, Zheng Liang

2020

Extracting useful information from large volumes of Chinese text is an important task for data analysis in the era of big data. The key to information extraction is the ability to identify named entities in Chinese text quickly. This paper analyzes the structure of news text and specialized medical text and proposes the IDCNN-BiLSTM-CRF (Iterated Dilated Convolutions, Bidirectional Long Short-Term Memory network, Conditional Random Field) model. The specialized medical text data is processed on the basis of an analysis of the structure of the public news dataset. The proposed model is then compared with the BiLSTM-CRF model on both the public news dataset and the specialized medical text dataset to verify its recognition results.
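The abstract names iterated dilated convolutions as the first stage of the model but does not describe them. The following is a minimal illustrative sketch in plain Python, not the authors' implementation: it assumes an odd kernel width, zero padding, no nonlinearity, and a hypothetical dilation schedule of 1, 2, 4. The function names `dilated_conv1d`, `receptive_field`, and `iterated_dilated_block` are made up for this example. It shows how stacking dilated layers widens the context seen by each output position, which is the property that makes such layers attractive for fast sequence tagging.

```python
from typing import List, Sequence


def dilated_conv1d(xs: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D convolution with zero padding; output length equals input length.

    Assumption for this sketch: the kernel width is odd, and its taps are
    spaced `dilation` positions apart around the current position.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel width must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(xs)):
        total = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # input index read by this tap
            if 0 <= j < len(xs):           # positions outside the input count as zero
                total += w * xs[j]
        out.append(total)
    return out


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output of the stack."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


def iterated_dilated_block(xs: Sequence[float], kernel: Sequence[float],
                           dilations: Sequence[int], iterations: int) -> List[float]:
    """Apply the same dilated stack `iterations` times, reusing one kernel.

    Reusing the kernel stands in for parameter sharing across iterations;
    a real model would use learned weights and a nonlinearity per layer.
    """
    ys = list(xs)
    for _ in range(iterations):
        for d in dilations:
            ys = dilated_conv1d(ys, kernel, d)
    return ys


if __name__ == "__main__":
    impulse = [0.0] * 21
    impulse[10] = 1.0
    spread = iterated_dilated_block(impulse, [1.0, 1.0, 1.0], [1, 2, 4], iterations=1)
    print(sum(1 for v in spread if v != 0.0), receptive_field(3, [1, 2, 4]))
```

Feeding a single impulse through the stack makes the growth of the receptive field visible: the number of nonzero outputs matches the value computed by `receptive_field`, even though each layer only has three taps.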

Keywords:

Computer science
Iterated function
Big data
Data mining
Data extraction
Information retrieval
Random field
Named-entity recognition

