Disease name recognition based on syntactic and semantic features

2018 
Biomedical entity recognition, covering entities such as genes, proteins, chemicals, and diseases, is the foundation of biomedical text mining and plays a significant role in extracting biomedical entity relations and constructing biomedical knowledge bases. To address the shortcomings of current disease name recognition systems, this paper proposes a set of new syntactic and semantic features. The syntactic features comprise chunk and dependency information; the semantic features comprise a disease's abbreviation form, its dictionary entry form, and hyponymy relationships between disease concepts. Experiments on the NCBI disease corpus show that a CRF model combined with these syntactic and semantic features significantly improves on state-of-the-art disease entity recognition, achieving an F1 score of 85.3%.
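The feature families named in the abstract can be illustrated with a minimal sketch of per-token feature dictionaries, the usual input format for a CRF sequence labeler. This is not the paper's implementation: the function names, feature keys, toy sentence, lexicon, and abbreviation set are all hypothetical, the chunk tags and dependency parse are assumed to come from an external parser, and the hyponymy feature is omitted because the abstract does not describe how it is encoded.

```python
from typing import Dict, List, Set


def token_features(
    tokens: List[str],
    chunk_tags: List[str],
    dep_heads: List[int],
    dep_rels: List[str],
    i: int,
    disease_lexicon: Set[str],
    abbreviations: Set[str],
) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature names)."""
    word = tokens[i]
    head = dep_heads[i]  # index of the syntactic head, -1 for the root
    return {
        "word.lower": word.lower(),
        # Syntactic features: chunk tag and dependency information.
        "chunk": chunk_tags[i],
        "dep.rel": dep_rels[i],
        "dep.head": tokens[head].lower() if head >= 0 else "ROOT",
        # Semantic features: abbreviation form and dictionary entry form.
        "is_abbrev": word in abbreviations,
        "in_lexicon": word.lower() in disease_lexicon,
    }


def sentence_features(
    tokens: List[str],
    chunk_tags: List[str],
    dep_heads: List[int],
    dep_rels: List[str],
    disease_lexicon: Set[str],
    abbreviations: Set[str],
) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [
        token_features(tokens, chunk_tags, dep_heads, dep_rels, i,
                       disease_lexicon, abbreviations)
        for i in range(len(tokens))
    ]


# Toy input; the annotations are assumed, not produced by a real parser.
tokens = ["Patients", "with", "DMD", "were", "studied"]
chunk_tags = ["B-NP", "B-PP", "B-NP", "B-VP", "I-VP"]
dep_heads = [4, 2, 0, 4, -1]
dep_rels = ["nsubjpass", "case", "nmod", "auxpass", "root"]
disease_lexicon = {"duchenne muscular dystrophy", "dmd"}
abbreviations = {"DMD"}

feats = sentence_features(tokens, chunk_tags, dep_heads, dep_rels,
                          disease_lexicon, abbreviations)
```

A list of such dictionaries per sentence, paired with BIO labels, is the shape that common CRF toolkits accept for training.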