Automatic Identification of Named Entities in Literatures Using GenomeNet

2005 
Information extraction (IE) is essential because of the growth in protein-related literature that contains active-site and interaction information. Accurate information extraction depends on correctly identifying named entities. Named entities in molecular biology are written in many different ways and cannot be identified easily with a terminology dictionary. This paper proposes a new method for the automatic identification of named entities in the literature using GenomeNet. GenomeNet provides a text search system over many protein-related databases, which are updated every day. Our method uses the GenomeNet search results as a knowledge base. The search results contain clues for identifying named entities. Focusing on these clues, we define three parameters, and a candidate is identified as a named entity when the three parameters meet the identification condition. We applied the proposed method to several protein-related papers and compared the results with those of other methods. The evaluation score improved substantially, which demonstrates the effectiveness of the proposed method.