CRF Based Research on a Unified Approach to Word Segmentation and POS Tagging for Pre-Qin Chinese

2010 
This paper explores the intersection of NLP and ancient Chinese, particularly pre-Qin documents. The text of "Zuo Zhuan" is first analyzed after manual word segmentation and POS tagging. The Conditional Random Fields (CRF) model is then applied to three tasks: word segmentation (WS), POS tagging (PT), and a unified process of WS and PT. In the open test, the unified approach achieves much higher precision and recall than independent WS and PT, with an F-score of 94.60% for WS and 89.65% for PT. The method is suitable for the study of ancient Chinese vocabulary and for corpus construction, and can be used to supplement manual tagging.
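A common way to run WS and PT as one CRF sequence-labelling task is to give each character a combined label: its position in the word plus the word's POS tag. The CRF then predicts one joint label per character. The abstract does not state the tag set the paper used. The B/M/E/S position scheme, the POS labels (`nr`, `v`, `p`, `ns`, `n`), and the sample segmentation of 鄭伯克段于鄢 below are all illustrative assumptions, not data from the paper's corpus. The CRF training itself is omitted.

The sketch covers three pieces. It converts segmented, tagged words into joint character labels. It decodes predicted labels back into words. It also computes word-level precision, recall and F-score, which is the usual way WS and PT results such as those reported above are scored.

```python
from typing import List, Tuple

Word = Tuple[str, str]  # (word, POS tag); the POS tags are hypothetical


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Map (word, pos) pairs to per-character joint labels like 'B-nr'.

    Assumed scheme: B = begin, M = middle, E = end, S = single-char word.
    """
    out = []
    for word, pos in words:
        if len(word) == 1:
            out.append((word, f"S-{pos}"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                position = "B"
            elif i == len(word) - 1:
                position = "E"
            else:
                position = "M"
            out.append((ch, f"{position}-{pos}"))
    return out


def decode(chars: List[str], labels: List[str]) -> List[Word]:
    """Rebuild (word, pos) pairs from joint labels predicted per character.

    A word closes on an E or S label; its POS is taken from that label.
    """
    words: List[Word] = []
    buf, pos = "", ""
    for ch, label in zip(chars, labels):
        position, pos = label.split("-", 1)
        buf += ch
        if position in ("E", "S"):
            words.append((buf, pos))
            buf = ""
    if buf:  # flush an unterminated word from an ill-formed label sequence
        words.append((buf, pos))
    return words


def spans(words: List[Word]) -> set:
    """Character-offset spans (start, end, pos) for each word."""
    result, start = set(), 0
    for word, pos in words:
        end = start + len(word)
        result.add((start, end, pos))
        start = end
    return result


def prf(gold: List[Word], pred: List[Word], with_pos: bool):
    """Word-level precision, recall and F1.

    with_pos=False scores segmentation only (span boundaries must match);
    with_pos=True also requires the POS tag to match.
    """
    key = (lambda s: s) if with_pos else (lambda s: s[:2])
    g = {key(s) for s in spans(gold)}
    p = {key(s) for s in spans(pred)}
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative gold segmentation and a flawed prediction (both made up).
    gold = [("鄭伯", "nr"), ("克", "v"), ("段", "nr"), ("于", "p"), ("鄢", "ns")]
    pred = [("鄭", "nr"), ("伯", "n"), ("克", "v"), ("段", "v"), ("于", "p"), ("鄢", "ns")]
    print(encode(gold))
    print("WS:", prf(gold, pred, with_pos=False))
    print("PT:", prf(gold, pred, with_pos=True))
```

In this example the prediction wrongly splits 鄭伯 and mis-tags 段. The segmentation F-score is 8/11 (about 0.727). The joint WS+PT F-score is lower, 6/11 (about 0.545), because a word only counts when both its boundaries and its tag are correct.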