CRF Based Research on a Unified Approach to Word Segmentation and POS Tagging for Pre-Qin Chinese
2010
This paper explores the intersection of NLP and ancient Chinese, particularly pre-Qin documents. The text of "Zuo Zhuan" is first analyzed after manual segmentation and POS tagging. The Conditional Random Fields (CRF) model is then adopted for word segmentation (WS), POS tagging (PT), and a unified process of WS and PT, respectively. In the open test, the precision and recall of the unified approach are much higher than those of independent WS and PT, with an F-score of 94.60% in WS and 89.65% in PT. This method is suitable for the study of ancient Chinese vocabulary and for corpus construction, and can be applied to supplement manual tagging.
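One common way to cast WS and PT as a single sequence-labeling task is to give each character one combined label that carries both its position in the word and the word's POS tag. The sketch below illustrates that idea only; the abstract does not state the paper's actual label set or features, so the B/M/E/S position scheme, the POS tag names, and the function names `encode_joint` and `decode_joint` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed position scheme (not taken from the paper):
# B = word-initial, M = word-internal, E = word-final, S = single-character word.


def encode_joint(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, pos) pairs into per-character (char, "POSITION-POS") pairs."""
    labeled = []
    for word, pos in words:
        if len(word) == 1:
            labeled.append((word, f"S-{pos}"))
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                position = "B"
            elif i == last:
                position = "E"
            else:
                position = "M"
            labeled.append((ch, f"{position}-{pos}"))
    return labeled


def decode_joint(labeled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Recover (word, pos) pairs from per-character joint labels."""
    words = []
    buffer = ""
    pos = ""
    for ch, label in labeled:
        position, pos = label.split("-", 1)
        buffer += ch
        # A word closes on E (end of a multi-character word) or S (single).
        if position in ("E", "S"):
            words.append((buffer, pos))
            buffer = ""
    # Flush a trailing unfinished word from an ill-formed label sequence.
    if buffer:
        words.append((buffer, pos))
    return words


# Illustrative input: a line from "Zuo Zhuan" with a hypothetical
# segmentation and hypothetical POS tags (n, v, p).
sentence = [("鄭伯", "n"), ("克", "v"), ("段", "n"), ("于", "p"), ("鄢", "n")]
joint = encode_joint(sentence)
restored = decode_joint(joint)
```

With labels of this form, a single CRF pass over the characters yields both the word boundaries and the POS tags, and decoding the predicted labels gives the segmented, tagged sentence directly.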