Chinese address word segmentation and annotation method

2015 
The present invention relates to a Chinese address word segmentation and annotation method. The method comprises: step 11, selecting manually segmented and annotated address data as training data; step 12, replacing each single Arabic numeral or English letter, and each run of consecutive Arabic numerals or English letters, in the training data with a specified single Arabic numeral or English letter; step 13, converting the training data into the data format required by the CRF++ tool; step 14, defining a feature template file; step 15, building a word segmentation model and an annotation model separately with the CRF++ tool; step 16, applying the same replacement to each single Arabic numeral or English letter and each run of Arabic numerals or English letters in the address to be processed; step 17, performing word segmentation and annotation with the CRF++ tool; and step 18, restoring the Arabic numerals or English letters that were replaced. The Chinese address word segmentation and annotation method according to the present invention achieves high accuracy.
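The substitution, format conversion, and recovery steps (12/16, 13, and 18) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the placeholder characters `0` and `a`, the B/M/E/S character tag set, and the function names `normalize`, `to_crfpp_rows`, and `restore` are all hypothetical choices, since the abstract specifies none of them. The only CRF++ convention relied on is its training format of one token per line, whitespace-separated columns with the label last, and a blank line between sequences.

```python
import re

# Assumption: "0" replaces digit runs and "a" replaces letter runs.
# The abstract only says "a specified single" numeral or letter.
DIGIT_PLACEHOLDER = "0"
LETTER_PLACEHOLDER = "a"
_RUN = re.compile(r"[0-9]+|[A-Za-z]+")


def normalize(address):
    """Steps 12/16: collapse each digit or letter run to one placeholder.

    Returns the normalized text and the original runs in order,
    so they can be put back after segmentation.
    """
    originals = []

    def repl(match):
        run = match.group(0)
        originals.append(run)
        return DIGIT_PLACEHOLDER if run[0].isdigit() else LETTER_PLACEHOLDER

    return _RUN.sub(repl, address), originals


def to_crfpp_rows(tokens):
    """Step 13: emit one character per line with a tab-separated tag.

    Assumption: B/M/E/S tags (begin, middle, end, single-character word).
    The trailing blank line terminates the sequence.
    """
    rows = []
    for tok in tokens:
        if len(tok) == 1:
            rows.append(f"{tok}\tS")
        else:
            rows.append(f"{tok[0]}\tB")
            rows.extend(f"{ch}\tM" for ch in tok[1:-1])
            rows.append(f"{tok[-1]}\tE")
    return "\n".join(rows) + "\n\n"


def restore(tokens, originals):
    """Step 18: put the original runs back in place of the placeholders.

    After normalize(), every "0" or "a" in the text is a placeholder,
    so they can be replaced left to right.
    """
    it = iter(originals)
    placeholders = (DIGIT_PLACEHOLDER, LETTER_PLACEHOLDER)
    return [
        "".join(next(it) if ch in placeholders else ch for ch in tok)
        for tok in tokens
    ]
```

In use, `normalize` would run before the text is handed to CRF++ for training or decoding, and `restore` would run on the segmented output. Collapsing runs this way means the model sees one symbol for any house number or building letter rather than an open-ended set of numeral strings.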