Solving Combinatorial Ambiguity in Chinese Word Segmentation Using Contextual Information

2001 
Combinatorial ambiguity is a key problem in Chinese word segmentation. We treat it as equivalent to the problem of word sense disambiguation (WSD) in language computing. In light of the vector space model commonly used in WSD, and based on detailed observation of 20 typical combinatorial ambiguities, this paper first proposes treating these ambiguities separately according to their distribution, then determines by experiment the key factors of the feature matrix (the size of the context window, the sensitivity to position within the window, and the weighting of feature words), and finally uses the semantic codes of words to reduce the dimension of the feature matrix. Preliminary results show that the proposed scheme performs satisfactorily and may serve as a general solution for processing combinatorial ambiguities.
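
The abstract gives no implementation details, so the following is only a minimal sketch of how a context-window vector space classifier for one combinatorial ambiguity might look. The ambiguous string 才能 (one word, "talent", versus two words, "only then can"), the toy training sentences, the distance-decay weighting, the centroid classifier, and every function and parameter name (`context_vector`, `window`, `decay`, `codes`) are illustrative assumptions, not the authors' method. The optional `codes` mapping stands in for the idea of replacing words with semantic codes to shrink the feature space.

```python
from collections import Counter
from math import sqrt

def context_vector(tokens, index, window=3, decay=0.5, codes=None):
    """Weighted bag of context features around tokens[index].

    Assumption: words nearer the ambiguous string get larger weights
    (weight = decay ** (distance - 1)). `codes` is a hypothetical
    word -> semantic-code mapping used to merge related words.
    """
    codes = codes or {}
    vec = Counter()
    for offset in range(-window, window + 1):
        pos = index + offset
        if offset == 0 or not 0 <= pos < len(tokens):
            continue
        feature = codes.get(tokens[pos], tokens[pos])
        vec[feature] += decay ** (abs(offset) - 1)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(examples, **kwargs):
    """Sum the context vectors of each label into one centroid per label."""
    centroids = {}
    for tokens, index, label in examples:
        centroids.setdefault(label, Counter()).update(
            context_vector(tokens, index, **kwargs))
    return centroids

def classify(tokens, index, centroids, **kwargs):
    """Return the label whose centroid is most similar to the context."""
    vec = context_vector(tokens, index, **kwargs)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy data (invented for illustration): each item is
# (tokens, position of the ambiguous string, label).
EXAMPLES = [
    ("他 的 才能 很 出众".split(), 2, "join"),
    ("发挥 自己 的 才能".split(), 3, "join"),
    ("努力 才能 成功".split(), 1, "split"),
    ("这样 才能 解决 问题".split(), 1, "split"),
]

centroids = train(EXAMPLES)
decision_a = classify("她 的 才能 得到 认可".split(), 2, centroids)
decision_b = classify("努力 才能 进步".split(), 1, centroids)
```

Training one set of centroids per ambiguous string mirrors the abstract's strategy of treating each ambiguity separately; the window size and weighting are exposed as parameters because the abstract identifies them as the factors to be tuned by experiment.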