Chinese syllable-to-character conversion with recurrent neural network based supervised sequence labelling

2015 
Chinese Syllable-to-Character (S2C) conversion is an important component of input methods, and its key problem is the serious homophone phenomenon in the Chinese language. To disambiguate homophones and improve S2C conversion, this paper treats Chinese S2C conversion as a sequence labelling task and introduces a recurrent neural network (RNN) based on supervised sequence labelling to convert syllable sequences directly into word sequences. This direct conversion with the proposed RNN effectively eliminates the cascade error of multi-pass approaches. Experimental results indicate that, in second-pass decoding, re-ranking with an RNN language model outperforms re-ranking with an N-gram language model in both perplexity and S2C conversion accuracy. Moreover, direct S2C conversion with the RNN improves accuracy from 93.77% (RNN language model) to 94.17%.
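To make the sequence labelling framing concrete, the following is a minimal sketch, not the paper's model: a toy Elman-style RNN with untrained random weights that emits one label per input syllable. The lexicon `CANDIDATES`, the hidden size, the function `label_sequence`, and the choice to restrict each label to the homophones of the current syllable are all assumptions made for illustration, and the labels here are single characters rather than words.

```python
import math
import random

# Hypothetical toy lexicon: each syllable maps to its homophone candidates.
CANDIDATES = {
    "shi": ["是", "时", "事"],
    "jie": ["界", "街", "节"],
}
SYLLABLES = sorted(CANDIDATES)
CHARS = sorted({c for cs in CANDIDATES.values() for c in cs})
HIDDEN = 8  # assumed hidden-layer size

rng = random.Random(0)  # untrained, randomly initialised weights


def matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = matrix(HIDDEN, len(SYLLABLES))  # input -> hidden
W_rec = matrix(HIDDEN, HIDDEN)         # hidden -> hidden (recurrence)
W_out = matrix(len(CHARS), HIDDEN)     # hidden -> character scores


def label_sequence(syllables):
    """Emit exactly one character label per input syllable."""
    h = [0.0] * HIDDEN
    labels = []
    for syl in syllables:
        x = SYLLABLES.index(syl)  # index of the one-hot input unit
        # The hidden state mixes the current syllable with the previous state,
        # so earlier syllables can influence later labels.
        h = [
            math.tanh(W_in[i][x] + sum(W_rec[i][j] * h[j] for j in range(HIDDEN)))
            for i in range(HIDDEN)
        ]
        scores = {
            c: sum(W_out[k][j] * h[j] for j in range(HIDDEN))
            for k, c in enumerate(CHARS)
        }
        # Assumption: only homophones of the current syllable are allowed.
        labels.append(max(CANDIDATES[syl], key=scores.get))
    return labels


print(label_sequence(["shi", "jie", "shi"]))
```

Because the weights are random, the chosen characters carry no meaning; the sketch only shows the shape of the task, in which the output sequence is aligned one-to-one with the syllable sequence and no separate segmentation or re-ranking pass is needed.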