A Study of the Value of Parameter N in n-gram Statistical Model in Chinese Language

1998 
Abstract As a major statistical model, the n-gram has been applied extensively in language processing (for example POS tagging, language modeling for speech recognition, and character recognition). However, there is so far no definitive conclusion as to which value of N is optimal for Chinese language processing. This paper introduces an estimation approach for selecting the parameter N of the n-gram model in Chinese. Three factors are analyzed to compare different values of N: the approximate expression of Chinese grammatical structure, the reconstruction of new words, and the performance in transcribing Chinese Pinyin sequences to text. The conclusion is that 4 is a better choice of N for word-based n-gram models in Chinese. This result should be helpful for the development of Chinese statistical language models and language processing.
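To make the role of parameter N concrete, the following is a minimal sketch of a word-based n-gram model in which N is simply the length of the counted word window. It is not the paper's method: the function names `ngram_counts` and `conditional_prob`, the unsmoothed maximum-likelihood estimate, and the romanized toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in a segmented word list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def conditional_prob(tokens, history, word):
    """Unsmoothed estimate of P(word | history), where N = len(history) + 1."""
    history = tuple(history)
    counts = ngram_counts(tokens, len(history) + 1)
    # Numerator: how often the history is followed by this word.
    numerator = counts[history + (word,)]
    # Denominator: how often the history is followed by any word.
    denominator = sum(c for gram, c in counts.items() if gram[:-1] == history)
    return numerator / denominator if denominator else 0.0


# Hypothetical, already segmented toy corpus (romanized for readability).
corpus = ["wo", "ai", "Beijing", "wo", "ai", "Shanghai"]

p_bigram = conditional_prob(corpus, ["ai"], "Beijing")               # N = 2
p_fourgram = conditional_prob(corpus, ["wo", "ai", "Beijing"], "wo")  # N = 4
```

A larger N conditions each word on a longer history, which can capture more structure but leaves fewer observations per history. On this toy corpus the N = 4 history occurs only once, which hints at the trade-off behind choosing N.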