Phrase-based statistical language modeling from bilingual parallel corpus

Jun Mao, Gang Cheng, Yanxiang He

2007

Phrase-based models and class-based models are both variants of the classical n-gram model. In this paper, we propose an approach that merges the two. For the phrase-based part, we extract phrases from a bilingual parallel corpus using a method derived from phrase-based translation models. We then partition these phrases into phrase classes by minimizing the loss of average mutual information, with the aid of a count matrix. Our experimental results suggest that phrase-based models capture more key information than word-based models, and that class-based models capture the relationships among similar words or phrases and thereby alleviate the data sparseness problem to some extent.
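
The abstract does not spell out the partitioning procedure, so the following is only a minimal sketch of one common way to read it: a greedy, Brown-style merge that picks the pair of classes whose merge loses the least average mutual information. It assumes the count matrix holds bigram counts of adjacent phrase classes (`counts[i][j]` is how often class `i` is immediately followed by class `j`). The function names and the toy matrix are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def average_mutual_information(counts):
    """Average mutual information (in bits) between adjacent classes.

    Assumption: counts[i][j] is the number of times class i is
    immediately followed by class j.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    left = [sum(counts[i]) for i in range(n)]                        # row marginals
    right = [sum(counts[i][j] for i in range(n)) for j in range(n)]  # column marginals
    ami = 0.0
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
                ami += (c / total) * math.log2(c * total / (left[i] * right[j]))
    return ami


def merge_classes(counts, a, b):
    """Return a new count matrix with class b folded into class a."""
    n = len(counts)
    rows = [list(r) for r in counts]
    for j in range(n):            # add row b into row a
        rows[a][j] += rows[b][j]
    for i in range(n):            # add column b into column a
        rows[i][a] += rows[i][b]
    keep = [k for k in range(n) if k != b]
    return [[rows[i][j] for j in keep] for i in keep]


def best_merge(counts):
    """Return (loss, a, b) for the pair whose merge loses the least AMI."""
    base = average_mutual_information(counts)
    best = None
    for a, b in combinations(range(len(counts)), 2):
        loss = base - average_mutual_information(merge_classes(counts, a, b))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best


# Toy data (hypothetical): classes 0 and 1 occur in identical contexts,
# so merging them should cost no mutual information.
toy_counts = [
    [0, 0, 4],
    [0, 0, 4],
    [4, 4, 0],
]
loss, a, b = best_merge(toy_counts)
```

Repeating `best_merge` followed by `merge_classes` until the desired number of classes remains gives a full partition. This sketch recomputes the average mutual information from scratch for every candidate pair, which is simple but expensive; a practical implementation would update the count matrix and the loss incrementally.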

Keywords:

Phrase search
Noun phrase
Mutual information
Natural language processing
Merge (version control)
Phrase
Matrix (mathematics)
Language model
Artificial intelligence
Computer science
Chinese word
Speech recognition
