Distribution-based pruning of backoff language models

2000 
We propose distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in the training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method reduces word perplexity by 7–9% relative to conventional count-cutoff methods.
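
The sketch below illustrates the contrast the abstract draws, under stated assumptions: it is not the paper's estimator. It approximates the probability that an n-gram occurs in a new document by its document frequency across a toy corpus (the paper derives a more refined distribution model), and the `cutoff`, `threshold`, and helper names (`cutoff_prune`, `distribution_prune`) are illustrative choices, not from the source.

```python
# Hypothetical sketch: count-cutoff pruning vs. distribution-based pruning.
# Assumption: the training corpus is available as a list of tokenized documents.
from collections import Counter
from itertools import chain

def bigrams(tokens):
    """Return the list of adjacent word pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))

# Toy training corpus split into documents. "the cat" is frequent but
# concentrated in a single document; "sat on" occurs once per document.
documents = [
    "the cat sat on the mat the cat sat again".split(),
    "a dog sat on the rug".split(),
    "birds sat on the fence".split(),
]

# Corpus-wide counts: what a conventional count-cutoff method inspects.
corpus_counts = Counter(chain.from_iterable(bigrams(d) for d in documents))

# Document frequency: in how many documents does each n-gram occur at all?
doc_freq = Counter()
for doc in documents:
    for ng in set(bigrams(doc)):
        doc_freq[ng] += 1

n_docs = len(documents)

def cutoff_prune(counts, cutoff=1):
    """Conventional pruning: drop n-grams whose training count <= cutoff."""
    return {ng for ng, c in counts.items() if c > cutoff}

def distribution_prune(doc_freq, n_docs, threshold=0.5):
    """Distribution-based pruning (sketch): keep an n-gram only if its
    estimated probability of appearing in a new document exceeds the
    threshold. Here P(occurs in new doc) is crudely estimated as
    document frequency / #documents; the paper models this distribution
    more carefully."""
    return {ng for ng, df in doc_freq.items() if df / n_docs > threshold}

print("count-cutoff keeps:       ", sorted(cutoff_prune(corpus_counts)))
print("distribution-based keeps: ", sorted(distribution_prune(doc_freq, n_docs)))
```

On this toy corpus the two criteria disagree on "the cat": it survives the count cutoff (training count 2) but is pruned by the distribution-based rule, since it appears in only one of three documents and is therefore unlikely to occur in a new one.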