Randomized Language Models via Perfect Hash Functions

David Talbot, Thorsten Brants

2008

We propose a succinct randomized language model that employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy pruning. We demonstrate the space savings of the scheme via machine translation experiments within a distributed language modeling framework.
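
The idea of storing only a fingerprint and a value per n-gram, at a slot chosen by a perfect hash function, can be illustrated with a minimal sketch. This is not the authors' construction: it assumes a brute-force seed search to obtain a collision-free hash (practical only at toy scale), a table twice the size of the key set, a 12-bit fingerprint, and plain floats instead of quantized parameters. The names `build`, `lookup`, `fingerprint`, and `FINGERPRINT_BITS` are hypothetical.

```python
import hashlib

FINGERPRINT_BITS = 12  # assumed width; a wider fingerprint lowers the false-positive rate


def _hash(key: str, seed: int) -> int:
    """Seeded 64-bit hash of a string (BLAKE2b is an illustrative choice)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def fingerprint(key: str) -> int:
    """Short fingerprint of an n-gram; seed 0 is reserved for this purpose."""
    return _hash(key, 0) & ((1 << FINGERPRINT_BITS) - 1)


def build(model: dict, max_seeds: int = 100_000):
    """Find a seed that maps every n-gram to its own slot, then fill the table.

    The table holds (fingerprint, value) pairs only; the n-gram strings
    themselves are never stored.
    """
    keys = list(model)
    size = max(1, 2 * len(keys))  # assumed load factor of 0.5
    for seed in range(1, max_seeds):
        slots = {_hash(k, seed) % size for k in keys}
        if len(slots) == len(keys):  # no two keys share a slot
            break
    else:
        raise RuntimeError("no collision-free seed found")
    table = [None] * size
    for k in keys:
        table[_hash(k, seed) % size] = (fingerprint(k), model[k])
    return seed, table


def lookup(seed: int, table: list, ngram: str):
    """Return the stored value if the fingerprint at the n-gram's slot matches."""
    entry = table[_hash(ngram, seed) % len(table)]
    if entry is not None and entry[0] == fingerprint(ngram):
        return entry[1]
    return None  # empty slot or fingerprint mismatch: treat the n-gram as unseen


model = {"the cat": -1.2, "cat sat": -0.7, "sat on the": -2.5}
seed, table = build(model)
print(lookup(seed, table, "cat sat"))
```

Every n-gram in the model is always found with its correct value. An n-gram that was never stored is usually rejected, but it can be accepted by mistake when it lands on an occupied slot and its fingerprint happens to match; in this sketch that happens with probability at most 2^-12 per query, which is the sense in which such a model is randomized.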

Keywords:

Hash function
Rolling hash
Machine learning
Double hashing
Artificial intelligence
Dynamic perfect hashing
Perfect hash function
Computer science
Hash tree
SWIFFT
Theoretical computer science
Hash filter
Algorithm
