A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length $m$, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. The language model provides context to distinguish between words and phrases that sound similar. For example, in American English, the phrases "recognize speech" and "wreck a nice beach" sound similar but mean different things.

Data sparsity is a major problem in building language models: most possible word sequences are not observed in training. One solution is to assume that the probability of a word depends only on the previous $n-1$ words. This is known as an n-gram model, or a unigram model when $n = 1$, in which case each word is treated as independent of the words before it. The unigram model is also known as the bag-of-words model.

Estimating the relative likelihood of different phrases is useful in many natural language processing applications, especially those that generate text as output. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, optical character recognition, handwriting recognition, information retrieval and other applications.

In speech recognition, sounds are matched with word sequences. Ambiguities are easier to resolve when evidence from the language model is integrated with a pronunciation model and an acoustic model.

Language models are used in information retrieval in the query likelihood model. There, a separate language model is associated with each document in a collection. Documents are ranked based on the probability of the query $Q$ in the document's language model, $P(Q \mid M_d)$. Commonly, the unigram language model is used for this purpose.
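As an illustration of query likelihood ranking, the following is a minimal sketch that builds one unigram model per document and scores a query by the product of its term probabilities. It assumes whitespace tokenization, lowercasing, and unsmoothed maximum-likelihood estimates, none of which are prescribed by the text above; the function names and the toy documents are hypothetical.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(t) = count(t) / number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: c / total for term, c in counts.items()}

def query_likelihood(query_tokens, model):
    """P(Q | M_d) under the unigram assumption: product of per-term probabilities."""
    p = 1.0
    for term in query_tokens:
        # Assumption: no smoothing, so a term unseen in the document gives 0.
        p *= model.get(term, 0.0)
    return p

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending P(Q | M_d)."""
    q = query.lower().split()
    scores = {
        doc_id: query_likelihood(q, unigram_model(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy collection.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat and the cat ran",
}
ranking = rank("the cat", docs)
```

Because the estimates are unsmoothed, any document missing a single query term scores zero; a practical system would likely smooth the document models, which this sketch deliberately leaves out.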
A unigram model can be treated as the combination of several one-state finite automata. It treats the probability of each term as independent of its context, replacing the chain-rule factorization $P(t_1 t_2 t_3) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_1 t_2)$ with $P_{\text{uni}}(t_1 t_2 t_3) = P(t_1)\,P(t_2)\,P(t_3)$.
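One consequence of the unigram factorization is that word order does not affect the probability of a sequence, which is why the model is described as a bag of words. The sketch below checks this on a toy corpus; the corpus, the function names, and the use of exact fractions are illustrative assumptions rather than part of any standard implementation.

```python
from collections import Counter
from fractions import Fraction

def train_unigram(tokens):
    """Estimate P(t) as the relative frequency of t, kept as an exact fraction."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}

def p_uni(sequence, model):
    """P_uni(t_1 ... t_k) = P(t_1) * ... * P(t_k); unseen terms contribute 0."""
    p = Fraction(1)
    for t in sequence:
        p *= model.get(t, Fraction(0))
    return p

# Hypothetical toy corpus: a occurs 3 times, b twice, c once.
corpus = "a b a c a b".split()
model = train_unigram(corpus)

p_abc = p_uni(["a", "b", "c"], model)  # (1/2) * (1/3) * (1/6)
p_cba = p_uni(["c", "b", "a"], model)  # same factors in a different order
```

Here `p_abc` and `p_cba` are equal because multiplication is commutative, whereas a model that conditions on preceding words would in general assign the two orderings different probabilities.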