Combinaison de modèles de langage pour l'identification de thèmes (Combining language models for topic identification)

Brigitte Bigi, Renato De Mori, Marc El-Bèze, Thierry Spriet

1998

A new statistical method for language modeling and spoken document classification is proposed. It is based on a mixture of topic-dependent probabilities. Each topic-dependent probability is in turn a mixture of n-gram probabilities and the probability of Kullback-Leibler (KL) distances between keyword unigrams and the distribution obtained from the content of a cache memory. Experimental results on topic classification, using a corpus of 60 million words from the French newspaper Le Monde, show the excellent performance of the cache memory and its complementary role in providing different statistics for the decision process.
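The abstract gives no formulas, so the following is only a minimal sketch of the two ingredients it names: a weighted mixture of topic-dependent probabilities, and a KL comparison between a cache distribution and per-topic keyword unigrams. The function names, the additive smoothing, the direction of the divergence, and the "smallest divergence wins" decision rule are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def unigram_dist(words, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed keyword vocabulary.

    Smoothing (alpha) is an assumption; it keeps every probability non-zero
    so the KL divergence below is always defined.
    """
    counts = Counter(w for w in words if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def classify_by_cache(cache_words, topic_keyword_dists, vocab):
    """Pick the topic whose keyword unigrams are closest (in KL) to the cache.

    Hypothetical decision rule: the topic with the smallest divergence wins.
    """
    cache = unigram_dist(cache_words, vocab)
    scores = {t: kl_divergence(cache, q) for t, q in topic_keyword_dists.items()}
    return min(scores, key=scores.get), scores


def mixture_prob(weights, topic_probs):
    """Linear mixture sum_t weights[t] * P_t(w | h) of topic-dependent probabilities."""
    return sum(weights[t] * topic_probs[t] for t in weights)


# Toy data, invented for this example only.
vocab = {"goal", "match", "vote", "party"}
topics = {
    "sport": unigram_dist(["goal", "match", "match", "goal"], vocab),
    "politics": unigram_dist(["vote", "party", "vote"], vocab),
}
best_topic, kl_scores = classify_by_cache(["match", "goal", "goal"], topics, vocab)
print(best_topic)
```

In this toy setting the cache holds the most recent words of the document, and the divergence scores could in turn be converted into the mixture weights passed to `mixture_prob`; how the paper actually does this is not stated in the abstract.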

Keywords:

Speech recognition
Natural language processing
Document classification
Language model
CPU cache
Artificial intelligence
Computer science
Newspaper
Decision process
