Comparison of language models trained on written texts and speech transcripts in the context of automatic speech recognition

Sebastian Dziadzio, Aleksandra Nabożny, Aleksander Smywiński-Pohl, Bartosz Ziółko


2015


We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the performance in ASR of language models trained on speech transcripts and on written texts, and show that using the former results in a greater word error rate reduction (WERR), even when the model is trained on a much smaller corpus. For our experiments we used the manually annotated one-million-word subcorpus of the National Corpus of Polish and an HTK acoustic model.
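The abstract does not spell out the formulas behind the two measurements. A minimal sketch follows, assuming (1) the standard two-corpus log-likelihood (G²) statistic used in corpus linguistics to compare frequency profiles, applied to POS n-gram counts, and (2) that WERR denotes the relative reduction of word error rate against a baseline. POS tags are taken as already assigned, since the subcorpus is manually annotated. The function names `pos_ngrams`, `log_likelihood`, `compare_corpora` and `werr`, and the toy tag sequences, are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter


def pos_ngrams(tag_sequences, n=2):
    """Count POS n-grams over a list of sentences, each given as a list of tags."""
    counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G^2) for a single item (assumed formulation).

    a, b: observed frequency of the item in corpus 1 and corpus 2
    c, d: total number of items (here: n-grams) in corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # a zero observation contributes nothing to the sum
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def compare_corpora(counts1, counts2):
    """Score every n-gram seen in either corpus; return (n-gram, LL) by decreasing LL."""
    c = sum(counts1.values())
    d = sum(counts2.values())
    scores = {
        gram: log_likelihood(counts1[gram], counts2[gram], c, d)
        for gram in set(counts1) | set(counts2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def werr(wer_baseline, wer_new):
    """Relative word error rate reduction, as a fraction of the baseline WER (assumed definition)."""
    return (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    # Hypothetical toy tag sequences using NKJP-style tags; not data from the paper.
    written = [["adj", "subst", "fin", "subst"], ["subst", "fin", "adj", "subst"]]
    spoken = [["qub", "fin", "qub"], ["interj", "qub", "fin", "subst"]]
    for ngram, ll in compare_corpora(pos_ngrams(written), pos_ngrams(spoken))[:3]:
        print(ngram, round(ll, 2))
```

Higher LL values mark POS n-grams whose relative frequency differs more between the two corpora; with one degree of freedom, a value above about 3.84 corresponds to p < 0.05. The `werr` helper expresses the improvement relative to the baseline, so a drop from 50% to 40% WER counts as a 20% reduction.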

Keywords:

VoxForge
Cued speech
Factored language model
Natural language processing
Audio mining
Acoustic model
Language model
Speech synthesis
Computer science
Speech recognition
Speech corpus
Artificial intelligence
Speech technology

