Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data.

Alexander Krueger, Oliver Walter, Volker Leutnant, Reinhold Haeb-Umbach

2012

In this contribution we investigate the effectiveness of Bayesian feature enhancement (BFE) on a medium-sized recognition task containing real-world recordings of noisy reverberant speech. BFE employs a very coarse model of the acoustic impulse response (AIR) from the source to the microphone. This model has been shown to be effective when the speech to be recognized was generated by artificially convolving nonreverberant speech with a constant AIR. Here we demonstrate that the model is also appropriate for feature enhancement of true recordings of noisy reverberant speech. On the Multi-Channel Wall Street Journal Audio Visual corpus (MCWSJ-AV), using the signal of a single distant microphone as input and a single recognition pass, the word error rate is cut in half to 41.9% compared to the ETSI Standard Front-End.
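To make the artificial setup mentioned above concrete, the following is a minimal sketch of how noisy reverberant speech can be simulated by convolving a nonreverberant signal with a constant AIR. It is an illustration only, not the authors' implementation. The AIR shape (exponentially decaying white noise controlled by a reverberation time), the function names `synthetic_air` and `reverberate`, and all parameter values are assumptions that do not come from the abstract.

```python
import numpy as np


def synthetic_air(t60_s, fs_hz, length_s, seed=0):
    """Crude AIR: white noise under an exponential envelope (assumed shape)."""
    rng = np.random.default_rng(seed)
    n = int(round(length_s * fs_hz))
    t = np.arange(n) / fs_hz
    # Envelope falls to 10**-3 in amplitude (60 dB in energy) at t = t60_s.
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60_s)
    h = rng.standard_normal(n) * envelope
    return h / np.linalg.norm(h)  # normalize to unit energy


def reverberate(clean, air, noise_std=0.0, seed=1):
    """Convolve a clean signal with a constant AIR and add white noise."""
    rng = np.random.default_rng(seed)
    wet = np.convolve(clean, air)  # full convolution
    return wet + noise_std * rng.standard_normal(wet.shape)


fs = 16000  # assumed sampling rate in Hz
# One second of white noise stands in for nonreverberant speech.
clean = np.random.default_rng(2).standard_normal(fs)
air = synthetic_air(t60_s=0.5, fs_hz=fs, length_s=0.5)
noisy_reverberant = reverberate(clean, air, noise_std=0.01)
```

Real recordings such as those studied here differ from this simulation in that the AIR is neither known nor strictly constant, which is why the abstract treats the step from artificial to real data as a separate question.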

Keywords:

Speech recognition
Impulse response
Word error rate
Artificial intelligence
Microphone
Bayesian probability
Pattern recognition
Computer science
Real world data
Audio visual
