Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization

Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Xiong Xiao, Eng Siong Chng, Haizhou Li


2014

In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. In the proposed method, a DAE is first applied in the cepstral domain to suppress reverberation; a TSN filter is then applied as post-processing to reduce noise and reverberation effects by normalizing the modulation spectra of the features to reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. By combining the cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
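To make the TSN post-processing step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation (the DAE stage is not shown). It assumes that, for each cepstral dimension, the modulation spectrum is estimated as a Welch power spectral density of the mean-normalized feature trajectory, that the filter's target magnitude response is sqrt(reference PSD / test PSD), and that this response is realized as a Hann-tapered, zero-phase FIR filter. The helper names (`modulation_psd`, `reference_psd`, `tsn_filter_coeffs`, `tsn`, `log_dist`) and the parameters `NFFT` and `TAPS` are hypothetical. The demo uses white trajectories as a stand-in for clean cepstra and an AR(1) smearing as a crude stand-in for reverberation.

```python
import numpy as np

NFFT = 64   # modulation-spectrum resolution (assumed value, not from the paper)
TAPS = 33   # FIR length of the TSN filter (assumed; odd so the filter is centred)

def modulation_psd(traj, nfft=NFFT):
    """Welch estimate of the modulation PSD of one feature trajectory:
    Hann-windowed segments with 50% overlap, periodograms averaged."""
    traj = np.asarray(traj, dtype=float)
    if len(traj) < nfft:                       # pad very short trajectories
        traj = np.pad(traj, (0, nfft - len(traj)))
    win, hop = np.hanning(nfft), nfft // 2
    segs = np.stack([traj[i:i + nfft] * win
                     for i in range(0, len(traj) - nfft + 1, hop)])
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) + 1e-10

def reference_psd(clean_utts, nfft=NFFT):
    """Per-dimension modulation PSD averaged over clean utterances,
    each a (frames, dims) array; returns a (dims, nfft // 2 + 1) array."""
    return np.mean([[modulation_psd(u[:, d] - u[:, d].mean(), nfft)
                     for d in range(u.shape[1])] for u in clean_utts], axis=0)

def tsn_filter_coeffs(test_psd, ref_psd, nfft=NFFT, taps=TAPS):
    """Zero-phase FIR whose magnitude response approximates sqrt(ref / test)."""
    mag = np.sqrt(ref_psd / test_psd)          # desired gain per modulation bin
    h = np.fft.irfft(mag, n=nfft)              # real, symmetric (zero-phase) response
    h = np.roll(h, taps // 2)[:taps]           # centre tap at taps // 2, then truncate
    return h * np.hanning(taps)                # taper to reduce truncation ripple

def tsn(features, ref_psds, nfft=NFFT, taps=TAPS):
    """Apply TSN independently to each dimension of a (frames, dims) array.
    Assumes at least `taps` frames so np.convolve keeps the input length."""
    out = np.empty(features.shape, dtype=float)
    for d in range(features.shape[1]):
        x = features[:, d] - features[:, d].mean()        # mean normalization
        h = tsn_filter_coeffs(modulation_psd(x, nfft), ref_psds[d], nfft, taps)
        out[:, d] = np.convolve(x, h, mode="same")        # symmetric h: no delay
    return out

def log_dist(traj, ref):
    """Mean absolute log ratio between a trajectory's PSD and a reference PSD."""
    return np.mean(np.abs(np.log(modulation_psd(traj - traj.mean()) / ref)))

# Demo with synthetic data (stand-ins only, not speech features).
rng = np.random.default_rng(0)
ref = reference_psd([rng.standard_normal((2000, 2))])   # "clean" reference
e = rng.standard_normal((2000, 2))
smeared = np.empty_like(e)
smeared[0] = e[0]
for t in range(1, len(e)):                 # AR(1) temporal smearing
    smeared[t] = 0.8 * smeared[t - 1] + e[t]
normalized = tsn(smeared, ref)
for d in range(2):
    print(d, log_dist(smeared[:, d], ref[d]), log_dist(normalized[:, d], ref[d]))
```

The smearing boosts slow modulations and attenuates fast ones, so the derived filter acts roughly as a high-frequency emphasis that pulls each trajectory's modulation spectrum back toward the reference. Designing the filter per utterance and per dimension is what lets the normalization adapt to the actual distortion of each test utterance.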
