Improving the self-adaptive voice activity detector for speaker verification using MAP adaptation and asymmetric tapers
2015
This paper improves the recently proposed voice activity detector based on vector quantization and speech-enhancement preprocessing (VQ-VAD), and applies it to speaker verification in noisy environments. VQ-VAD computes a likelihood ratio on an utterance-by-utterance basis, using speech and non-speech models trained on mel-frequency cepstral coefficients (MFCCs). However, the distinction between speech and non-speech segments in a signal is independent of the speaker. We therefore propose a modified VQ-VAD technique that builds two universal background models (UBMs), one for speech and one for non-speech, trained from long, utterance-independent data. The UBMs are then adapted to each speaker's short utterance via maximum a posteriori (MAP) adaptation, replacing the VQ models. MFCCs are also extracted using the recently proposed asymmetric tapers instead of the traditional Hamming window. With a GMM-UBM system as the speaker verification baseline, extensive simulations were carried out by adding noise at different levels to the clean TIMIT database, which is characterized by short training utterances and very short test utterances. The results show that the proposed GMM-MAP-VAD approach is superior in adverse conditions. Furthermore, a drastic reduction in the equal error rate (EER) is observed when asymmetric tapers are used.
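The adaptation and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, mean-only MAP adaptation in the commonly used relevance-factor form, and a relevance factor of 16, none of which the abstract specifies. The function names (`map_adapt_means`, `vad_llr`) and the choice of which frames are used to adapt each model are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian.
    X: (T, D); means, variances: (K, D). Returns (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def gmm_frame_loglik(X, weights, means, variances):
    """Log-likelihood of each frame under the whole GMM. Returns (T,)."""
    log_p = log_gauss_diag(X, means, variances) + np.log(weights)[None, :]
    m = log_p.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to the frames X.
    relevance=16 is an assumed value, not taken from the paper."""
    log_p = log_gauss_diag(X, means, variances) + np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)          # responsibilities (T, K)
    n = post.sum(axis=0)                             # soft counts (K,)
    ex = post.T @ X / np.maximum(n, 1e-10)[:, None]  # data means (K, D)
    alpha = (n / (n + relevance))[:, None]           # adaptation weights
    return alpha * ex + (1.0 - alpha) * means

def vad_llr(X, speech_gmm, nonspeech_gmm):
    """Frame-level log-likelihood ratio; each GMM is a
    (weights, means, variances) tuple. Positive favors speech."""
    return gmm_frame_loglik(X, *speech_gmm) - gmm_frame_loglik(X, *nonspeech_gmm)
```

In use, each UBM would be adapted with frames from the utterance under test, and a frame would be labeled speech when its ratio exceeds a threshold (0 is a natural default, also an assumption here). Adapting only the means keeps the number of re-estimated parameters small, which matters when the utterance is short.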