An efficient voice activity detector in non-stationary noises incorporating evidence theory to combine multiple statistical models
2017
This paper presents a voice activity detector (VAD) that selects the most appropriate statistical model for the spectral components of noisy speech. Previous research has shown that the power spectral flatness measure (PSFM) indicates which statistical model best fits the Discrete Cosine Transform (DCT) coefficients of noisy speech. We apply a similar analysis to the real and imaginary parts of the DFT coefficients of noisy speech, examining how the best-fitting model varies across different stationary and non-stationary noise types and SNRs. In the experiments, the Kolmogorov-Smirnov (KS) test is used to quantify the goodness of fit (GOF) of each candidate parametric model: the Gaussian, Gamma, and Laplacian probability density functions (PDFs). The likelihood ratio (LR) derived from the selected statistical model is then used to compute the smoothed LR (SLR) and the multiple-observation LR (MOLR). The final decision is made by combining the three LR tests, namely the basic LRT, the MO-LRT, and the SLRT, using Dempster-Shafer theory (DST). The proposed VAD is extended to non-stationary noise by employing Gerkmann's noise power spectral density (PSD) estimator. Experiments show that the proposed method outperforms conventional VADs under various adverse conditions.
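The KS-based comparison of candidate models can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes that `x` holds the real (or imaginary) parts of DFT coefficients from one frequency bin and that each zero-mean candidate is fitted by matching the sample variance. For the Gamma case it assumes the two-sided Gamma form with shape 1/2 often used for speech DFT coefficients, which SciPy provides as `dgamma`. The function names `candidate_models` and `best_fit_model` are hypothetical.

```python
import numpy as np
from scipy import stats


def candidate_models(x):
    """Zero-mean candidate PDFs whose variance matches the sample variance of x.

    Variance matching is an assumption of this sketch, not the paper's fitting rule.
    """
    sigma = np.sqrt(np.mean(x ** 2))  # zero-mean standard deviation estimate
    return {
        "gaussian": stats.norm(loc=0.0, scale=sigma),
        # Laplace variance is 2*b**2, so b = sigma / sqrt(2)
        "laplacian": stats.laplace(loc=0.0, scale=sigma / np.sqrt(2.0)),
        # Assumed two-sided Gamma with shape a = 1/2: variance = a*(a+1)*s**2 = 0.75*s**2
        "gamma": stats.dgamma(0.5, loc=0.0, scale=sigma / np.sqrt(0.75)),
    }


def best_fit_model(x):
    """Return the candidate with the smallest KS statistic, plus all KS statistics."""
    ks = {
        name: stats.kstest(x, dist.cdf).statistic  # sup-distance between empirical and model CDF
        for name, dist in candidate_models(x).items()
    }
    return min(ks, key=ks.get), ks
```

A smaller KS statistic means the empirical CDF lies closer to the model CDF. Running `best_fit_model` on Laplacian-distributed samples should therefore favour `"laplacian"`.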
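The three LR statistics can be illustrated under the complex Gaussian model of Sohn et al., for which the per-bin log-LR is γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k). Here γ_k is the a posteriori SNR and ξ_k is the a priori SNR. This sketch makes several assumptions. The noise PSD and ξ are taken as given inputs, whereas the paper tracks the noise PSD with Gerkmann's estimator. The MO-LRT is taken as the sum of frame log-LRs over frames t−m to t+m. The SLR is shown as a simple first-order recursive smoother, which is a hypothetical stand-in for the paper's smoothing rule. All function names are hypothetical.

```python
import numpy as np


def frame_log_lr(noisy_power, noise_psd, xi):
    """Mean per-bin log-LR of one frame under the complex Gaussian model (Sohn et al.)."""
    gamma = noisy_power / noise_psd                 # a posteriori SNR per bin
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)  # per-bin log-likelihood ratio
    return float(np.mean(log_lr))                   # geometric mean of LRs, in the log domain


def mo_lrt(frame_llrs, t, m):
    """Multiple-observation statistic: sum of frame log-LRs over frames t-m..t+m.

    The window is clipped at the start and end of the signal.
    """
    lo, hi = max(0, t - m), min(len(frame_llrs), t + m + 1)
    return float(np.sum(frame_llrs[lo:hi]))


def smoothed_lr(frame_llrs, alpha=0.9):
    """Hypothetical first-order recursive smoothing of the frame log-LR sequence."""
    out = np.empty(len(frame_llrs))
    acc = float(frame_llrs[0])  # initialise the smoother with the first frame
    for i, v in enumerate(frame_llrs):
        acc = alpha * acc + (1.0 - alpha) * v
        out[i] = acc
    return out
```

Each statistic is compared with its own threshold (not shown), or it can be passed on as evidence to the fusion stage.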
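Dempster's rule of combination, which underlies the DST fusion step, can be sketched on the binary frame of discernment {H0 (noise only), H1 (speech present)}. Mass may also be assigned to the whole frame to express uncertainty. The rule itself is standard. The mapping from a log-LR statistic to a mass function (`llr_to_mass`, a logistic squashing with a fixed uncertainty share) is a hypothetical choice made for illustration. The abstract does not say how the authors assign masses.

```python
import math
from functools import reduce
from itertools import product

H0, H1 = frozenset({"H0"}), frozenset({"H1"})
THETA = H0 | H1  # whole frame of discernment: "either hypothesis" (uncertainty)


def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozensets."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    # Renormalise so that the non-conflicting mass sums to 1
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def llr_to_mass(llr, threshold=0.0, uncertainty=0.2):
    """Hypothetical mapping from a log-LR statistic to a mass function."""
    p_speech = 1.0 / (1.0 + math.exp(-(llr - threshold)))
    belief = 1.0 - uncertainty
    return {H1: belief * p_speech, H0: belief * (1.0 - p_speech), THETA: uncertainty}


if __name__ == "__main__":
    # Fuse illustrative LRT, MO-LRT and SLRT statistics for a single frame
    masses = [llr_to_mass(v) for v in (1.2, 0.4, -0.3)]
    fused = reduce(dempster_combine, masses)
    print("speech" if fused[H1] > fused[H0] else "noise")
```

Because Dempster's rule is associative and commutative, the three tests can be fused pairwise in any order, as `reduce` does above.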