Non-intrusive speech quality prediction based on the blind estimation of clean speech and the i-vector framework

2020 
Output-based instrumental speech quality assessment relies only on the received (processed) signal to predict quality. Such methods are called non-intrusive and are crucial in speech applications where reference clean signals are not accessible. In this paper, we propose a new non-intrusive instrumental quality measure based on the similarity between two i-vectors. As the reference clean signal is not available, the reference i-vector representation cannot be extracted directly from it. Therefore, we propose the use of a clean speech Gaussian mixture model to estimate the clean speech spectrum from its degraded counterpart. Next, i-vector representations are extracted from the estimated clean speech and from the degraded speech, and either the cosine or the Euclidean similarity between them is computed as a correlate of speech quality. Here, the clean speech model is trained using RASTA-filtered mel-frequency cepstral coefficients extracted from a pool of clean speech files, thus allowing us to attain a model of clean spectrum characteristics. The proposed method is evaluated on noisy, reverberant, and enhanced speech conditions. Experimental results show that the proposed system provides higher correlations with perceptual speech quality than several benchmark non-intrusive measures, especially for noisy and enhanced speech.
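The final comparison step described above can be illustrated with a minimal sketch. It assumes the two i-vectors have already been extracted and are available as plain lists of floats; the function names, variable names, and toy values are hypothetical and are not taken from the paper. The sketch computes a Euclidean distance, which is an assumption about how the Euclidean metric would be applied: a smaller distance (or a larger cosine similarity) would suggest the degraded signal is closer to the estimated clean reference.

```python
import math
from typing import Sequence


def cosine_similarity(ref: Sequence[float], deg: Sequence[float]) -> float:
    """Cosine of the angle between two i-vectors (1.0 means same direction)."""
    if len(ref) != len(deg):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(r * d for r, d in zip(ref, deg))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_deg = math.sqrt(sum(d * d for d in deg))
    if norm_ref == 0.0 or norm_deg == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_ref * norm_deg)


def euclidean_distance(ref: Sequence[float], deg: Sequence[float]) -> float:
    """Euclidean distance between two i-vectors (0.0 means identical)."""
    if len(ref) != len(deg):
        raise ValueError("i-vectors must have the same dimension")
    return math.sqrt(sum((r - d) ** 2 for r, d in zip(ref, deg)))


# Toy 4-dimensional vectors standing in for real i-vectors (hypothetical values;
# real i-vectors typically have far more dimensions).
estimated_clean_ivector = [1.0, 0.0, 2.0, 2.0]
degraded_ivector = [1.0, 0.0, 2.0, -2.0]

cos_score = cosine_similarity(estimated_clean_ivector, degraded_ivector)
euc_score = euclidean_distance(estimated_clean_ivector, degraded_ivector)
print(f"cosine similarity: {cos_score:.4f}")
print(f"euclidean distance: {euc_score:.4f}")
```

How such a score would be mapped to a perceptual quality scale is not specified in the abstract, so the sketch stops at the raw similarity values.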