On the use of the i-vector speech representation for instrumental quality measurement

2020 
The i-vector framework has been widely used to summarize speaker-dependent information present in a speech signal. Although it was considered the state of the art in speaker verification for many years, its potential to estimate speech recording distortion and quality has been overlooked. This paper attempts to fill this gap. We conduct a detailed analysis of how distortions are captured in the total variability space. We then propose a full-reference speech quality model based on i-vector similarities and three no-reference approaches. The first no-reference approach uses a single reference i-vector obtained by averaging i-vectors extracted from clean signals. The second relies on a vector quantizer codebook of representative clean-speech i-vectors. In the third, i-vectors and subjective ratings are used to train a no-reference deep neural network model for speech quality assessment. Four experiments show that the proposed methods, based on the i-vector speech representation, are well suited for assessing speech quality. The results show correlations with subjective quality judgments similar to those achieved with standardized instrumental algorithms, particularly for degradations caused by noise and reverberation.
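The similarity-based measures described in the abstract can be illustrated with a minimal sketch. It assumes that i-vectors have already been extracted as fixed-length NumPy arrays and that cosine similarity is the similarity measure; the abstract specifies neither, and the function names below are hypothetical. The codebook is taken as given (for example, centroids from a clustering of clean-speech i-vectors), and scoring by the best-matching codeword is likewise an assumption rather than the paper's stated method.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (assumed similarity measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def full_reference_score(test_ivec, ref_ivec):
    """Full-reference: compare the degraded signal's i-vector with the
    i-vector of its own clean reference signal."""
    return cosine_similarity(test_ivec, ref_ivec)


def average_reference_score(test_ivec, clean_ivecs):
    """No-reference, approach 1: compare against a single reference i-vector
    formed by averaging i-vectors of clean signals (rows of clean_ivecs)."""
    mean_ref = np.mean(np.asarray(clean_ivecs, dtype=float), axis=0)
    return cosine_similarity(test_ivec, mean_ref)


def codebook_score(test_ivec, codebook):
    """No-reference, approach 2: compare against a codebook of representative
    clean-speech i-vectors and keep the best match (assumed scoring rule)."""
    return max(cosine_similarity(test_ivec, c) for c in codebook)
```

In this sketch a higher score means the test i-vector lies closer to clean speech; mapping such a score to a subjective quality scale would need a separate calibration step that is not shown here.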