MOS Naturalness and the Quest for Human-Like Speech

2018 
This paper reconsiders the use of MOS naturalness as an instrument for measuring the quality (as opposed to the intelligibility) of speech. We revisit a previously proposed alternative, the paired comparison or “AB” test, and present new empirical evidence that it is indeed a better method for evaluating TTS quality. Using this test, we evaluate three older TTS systems and a recent deep-learning approach against native North American and Indian speech and show that TTS had in fact already crossed the threshold of human-like speech synthesis some time ago. This suggests that a systematic reappraisal of the abstract concept of “naturalness” in speech is in order.