Thousands of Voices for HMM-based Speech Synthesis

Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo

2009

Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an ‘average voice model’ plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on ‘non-TTS’ corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we show thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, Globalphone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.
