Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses

F. Y. Kuo, S. Aryal, Gilles Degottex, S. Kang, Pierre Lanchantin, I. Ouyang

2018

This work investigates techniques for selecting training data from small, found corpuses in order to improve the naturalness of synthesized text-to-speech voices. The approach examines different metrics for detecting and rejecting segments of training data that can degrade the performance of the system. We conducted experiments on two small datasets extracted from Mandarin Chinese audiobooks that differ in recording conditions, narrator, and transcriptions. We show that an even smaller, yet carefully selected, set of data can yield a text-to-speech system that generates more natural speech than a system trained on the complete dataset. Three metrics proposed in the paper, all related to the narrator's articulation, give significant improvements in naturalness.
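The general idea of rejecting segments by a per-segment metric can be sketched as follows. This is a minimal illustration only, not the paper's method: the `Segment` type, the `score` field, the threshold value, and the example data are all hypothetical, since the abstract does not define the metrics themselves.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    duration_s: float
    score: float  # hypothetical quality metric; higher is assumed to be better


def select_segments(segments, min_score):
    """Split segments into (kept, rejected) using a simple score threshold."""
    kept, rejected = [], []
    for seg in segments:
        # Keep a segment only if its score reaches the threshold.
        if seg.score >= min_score:
            kept.append(seg)
        else:
            rejected.append(seg)
    return kept, rejected


# Made-up example data, for illustration only.
corpus = [
    Segment("utt_001", 4.2, 0.91),
    Segment("utt_002", 3.7, 0.35),
    Segment("utt_003", 5.1, 0.78),
]

kept, rejected = select_segments(corpus, min_score=0.5)
kept_ids = [seg.segment_id for seg in kept]
rejected_ids = [seg.segment_id for seg in rejected]
```

In practice the threshold would have to be tuned per corpus, since the trade-off between data quantity and data quality is exactly what the experiments evaluate.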

Keywords:

Transcription (linguistics)
Computer science
Speech recognition
Training set
Mandarin Chinese
Naturalness
Hidden Markov model
data selection
