Prediction of pronunciation variations for speech synthesis: a data-driven approach

2005 
The fact that speakers vary pronunciations of the same word within their own speech is well known, but little has been done to categorize and predict a speaker's pronunciation distribution automatically for unit selection speech synthesis. Recent work demonstrated how to automatically identify a speaker's choice between full and reduced pronunciations using acoustic modeling techniques from speech recognition. We extend this approach and show how its results can be used to predict a speaker's choice of pronunciation for synthesis. We apply machine learning techniques to the automatically categorized data to produce a pronunciation variation prediction model that requires only the utterance text, allowing the system to synthesize novel phrases with variations like those the speaker would make. Empirical studies show that we can improve the automatic pronunciation labels and successfully use the results to predict pronunciations for newly synthesized utterances. Prediction models trained on these automatic labels perform very similarly to those trained on human-labeled data, allowing us to reduce manual effort while achieving comparable results.
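The sketch below illustrates the kind of pipeline the abstract describes: text-only features for each word token, paired with pronunciation labels produced by an automatic acoustic categorization step, are used to train a classifier that predicts full versus reduced pronunciations for novel text. The feature set, training data, and choice of scikit-learn decision tree are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch (hypothetical, not the paper's implementation): predict a
# speaker's full vs. reduced pronunciation choice from text-derived features,
# training on labels assigned by an automatic acoustic categorization step.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical automatically labeled tokens: text-only features paired with
# the pronunciation label inferred from the speaker's recorded realization.
labeled_tokens = [
    ({"word": "and", "pos": "CC", "prev_word": "cats", "next_word": "dogs",
      "phrase_position": "medial"}, "reduced"),
    ({"word": "and", "pos": "CC", "prev_word": "<s>", "next_word": "then",
      "phrase_position": "initial"}, "full"),
    ({"word": "the", "pos": "DT", "prev_word": "of", "next_word": "house",
      "phrase_position": "medial"}, "reduced"),
    ({"word": "the", "pos": "DT", "prev_word": "<s>", "next_word": "answer",
      "phrase_position": "initial"}, "full"),
]

features, labels = zip(*labeled_tokens)

# Encode the symbolic features as one-hot vectors and fit a decision tree,
# one plausible learner for this kind of symbolic prediction task.
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(features)
clf = DecisionTreeClassifier()
clf.fit(X, labels)

# At synthesis time only the utterance text is available: derive the same
# features for a novel token and predict which pronunciation to select.
novel_token = {"word": "and", "pos": "CC", "prev_word": "salt",
               "next_word": "pepper", "phrase_position": "medial"}
print(clf.predict(vectorizer.transform([novel_token]))[0])
```

In a real system the predicted label would be passed to the unit selection engine to bias candidate selection toward units with the chosen pronunciation.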