Evaluating and Correcting Phoneme Segmentation for Unit Selection Synthesis

John Kominek, Christina L. Bennett, Alan W. Black

2003

As part of improved support for building unit selection voices, the Festival speech synthesis system now includes two algorithms for automatically labeling wavefile data: one based on dynamic time warping (DTW) and one based on hidden Markov model (HMM) acoustic modeling. Our experiments show that DTW is more accurate than HMM modeling 70% of the time, but it is also more prone to gross labeling errors. HMM modeling exhibits a systematic bias of 15 ms in its boundary placements. Combining both methods directs human labelers towards the data most likely to be problematic.
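The idea of combining two labelers so that humans only inspect suspect data can be illustrated with a minimal sketch. It assumes both labelers mark the same phone sequence, so their boundary times (in milliseconds) can be compared one by one. The function name `flag_disagreements`, the median-based offset correction, and the 20 ms tolerance are all hypothetical choices made for illustration; they are not taken from the paper or from Festival.

```python
from statistics import median


def flag_disagreements(dtw_bounds, hmm_bounds, threshold_ms=20.0):
    """Return (offset, flagged) for two boundary sequences in milliseconds.

    Hypothetical helper: both lists must describe the same phone sequence,
    with one boundary time per entry in the same order. threshold_ms is an
    illustrative tolerance, not a value from the paper.
    """
    if len(dtw_bounds) != len(hmm_bounds):
        raise ValueError("both labelers must mark the same number of boundaries")
    if not dtw_bounds:
        return 0.0, []

    # Estimate the HMM labeler's systematic offset relative to DTW as the
    # median difference; the median is robust to occasional gross DTW errors.
    offset = median([h - d for h, d in zip(hmm_bounds, dtw_bounds)])

    # After removing the offset, any boundary where the two labelers still
    # differ by more than the tolerance is flagged for human review.
    flagged = [
        i
        for i, (d, h) in enumerate(zip(dtw_bounds, hmm_bounds))
        if abs((h - offset) - d) > threshold_ms
    ]
    return offset, flagged


# Toy boundary times, invented for illustration only.
dtw = [100, 180, 260, 400, 520]
hmm = [115, 195, 275, 330, 535]
offset, flagged = flag_disagreements(dtw, hmm)
print(offset, flagged)  # 15 [3]
```

Subtracting the estimated offset before comparing keeps a consistent bias from flagging every boundary. Only boundary 3, where the two labelers disagree by 85 ms after correction, is sent to a human labeler.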

Keywords:

Speech recognition
Pattern recognition
Speech synthesis
Dynamic time warping
Machine learning
Computer science
Segmentation
Hidden Markov model
Artificial intelligence
Unit selection