CLUSTER ADAPTIVE TRAINING OF AVERAGE VOICE MODELS

Vincent Ping Leung Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales, Yannis Stylianou

2014

Hidden Markov model (HMM) based text-to-speech systems can be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach achieves this with linear transforms, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target speaker similarity than either AVM or CAT alone, while its speech quality lies between that of AVM and CAT.
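
The two adaptation ideas named in the abstract can be illustrated with a toy sketch. It is not the paper's formulation: the function names, the toy numbers, and the choice to apply the linear transform on top of the CAT-interpolated mean are all assumptions made purely for illustration. It shows a speaker's mean vector built as a weighted sum of cluster means (the "point in a low-dimensional space" view of CAT), followed by an affine transform of the kind used in AVM adaptation.

```python
# Illustrative sketch only: toy numbers and hypothetical function names,
# not the formulation used in the paper.

def cat_mean(cluster_means, weights):
    """CAT idea: a speaker's mean is a weighted sum of cluster means.

    cluster_means: list of P vectors, each a list of D floats.
    weights: list of P floats, the speaker's coordinates in the
             low-dimensional speaker space.
    """
    dim = len(cluster_means[0])
    return [
        sum(w * m[d] for w, m in zip(weights, cluster_means))
        for d in range(dim)
    ]


def linear_transform(A, b, mean):
    """AVM-style idea: adapt a mean with an affine transform A * mean + b."""
    return [
        sum(a * x for a, x in zip(row, mean)) + bias
        for row, bias in zip(A, b)
    ]


# Toy model: 3 clusters with 2-dimensional means.
cluster_means = [[1.0, 2.0], [0.5, -0.5], [-1.0, 1.0]]
speaker_weights = [1.0, 0.5, 0.25]  # one speaker's position in the space

# Toy speaker-specific affine transform (assumed values).
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, 0.0]

cat_only = cat_mean(cluster_means, speaker_weights)
# One plausible composition (an assumption): transform the CAT mean.
combined = linear_transform(A, b, cat_only)

print(cat_only)
print(combined)
```

In this sketch the CAT weights have only as many free parameters as there are clusters, whereas the affine transform has a full matrix and bias, which is one way to picture why a linear transform can fit a target speaker more closely than a low-dimensional weight vector.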

Keywords:

Decision tree
Pattern recognition
Artificial intelligence
Computer science
Hidden Markov model
speech quality
Speech recognition
speech sounds

