Mixing HMM-based Spanish speech synthesis with a CBR for prosody estimation

Xavi Gonzalvo, Ignasi Iriondo, Joan-Claudi Socoró, Francesc Alías, Carlos Monzo

2007

Hidden Markov model based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled together. This work describes a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus whose units were clustered using contextual factors based on the Spanish language. The results show that CBR-based F0 estimation can improve on the HMM-based baseline when synthesizing short non-declarative sentences, while duration accuracy is similar for the CBR and HMM systems.

Keywords:

Speech recognition
Speech synthesis
Expressivity
Pattern recognition
Case-based reasoning
Artificial intelligence
Prosody
Statistical model
Computer science
Speech corpus
Hidden Markov model
Spanish language
Natural language processing
