Towards Pooled-Speaker Concatenative Text-to-Speech

Ellen Eide, Michael Picheny

2006

In this paper we explore merging data from multiple speakers when building a concatenative text-to-speech (TTS) system. First, we investigate pooling data from multiple speakers to build statistical models that predict pitch and duration, and present listening test results which show that these techniques improve the expressiveness of our TTS system. Second, we describe an experiment in which we merged databases from several speakers into an enlarged database from which our concatenative TTS system draws segments. Listening test results show that, in general domains, pooling data from several speakers yields higher-quality synthetic speech than restricting the system to the data of a single speaker in our repertoire.

Keywords:

Active listening
Speech processing
Speech recognition
Repertoire
Artificial intelligence
Pooling
Speech corpus
Pattern recognition
Statistical model
Speech synthesis
Computer science
Text mining
Natural language processing
