Performance Improvement of Prosody-Controlled Voice Conversion by Language Model Adaptation

Kazuya Saeki, Masaharu Kato, Tetsuo Kosaka

2019

This paper studies a voice conversion method that combines speech recognition and speech synthesis. The system produces emotional speech in a target speaker's voice using the prosody of the input speaker. Obtaining high-quality output requires highly accurate speech recognition, but emotional speech is difficult to recognize accurately, so the acoustic and language models must be improved. In this work, we analyze the language models and apply a language model adaptation method to improve their accuracy. To confirm the effectiveness of the proposed method, we conducted emotional speech recognition and voice conversion experiments.
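
The abstract does not say which adaptation method is used. As an illustration only, the sketch below shows one widely used form of language model adaptation: linear interpolation of a general-domain model with a model estimated from a small in-domain (here, emotional) corpus. The toy corpora, the function names `unigram_probs` and `interpolate`, the unigram order, and the weight `lam=0.5` are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_general, p_domain, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * general."""
    return {w: lam * p_domain[w] + (1.0 - lam) * p_general[w] for w in p_general}


# Toy corpora, invented for this sketch (not data from the paper).
general = "the weather is fine today the train is late".split()
emotional = "i am so happy today i am so angry".split()
vocab = sorted(set(general) | set(emotional))

p_general = unigram_probs(general, vocab)
p_domain = unigram_probs(emotional, vocab)

# lam is a hypothetical interpolation weight; in practice it would be tuned
# on held-out in-domain text.
p_adapted = interpolate(p_general, p_domain, lam=0.5)
```

In this sketch the adapted model assigns more probability to words that are frequent in the emotional corpus than the general model does, while still summing to one over the shared vocabulary. A real recognizer would use higher-order n-gram or neural models, but the interpolation step has the same shape.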

Keywords:

Performance improvement
Prosody
Speech recognition
Language model
Computer science
Adaptation method
Speech synthesis
Conversion method
