Performance Improvement of Prosody-Controlled Voice Conversion by Language Model Adaptation

2019 
This paper studies a voice conversion method that combines speech recognition and speech synthesis. The system produces emotional speech in the voice of a target speaker using the prosody of an input speaker. Obtaining high-quality speech from this system requires high-accuracy speech recognition, but emotional speech is difficult to recognize accurately, so the accuracy of both the acoustic and language models must be improved. In this work, we analyzed the language models and attempted to improve their accuracy using a language model adaptation method. To confirm the effectiveness of the proposed method, we conducted emotional speech recognition and voice conversion experiments.