Speech naturalness improvement via \(\epsilon\)-closed extended vectors sets in voice conversion systems

2018 
In conventional voice conversion methods, features of the spectral envelope of a speech signal are first extracted. These features are then converted to best match a target speaker's speech by designing and applying a set of conversions. Finally, the spectral envelope of the target speaker's speech signal is reconstructed from the converted features. The reconstructed envelope usually deviates from its natural form. This deviation, observed as over-smoothing, over-fitting, and widening of formants, is partly caused by two factors: (1) there is an error in reconstructing the spectral envelope from the features, and (2) the set of features extracted from the spectral envelope of the speech signal is not closed. A method is proposed to improve the naturalness of speech by means of \(\epsilon\)-closed sets of extended vectors in voice conversion systems. The approach introduces \(\epsilon\)-closed sets for reconstructing the natural spectral envelope of a signal in the synthesis phase. The elements of these sets are generated by forming a group of extended feature vectors and applying a quantization scheme to the features of a speech signal. Using this method in speech synthesis noticeably reduces the error in reconstructing the spectrum from the features. Furthermore, the final spectral envelope obtained from voice conversion retains its natural form and, consequently, the problems arising from the deviation of the voice from its natural state are resolved. The method can be used generally as one phase of speech synthesis. It is independent of the voice conversion technique and of whether training is parallel or non-parallel, and it can be applied to improve the naturalness of the generated speech signal in all common voice conversion methods. Moreover, the method can be used in other areas of speech processing, such as text-to-speech systems and vocoders, to improve the quality of the output signal in the synthesis step.
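The abstract does not give a formal definition of an \(\epsilon\)-closed set, so the following is only a minimal sketch under one assumed reading: each extended vector pairs a feature vector with the natural spectral envelope it was extracted from, the quantization step is a greedy \(\epsilon\)-cover of the training features, and synthesis replaces a converted feature vector with the stored envelope of its nearest codebook entry. The names `build_codebook` and `reconstruct_envelope` are hypothetical and do not come from the paper.

```python
import math

def build_codebook(features, envelopes, eps):
    """Greedy eps-cover (an assumed quantization scheme, not the paper's).

    A frame is kept only if no already-kept frame lies within eps of it,
    so every training feature ends up within eps of some kept entry.
    Each entry is an "extended vector": (feature_vector, natural_envelope).
    """
    codebook = []
    for feat, env in zip(features, envelopes):
        if all(math.dist(feat, kept_feat) > eps for kept_feat, _ in codebook):
            codebook.append((tuple(feat), tuple(env)))
    return codebook

def reconstruct_envelope(converted, codebook):
    """Return the stored natural envelope of the nearest codebook entry."""
    _, env = min(codebook, key=lambda entry: math.dist(converted, entry[0]))
    return env
```

Under this reading, the synthesized envelope is always one that was actually observed in natural speech, which is one way the over-smoothing described above could be avoided. The cost is a quantization error bounded by the choice of `eps`.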