Deep Learning End to End Speech Synthesis: A Review

2021 
Speech is a fundamental way of expressing ideas and thoughts, and the ability to produce synthetic speech has long been of great interest. Text-to-speech (TTS), or speech synthesis, is the task of producing artificial speech from a given input text. TTS systems have grown in number and improved considerably in recent years, and these advances are driven by deep learning techniques. The aim of this paper is to give an understanding of the dynamics of research in this field and a brief introduction to the traditional methods used in speech synthesis. The various datasets used for training deep learning TTS systems are also discussed. Finally, the paper emphasizes deep learning based end-to-end speech synthesis models, which have achieved remarkable results in terms of mean opinion score (MOS).