Utilizing Indonesian Allophones and Intraword Short Pauses Handling to Improve Performance of Indonesian Text-To-Speech

2018 
An allophone is a variant of a phoneme that depends on its position within a word; for instance, the first phoneme e in “pendekar” is pronounced differently from the second phoneme e. According to Badan Pengembangan dan Pembinaan Bahasa (Language Development and Fostering Agency), formerly Pusat Bahasa (Language Center), Bahasa Indonesia has 5 vowels and 22 consonants, 6 of which have allophones. Only the allophones of one phoneme, e, can change the meaning of a word, while the allophones of the other 5 phonemes do not change word meanings. Therefore, most research projects on developing an Indonesian text-to-speech (TTS) system focus only on the allophones of the phoneme e. This paper proposes a method that utilizes all allophones of Bahasa Indonesia in developing a model for an Indonesian TTS system with a deep neural network (DNN) method. Furthermore, intraword short pause handling is implemented to improve intelligibility and naturalness. A set of rules is introduced to automatically detect allophones and intraword short pauses in the text corpus used for recording the audio data. In both subjective and objective evaluations, the resulting TTS model performs better than one that does not use allophones and intraword short pauses.