Indonesian Corpus Constructing and Text Processing for Speech Synthesis

2018 
This paper addresses the development of an Indonesian speech synthesis system and studies Indonesian text analysis and processing methods, chiefly pronunciation corpus selection, text normalization, and syllable division. Using a selection principle that combines high-frequency words with sentence length, we selected 5000 sentences as the pronunciation corpus from a 566 MB original text corpus. Numbers in the text are normalized using a combination of regular expressions and keywords. Furthermore, a combination of syllable lists and special rules is used to achieve syllable segmentation. The experimental results show that the proposed methods lay a good foundation for the development of the Indonesian speech synthesis system.
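
To make the number normalization step concrete, the following is a minimal sketch of how regular expressions and a keyword might be combined to spell out digits as Indonesian words. It is not the paper's implementation: the function names `number_to_words` and `normalize_numbers`, the choice of "Rp" as the only keyword, and the restriction to plain integers below one million are all assumptions made for illustration.

```python
import re

# Indonesian words for the digits 0-9.
UNITS = ["nol", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "delapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if any, to the leading words."""
    return head + (" " + number_to_words(rest) if rest else "")


def number_to_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1,000,000 in Indonesian (hypothetical helper)."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        return _join(UNITS[n // 10] + " puluh", n % 10)
    if n < 200:
        return _join("seratus", n - 100)      # 100 uses the prefix "se-"
    if n < 1000:
        return _join(UNITS[n // 100] + " ratus", n % 100)
    if n < 2000:
        return _join("seribu", n - 1000)      # 1000 uses the prefix "se-"
    return _join(number_to_words(n // 1000) + " ribu", n % 1000)


def _spell(digits: str) -> str:
    """Spell out a digit string, leaving out-of-range values unchanged."""
    value = int(digits)
    return number_to_words(value) if value < 1_000_000 else digits


def normalize_numbers(text: str) -> str:
    """Replace digit sequences with words; 'Rp' is treated as a currency keyword."""
    # Keyword rule (assumed): "Rp 5000" is read as "<number> rupiah".
    # The lookahead skips amounts too large for this sketch.
    text = re.sub(r"\bRp\s*(\d+)(?!\d)",
                  lambda m: _spell(m.group(1)) + " rupiah"
                  if int(m.group(1)) < 1_000_000 else m.group(0),
                  text)
    # Generic rule: any remaining digit sequence is read as a cardinal number.
    return re.sub(r"\d+", lambda m: _spell(m.group(0)), text)


print(normalize_numbers("Harga Rp 5000 pada tahun 2018"))
```

A fuller normalizer would also need rules for thousand separators, decimals, dates, and ordinals, which this sketch deliberately leaves out.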