Indonesian Corpus Constructing and Text Processing for Speech Synthesis

Xuan Kong, Jian Yang

2018

This paper focuses on the development of an Indonesian speech synthesis system and studies methods for analyzing and processing Indonesian text, chiefly pronunciation corpus selection, text normalization and syllable segmentation. Using a selection principle that combines high-frequency words with sentence length, we selected 5,000 sentences as the pronunciation corpus from a 566 MB raw text corpus. Numbers in the text are normalized using a combination of regular expressions and keywords, and syllable segmentation is achieved by combining syllable lists with special rules. Experimental results show that the proposed methods provide a sound foundation for developing the Indonesian speech synthesis system.
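The abstract does not say exactly how high-frequency words and sentence length are combined when choosing the 5,000 sentences. As a minimal sketch, assuming a greedy coverage scheme, the Python function below counts word frequencies over the raw corpus, discards sentences outside a length window, and then repeatedly picks the sentence that adds the most not-yet-covered high-frequency words. The names `tokenize` and `select_sentences`, the `top_k` cutoff and the 5 to 20 token window are hypothetical choices, not the authors' parameters.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Lowercased word tokens; reduplicated forms such as "anak-anak" stay whole
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())

def select_sentences(sentences, n_select, top_k=1000, min_len=5, max_len=20):
    # Assumed greedy scheme, not necessarily the paper's exact method.
    # Word frequencies over the whole raw corpus
    freq = Counter(w for s in sentences for w in tokenize(s))
    # The top_k most frequent word types are the coverage targets
    targets = {w for w, _ in freq.most_common(top_k)}
    # Keep only sentences whose token count lies inside the length window
    pool = [s for s in sentences if min_len <= len(tokenize(s)) <= max_len]
    covered, selected = set(), []

    def gain(s):
        # High-frequency words this sentence would newly cover
        return (set(tokenize(s)) & targets) - covered

    while pool and len(selected) < n_select:
        best = max(pool, key=lambda s: len(gain(s)))
        new_words = gain(best)
        if not new_words:
            break  # no remaining sentence adds coverage
        selected.append(best)
        covered |= new_words
        pool.remove(best)
    return selected
```

Greedy coverage is a common heuristic for building compact recording scripts; a production system would likely also weight phone or syllable coverage, which this sketch ignores.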
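The abstract states that numbers are normalized with a combination of regular expressions and keywords, without listing the rules. The sketch below is an assumed illustration of that idea, not the authors' rule set: a regular expression finds digit strings (with `.` as the Indonesian thousands separator, an optional `Rp` currency prefix and an optional `%` suffix), and a keyword immediately before the number, taken from a hypothetical `DIGIT_KEYWORDS` set, switches the reading from a cardinal to digit by digit. `spell_cardinal`, `read_digits` and `normalize_numbers` are illustrative names.

```python
import re

ONES = ["nol", "satu", "dua", "tiga", "empat",
        "lima", "enam", "tujuh", "delapan", "sembilan"]

def spell_cardinal(n):
    """Spell a non-negative integer below 10**12 in Indonesian words."""
    def tail(rest):
        return "" if rest == 0 else " " + spell_cardinal(rest)
    if n < 10:
        return ONES[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return ONES[n - 10] + " belas"
    if n < 100:
        return ONES[n // 10] + " puluh" + tail(n % 10)
    if n < 1000:
        head = "seratus" if n // 100 == 1 else ONES[n // 100] + " ratus"
        return head + tail(n % 100)
    if n < 1_000_000:
        head = "seribu" if n // 1000 == 1 else spell_cardinal(n // 1000) + " ribu"
        return head + tail(n % 1000)
    scale, word = (10**9, "miliar") if n >= 10**9 else (10**6, "juta")
    return spell_cardinal(n // scale) + " " + word + tail(n % scale)

# Hypothetical keywords after which a number is read digit by digit
DIGIT_KEYWORDS = {"nomor", "telepon", "hp"}
# Optional "Rp" prefix, digits with optional "." thousands separators, optional "%"
NUM_RE = re.compile(r"(?P<cur>Rp\.?\s?)?(?P<num>\d{1,3}(?:\.\d{3})+|\d+)(?P<pct>%)?")

def read_digits(digits):
    return " ".join(ONES[int(d)] for d in digits)

def normalize_numbers(text):
    def expand(m):
        digits = m.group("num").replace(".", "")
        # Keyword rule: inspect the word just before the match
        before = re.findall(r"\w+", text[:m.start()])
        if before and before[-1].lower() in DIGIT_KEYWORDS:
            return read_digits(digits)
        words = spell_cardinal(int(digits))
        if m.group("cur"):
            return words + " rupiah"
        if m.group("pct"):
            return words + " persen"
        return words
    return NUM_RE.sub(expand, text)
```

For example, `normalize_numbers("Hubungi nomor 0812.")` yields `"Hubungi nomor nol delapan satu dua."`, while a year such as 2018 is read as the cardinal "dua ribu delapan belas". Decimals, dates, times and ordinals are deliberately left out of this sketch.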
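For syllable segmentation the abstract mentions syllable lists combined with special rules. A minimal sketch, assuming a simplified form of the standard Indonesian orthographic syllabification rules: the digraphs ng, ny, sy and kh count as single consonants, the diphthongs ai, au, ei and oi count as single vowels, a lone consonant between two vowels starts the next syllable, and a cluster of two or more consonants is split after its first consonant. A hypothetical `EXCEPTIONS` dictionary stands in for the syllable list that overrides the rules, for instance where a vowel pair is not a diphthong (`main` is ma-in). The function name `syllabify` is illustrative.

```python
import re

VOWELS = set("aeiou")
DIPHTHONGS = {"ai", "au", "ei", "oi"}
# Digraphs are matched first so each counts as one consonant unit
UNIT_RE = re.compile(r"ng|ny|sy|kh|[a-z]")
# Hypothetical exception list for words the rules would split wrongly
EXCEPTIONS = {"main": ["ma", "in"]}

def syllabify(word):
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    units = UNIT_RE.findall(word)  # non-letter characters are ignored
    # Merge diphthongs so each forms a single vowel nucleus
    merged, i = [], 0
    while i < len(units):
        pair = "".join(units[i:i + 2])
        if pair in DIPHTHONGS:
            merged.append(pair)
            i += 2
        else:
            merged.append(units[i])
            i += 1
    nuclei = [k for k, u in enumerate(merged) if u[0] in VOWELS]
    if len(nuclei) < 2:
        return ["".join(merged)]
    starts = [0]
    for left, right in zip(nuclei, nuclei[1:]):
        n_cons = right - left - 1
        if n_cons <= 1:
            # V-V splits before the second vowel; V-C-V puts C in the next onset
            starts.append(right - n_cons)
        else:
            # Two or more consonants: split after the first (VC-CV, VC-CCV)
            starts.append(left + 2)
    ends = starts[1:] + [len(merged)]
    return ["".join(merged[s:e]) for s, e in zip(starts, ends)]
```

Under these assumptions `syllabify("masyarakat")` gives `['ma', 'sya', 'ra', 'kat']` and `syllabify("mengganggu")` gives `['meng', 'gang', 'gu']`. Words such as "baik" or "kain", where the rules would wrongly form a diphthong, show why an exception list is needed alongside the rules.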

Keywords:

Artificial intelligence
Natural language processing
Text normalization
Syllable
Text corpus
Regular expression
Speech synthesis
Text processing
Pronunciation
Computer science
Sentence
Indonesian