Improving the performance of Letter-To-Phoneme conversion by using Two-Stage Neural Network

2012 
In Text-To-Speech (TTS) synthesis, Letter-To-Phoneme (L2P) conversion is one of the most important tasks: it automatically converts arbitrary text into the corresponding phoneme sequence. According to existing research, performance is already quite good for in-vocabulary words but not for Out-Of-Vocabulary (OOV) words. To improve L2P conversion on OOV words, this paper focuses on two issues. The first is the unknown relationship between letters and phonemes; the second is the difficult case in which the same letter sequence can correspond to different phoneme sequences in the same context. We therefore introduce an L2P conversion method based on a two-stage neural network that uses both letter and phoneme contexts. The first-stage neural network is implemented as a many-to-many mapping model between letters and phonemes to address the first issue, while the second-stage neural network addresses the second by extending the context information at the phonemic level, so as to generate a phoneme pattern that the neural network can recognize easily. Results on the auto-aligned CMU corpora (1) show that the proposed approach achieves high performance in terms of Phoneme Accuracy (PAcc) and Word Accuracy (WAcc) on OOV words.
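
The two-stage data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a simplified one-letter-to-one-phoneme alignment rather than the many-to-many mapping of the paper, and the names `context_windows`, `two_stage_l2p`, `PAD`, the window widths, and the toy classifiers are all hypothetical stand-ins for the trained neural networks.

```python
from typing import Callable, List, Sequence

PAD = "_"  # hypothetical padding symbol for positions outside the word

# A classifier maps one context window to one output symbol.
# In the paper these would be trained neural networks; here they are placeholders.
Classifier = Callable[[List[str]], str]


def context_windows(symbols: Sequence[str], width: int) -> List[List[str]]:
    """Return one window of 2*width+1 symbols centred on each position."""
    padded = [PAD] * width + list(symbols) + [PAD] * width
    return [padded[i:i + 2 * width + 1] for i in range(len(symbols))]


def two_stage_l2p(
    word: str,
    stage1: Classifier,
    stage2: Classifier,
    letter_width: int = 2,   # assumed value, not taken from the paper
    phoneme_width: int = 1,  # assumed value, not taken from the paper
) -> List[str]:
    """Run both stages on one word and return the refined phoneme sequence."""
    # Stage 1: letter context -> first-pass phoneme for each letter.
    first_pass = [stage1(w) for w in context_windows(word, letter_width)]
    # Stage 2: context of first-pass phonemes -> refined phoneme.
    return [stage2(w) for w in context_windows(first_pass, phoneme_width)]


# Toy stand-ins so the sketch runs end to end (not real models).
TOY_TABLE = {"c": "K", "a": "AE", "t": "T"}


def toy_stage1(window: List[str]) -> str:
    # Looks only at the centre letter of the window.
    return TOY_TABLE[window[len(window) // 2]]


def toy_stage2(window: List[str]) -> str:
    # Passes the centre phoneme through unchanged.
    return window[len(window) // 2]


if __name__ == "__main__":
    print(two_stage_l2p("cat", toy_stage1, toy_stage2))
```

The point of the sketch is only the shape of the pipeline: the second stage never sees letters, only a window over the phonemes proposed by the first stage, which is how phoneme-level context enters the decision.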