Localized Mandarin Speech Synthesis Services for Enterprise Scenarios

2018 
Speech interaction systems have gained popularity in recent years. In these systems, the performance of speech synthesis has become a key factor in determining quality of service (QoS) and user experience. Improving the efficiency of speech synthesis has therefore become an active research topic, particularly in specific human-computer interaction scenarios. In this paper, we propose a low-latency, hidden Markov model (HMM)-based localized Mandarin speech synthesis architecture that uses a single global variance shared by all the Gaussian mixture models (GMMs). This strategy greatly reduces the memory consumed when loading the acoustic model. We also encapsulate speech synthesis as a service using the epoll mechanism, so that the synthesis engine can be initialized by preloading the text analysis model and the acoustic model and can then be invoked by multiple processes simultaneously, further improving synthesis efficiency. Experimental results demonstrate that the proposed method significantly reduces latency while maintaining the voice quality of the synthesized speech.
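As a rough illustration of why a shared global variance shrinks the acoustic model, the sketch below counts the bytes needed for Gaussian parameters in two layouts. It is not the paper's implementation. It assumes diagonal covariances, 32-bit floats, and ignores mixture weights and any other model data; the names `ModelShape`, `bytes_per_component_variance`, `bytes_shared_variance`, and `memory_ratio` are hypothetical, as are the example sizes.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assumption: parameters are stored as 32-bit floats


@dataclass
class ModelShape:
    num_gaussians: int  # total Gaussian components across all GMMs (hypothetical)
    feature_dim: int    # dimension of one acoustic feature vector (hypothetical)


def bytes_per_component_variance(shape: ModelShape) -> int:
    # Baseline layout: every Gaussian stores its own mean vector
    # and its own diagonal variance vector.
    return 2 * shape.num_gaussians * shape.feature_dim * BYTES_PER_FLOAT


def bytes_shared_variance(shape: ModelShape) -> int:
    # Shared layout: every Gaussian stores only a mean vector,
    # plus one variance vector shared by the whole model.
    return (shape.num_gaussians + 1) * shape.feature_dim * BYTES_PER_FLOAT


def memory_ratio(shape: ModelShape) -> float:
    # Fraction of the baseline size that the shared layout needs.
    return bytes_shared_variance(shape) / bytes_per_component_variance(shape)


if __name__ == "__main__":
    shape = ModelShape(num_gaussians=1000, feature_dim=40)
    print(bytes_per_component_variance(shape))
    print(bytes_shared_variance(shape))
    print(memory_ratio(shape))
```

Under these assumptions the shared layout needs close to half the parameter storage once the number of Gaussians is large, since the single shared variance vector is negligible next to the mean vectors. The saving in a real system depends on what else the acoustic model stores.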