Statistical Formant Speech Synthesis for Arabic

2015 
This work constructs a hybrid system that integrates formant synthesis with context-dependent Hidden Semi-Markov Models (HSMMs). The HSMM parameters comprise formants, fundamental frequency, voicing/frication amplitude, and duration. For HSMM training, the formants, fundamental frequency, and voicing/frication amplitude are extracted from waveforms using the Snack toolbox and a decomposition algorithm, while duration is calculated using HMMs modeled by multivariate Gaussian distributions. The acoustic features are then generated from the trained HSMMs and combined with default values of complementary acoustic features, such as glottal waveform parameters, to produce speech waveforms with the Klatt synthesizer. The text processor that provides the phonetic transcription required in the training and synthesis phases is built on phonemic pronunciation algorithms. A perceptual test shows that the statistical formant text-to-speech system produces good-quality speech while using features that are low-dimensional and close to speech perception cues.
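To illustrate how a formant track can drive a Klatt-style synthesizer, the following is a minimal sketch of the second-order digital resonator on which that family of synthesizers is based, applied in cascade to an impulse-train source. It is not the system described in this work: the function names, the impulse-train excitation, and the formant frequencies and bandwidths are illustrative assumptions, and a full Klatt synthesizer adds a glottal source model, frication noise, and parallel branches.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients (a, b, c) of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, sample_rate, n_samples):
    """Crude voiced excitation (assumption): one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize_vowel(formants, f0_hz=120.0, sample_rate=16000, n_samples=1600):
    """Pass the excitation through the formant resonators in cascade."""
    signal = impulse_train(f0_hz, sample_rate, n_samples)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, sample_rate)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz; placeholder values,
# not taken from this work.
frame = synthesize_vowel([(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)])
```

In a statistical system of the kind described above, the per-frame formant frequencies, fundamental frequency, and amplitudes would come from the trained models rather than from fixed constants as in this sketch.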