Combining statistical and linguistic models for synthesis of prosodic contours

1989 
“It is very important to get the timing, intonation, and allophonic detail correct in order that a sentence sound intelligible and moderately natural.” [D. Klatt, J. Acoust. Soc. Am. 82, 737–793 (1987)]. This important review article identified prosody as a research issue for improving text‐to‐speech synthesis. Klatt's suggestions for improving prosody are addressed here: the development of new systems for controlling F0 and duration, and mechanisms for adding variety. The proposed synthesis system is a statistical model trained on text, parts of speech, pronunciation, lexical stress, prosodic labels (major and minor boundaries, accents, etc.), and acoustic parameters (relative F0 and duration). The synthesis problem is to predict the prosodic labels and acoustic parameters given the text and the statistical model. Several hours of speech have been collected from professional FM newscasters, a labeling scheme has been settled on, and a portion of the data has been labeled. The components of the system so far im...
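To illustrate the prediction problem stated above (linguistic features in, prosodic labels and relative acoustic values out), here is a minimal sketch under explicit assumptions. It is not the system described in this abstract. The feature set (part of speech and lexical stress only), the binary `accent`/`none` label, the toy training rows, and the functions `train` and `predict` are all hypothetical. A simple count-based classifier followed by per-label mean acoustics stands in for whatever statistical model the authors actually use.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy training data (not from the paper):
# (part_of_speech, lexically_stressed, prosodic_label, relative_f0, relative_duration)
TRAINING = [
    ("NOUN", True, "accent", 1.20, 1.15),
    ("NOUN", True, "accent", 1.10, 1.25),
    ("NOUN", False, "none", 0.95, 0.90),
    ("DET", False, "none", 0.90, 0.80),
    ("DET", False, "none", 0.92, 0.85),
    ("VERB", True, "accent", 1.05, 1.10),
    ("VERB", True, "none", 1.00, 1.00),
    ("VERB", True, "accent", 1.15, 1.20),
]


def train(rows):
    """Count labels per feature combination and average acoustics per label."""
    label_counts = defaultdict(Counter)  # (pos, stressed) -> Counter of labels
    acoustics = defaultdict(list)        # label -> list of (rel_f0, rel_dur)
    for pos, stressed, label, f0, dur in rows:
        label_counts[(pos, stressed)][label] += 1
        acoustics[label].append((f0, dur))
    means = {
        label: (mean(f for f, _ in vals), mean(d for _, d in vals))
        for label, vals in acoustics.items()
    }
    return label_counts, means


def predict(model, words):
    """Stage 1: most frequent label for the features; stage 2: that label's mean acoustics."""
    label_counts, means = model
    predictions = []
    for pos, stressed in words:
        counts = label_counts.get((pos, stressed))  # .get does not create new entries
        # Unseen feature combinations fall back to the unaccented label.
        label = counts.most_common(1)[0][0] if counts else "none"
        rel_f0, rel_dur = means[label]
        predictions.append((label, rel_f0, rel_dur))
    return predictions


if __name__ == "__main__":
    model = train(TRAINING)
    for word, pred in zip(["news", "the", "report"], predict(model, [("NOUN", True), ("DET", False), ("NOUN", True)])):
        print(word, pred)
```

Splitting prediction into a label stage and an acoustic stage mirrors the abstract's framing, in which prosodic labels are predicted alongside relative F0 and duration. A real system would condition on far richer context (pronunciation, neighboring words, boundary labels) than this toy does.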