PHONETIC PROBLEMS OF DIPHONE SYNTHESIS OF CROATIAN SPEECH

1998 
The term "speech synthesis" covers a series of procedures by which a virtually unlimited corpus of written language can be converted into quasi-spoken form with as little memory consumption as possible. One method of synthesis consists of stringing together prerecorded, digitally stored spoken segments. With this method, the basic problem is choosing the spoken segments (sentences, words, syllables, phonemes) so that they can be combined into the appropriate strings. The problems of coarticulation and interpolation can for the most part be solved by choosing the diphone as the basic element. A diphone is a segment of speech running from the middle of one allophone to the middle of the next (Fujimura et al., 1977). With this method natural transitions are ensured, discontinuity at the joins between elements is avoided, and interpolation becomes unnecessary. A group of authors working in the phonetic laboratory of the Faculté Polytechnique de Mons (Dutoit et al., 1996a, 1996b) developed a program for diphone synthesis that uses recordings of natural speech as its repertoire of diphones for a particular language. The program is a result of work being done on a project known as MBROLA, and it is available via the Internet to potential users and compilers of diphone databases. The eventual goal of the project is to promote academic research in the field of speech synthesis, particularly research on the prosody of synthesized speech, in the years to come. This article describes the process of creating a diphone database for standard spoken Croatian. A diphone database has to contain all possible transitions, because in continuous speech all the phonemes of a language can come into direct contact at word boundaries. For this reason, a diphone database for Croatian has to include transitions between all of the language's 30 phonemes, each phoneme paired with all of the others.
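To make the size of such an inventory concrete, the following minimal sketch enumerates ordered phoneme pairs. It is an illustration only, not the authors' procedure: the labels in `PHONEMES` are the 30 letters of Croatian orthography used as stand-ins for phoneme symbols, the function name `diphone_inventory` is hypothetical, and whether same-phoneme pairs (or pairs with silence) belong in the database is left as a parameter because the abstract does not say.

```python
from itertools import product

# Assumption: orthographic labels stand in for the 30 phonemes mentioned in
# the text; the article's own transcription symbols may differ.
PHONEMES = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f",
    "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
    "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]


def diphone_inventory(phonemes, include_self_pairs=False):
    """Return every ordered (left, right) phoneme pair.

    Order matters: the transition t->b is a different diphone from b->t.
    Same-phoneme pairs are skipped unless include_self_pairs is True.
    """
    return [
        (left, right)
        for left, right in product(phonemes, repeat=2)
        if include_self_pairs or left != right
    ]


inventory = diphone_inventory(PHONEMES)
print(len(PHONEMES), len(inventory))  # 30 870
```

With `include_self_pairs=True` the same call yields 900 pairs, which would also cover identical phonemes meeting at a word boundary.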
At the same time, all of the language's possible allophones also need to be realized; this includes those normally listed in dialectological descriptions, as well as those that neither phoneticians nor speakers usually notice. Assuming that all sound changes (voicing, devoicing, partial assimilation) are actually caused by phonetic context, the realization of allophones is ensured by instructing the speaker not to articulate extra carefully. The spectrograms in Figures 1 and 2 illustrate the principle of segmentation. The first shows the join and the segmentation of clearly different phonetic segments, /tb/, while the second shows how diphone borders are determined in the case of a continuous transition between similar phonetic segments, /ae/. Special attention is given to describing the criteria for determining the borders of diphones involving temporally structured sounds (affricates). More information on the MBROLA project may be found at this Internet site: http://tcts.fpms.ac.be/synthesis/mbrola.html.
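The definition of a diphone as running from the middle of one allophone to the middle of the next can be sketched as a simple midpoint rule over labelled segments. This is a rough illustration under stated assumptions: the `Segment` type, the function `midpoint_diphones`, and the example timings are hypothetical, and the plain midpoint rule does not capture the special criteria the article describes for continuous transitions such as /ae/ or for affricates.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str       # phoneme/allophone label, e.g. "t"
    start_ms: float  # segment onset in milliseconds (hypothetical timings)
    end_ms: float    # segment offset in milliseconds


def midpoint_diphones(segments):
    """Cut one diphone per adjacent pair of segments.

    Each diphone runs from the temporal midpoint of the left segment
    to the temporal midpoint of the right segment.
    """
    mids = [(s.start_ms + s.end_ms) / 2 for s in segments]
    return [
        (f"{left.label}-{right.label}", left_mid, right_mid)
        for left, right, left_mid, right_mid in zip(
            segments, segments[1:], mids, mids[1:]
        )
    ]


# Hypothetical labelling of the /tb/ sequence mentioned in the text.
segments = [Segment("t", 0, 80), Segment("b", 80, 200)]
diphones = midpoint_diphones(segments)
print(diphones)  # [('t-b', 40.0, 140.0)]
```

A sequence of n labelled segments yields n - 1 diphones, so recording carrier words that contain each required transition is enough to populate the database.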