Approach toward speech-to-speech translation system by using a collection of sentences and utterances

2003 
Corpus-based technology is very promising for speech-to-speech translation. However, the vital resource, a large-scale corpus of bilingual dialogues covering many domains, is prohibitively expensive to build. We propose replacing it with a combination of two different types of bilingual corpora: (1) a large-scale collection of basic sentences that covers many domains; and (2) a small-scale collection of spoken dialogues that reflects the characteristics of spoken utterances. Using these two corpora, we have been building a translation module for a speech-to-speech translation system. With the basic sentence corpus, we have achieved high-quality translations using several machine-learning approaches. An analysis of the spoken dialogue corpus showed that splitting an utterance into parts and concatenating the translated parts is an effective way to translate the longer utterances characteristic of spoken dialogue.
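The split-then-concatenate strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The phrase table `BASIC_TABLE`, the function names, the English-to-French language pair, and the splitting rule (punctuation plus the connectives "and"/"but") are all hypothetical. The abstract does not specify the actual split points or translation models.

```python
import re
from typing import Callable, Dict, List

# Hypothetical phrase table standing in for a translator trained on short
# basic sentences; the entries and the English->French pair are illustrative only.
BASIC_TABLE: Dict[str, str] = {
    "i would like a room": "je voudrais une chambre",
    "how much is it": "c'est combien",
    "thank you": "merci",
}


def toy_translate(segment: str) -> str:
    """Translate one short segment; unknown segments are marked rather than guessed."""
    key = segment.strip().lower()
    return BASIC_TABLE.get(key, f"<untranslated: {key}>")


def split_utterance(utterance: str) -> List[str]:
    """Split a long utterance at punctuation and at the words 'and'/'but'.

    This splitting rule is an assumption for illustration; word boundaries
    keep words such as 'band' from being split.
    """
    parts = re.split(r"[,.?!]+|\band\b|\bbut\b", utterance.lower())
    return [p.strip() for p in parts if p.strip()]


def translate_by_splitting(
    utterance: str, translate: Callable[[str], str] = toy_translate
) -> str:
    """Translate each part independently, then concatenate the results in order."""
    return " ".join(translate(part) for part in split_utterance(utterance))


if __name__ == "__main__":
    print(translate_by_splitting("Thank you, I would like a room. How much is it?"))
```

You can pass a different `translate` callable to place the same wrapper in front of any sentence-level translator. The idea is that each part stays close to the short, basic sentences such a translator handles well, even when the whole utterance does not.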