Using existing written language analyzers in understanding natural spoken Finnish.

2001 
In this paper we consider the possible use of existing linguistic (mainly morphological) analyzers for written Finnish in building a system that uses speech as its interface. We also present means to enhance the usability of these analyzers for this purpose.

1 USIX Interact project

In the USIX Interact project ("http://www.mlab.uiah.fi/interact/", Tekes project 40691/00), which is mainly funded by the National Technology Agency, we are designing a general platform for systems that communicate with their users through a natural language interface. The Interact project is a joint effort between the University of Art and Design Helsinki, the University of Helsinki, the Helsinki University of Technology and the University of Tampere. As a demonstration we are constructing a system that should be able to answer inquiries about public transportation timetables. The problem that we at the University of Helsinki are currently solving is the mapping of the speaker's utterance into relevant semantic units, which are in turn processed by the dialogue manager. After the dialogue manager has done its processing, it produces new semantic units, from which we must generate natural language. We are also responsible for creating a dictionary for the speech recognition system.

2 Written vs. spoken Finnish

It is widely known that Finnish is written so that one grapheme corresponds to one phoneme: it is "spoken as it is written". Much less known is the fact that spoken Finnish differs greatly from its written form. The current written form of Finnish was established as a compromise between all the Finnish dialects, and as such it has never really been a good transliteration of any form of spoken Finnish. Given the current state of user-independent speech recognition, we are forced to use full word lists that include all the morphological forms we are trying to recognize in the user's utterance.
Because of the difference between written and spoken Finnish, existing tools cannot handle the word forms used in natural spoken Finnish. To deal with the variation among the several different ways of pronouncing any written word, we have to generate a list of the most probable pronunciations for each one. We then map each of these pronunciations to a standard form that can be found in the written language. These standard forms are then passed on to whatever linguistic analyzers the system uses. To do this we need a tool that takes as its input the written forms we decide are necessary and gives as its output the forms that can actually be found in spoken language.
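The mapping described above can be illustrated with a minimal sketch. It assumes a small hand-written variant table rather than the generation tool the paper calls for; the names `SPOKEN_VARIANTS`, `build_normalizer` and `normalize` are hypothetical, and the few word pairs are common colloquial forms chosen for illustration only, not data from the project.

```python
from collections import defaultdict

# Hypothetical table: written (standard) form -> probable spoken variants.
# In the approach described above this list would be produced by a tool;
# here it is written by hand purely for illustration.
SPOKEN_VARIANTS = {
    "minä": ["minä", "mä", "mie"],
    "olen": ["olen", "oon"],
    "yksi": ["yksi", "yks"],
    "kaksi": ["kaksi", "kaks"],
}


def build_normalizer(variants):
    """Invert the table: spoken variant -> set of written forms it may stand for."""
    table = defaultdict(set)
    for written, spoken_forms in variants.items():
        for spoken in spoken_forms:
            table[spoken].add(written)
    return table


def normalize(tokens, table):
    """Map each recognized token to its candidate written forms.

    Tokens missing from the table are passed through unchanged, so a
    downstream written-language analyzer still receives them.
    """
    return [sorted(table.get(token, {token})) for token in tokens]


NORMALIZER = build_normalizer(SPOKEN_VARIANTS)

# The keys of the inverted table double as the word list for the recognizer.
recognizer_vocabulary = sorted(NORMALIZER)

print(normalize(["mä", "oon", "täällä"], NORMALIZER))
```

Each token maps to a list because one spoken form may correspond to more than one written form; choosing between the candidates is left to the later analysis stages.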