Determining the adaptation data saturation of ASR systems for dysarthric speakers

2021 
Automatic speech recognition (ASR) systems are gradually accepted as the assistive technology for the physically impaired individuals such as speakers with dysarthria. Dysarthria is a motor speech impairment, where the muscles related to speech organs are weak, causing slow or no movement of the muscles. It is often accompanied by neurological conditions such as cerebral palsy, head injury, muscles dystrophy and multiple sclerosis. Using the ASR system to understand the spoken language of a speaker with dysarthia came with many advantages as compared to the conventional keyboard and mouse method. However, the development of an effective ASR system for this condition often limited by data sparsity in terms of coverage of the language or the size of the speech databases. To overcome the data sparsity issues, existing researchers proposed several solutions including the adaptation techniques such as MLLR and MAP. In this study, two types of adaptation techniques were considered, which includes the individual MLLR and MAP adaptation technique, as well as the combined adaptation technique (MLLR + MAP sequence, and MAP + MLLR sequence) to determine the saturation point of the adaptation data of dysarthric speech. The saturation point is identified using linear regression between the data size and the recognition accuracy. The results show that the saturation points are different for both individual MLLR and MAP adaptation technique, while the sequence of the combined adaptation technique influences the saturation points.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    1
    Citations
    NaN
    KQI
    []