Using ASR Posterior Probability and Acoustic Features for Voice Disorder Classification

2020 
Dysphonia can be caused not only by frequent use of the voice but also by many other factors, including environmental noise, environmental pollution, and a dry environment. Dysphonia can serve as an indicator of several serious and less serious diseases. Therefore, a system that models the cognitive decision-making processes of an expert would be of great value in helping physicians diagnose dysphonia quickly and reliably. This paper focuses on the front end of such a system and evaluates acoustic features measured in different phonetic classes, together with ASR posterior probability values, in two classification schemes, an SVM and a DNN, for classifying healthy and disordered voices in Hungarian-speaking patients. When the two feature types are combined, classification accuracy increases to 89%. While this is better than using only acoustic features as input to the DNN (88%), we did not find a significant impact from using ASR posterior probability values. Based on our results, we conclude that calculating ASR phone posteriors is not worthwhile: it has no significant impact on accuracy, yet it can greatly complicate and slow down a diagnosis support system.