Classification of the Mask Augsburg Speech Corpus (MASC) Using the Consistency Learning Method

2020 
This paper presents the details of our solution for the mask sub-task of the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE). Speech production can be significantly affected when the speaker wears a face mask. The task evaluates systems that classify speech recordings as made with or without a surgical mask. We propose a student-teacher deep neural network inspired by the consistency learning method, which has performed well on many classification problems. In particular, a consistency regularization term is defined between the outputs of the student model and those of the guided teacher model. Gaussian noise at different levels is added to the inputs of the teacher model as a perturbation to improve the robustness of the system. To take further advantage of consistency learning, a small amount of unlabeled evaluation data is combined with the labeled data to train the system in a semi-supervised manner. Finally, the proposed system achieves an unweighted average recall of up to 72.50% on the official evaluation dataset, an improvement of about 10 percentage points over the baseline result of 62.60%.
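The training objective described above can be illustrated with a minimal sketch. The abstract does not specify the exact form of the consistency term, the network architecture, or how the teacher is updated, so everything below is an assumption made for illustration: the consistency term is taken to be the mean squared error between class posteriors, the models are arbitrary callables returning logits, and `noise_std` and `weight` are hypothetical hyperparameters. The helper `unweighted_average_recall` computes the challenge metric as the mean of per-class recalls.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(student_logits, teacher_logits):
    # Assumed form: mean squared error between the two posterior distributions.
    diff = softmax(student_logits) - softmax(teacher_logits)
    return float(np.mean(diff ** 2))

def cross_entropy(logits, labels):
    # Supervised loss, computed on labeled examples only.
    p = softmax(logits)
    picked = p[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def total_loss(student, teacher, x_labeled, y, x_unlabeled,
               noise_std, weight, rng):
    # Labeled and unlabeled inputs both contribute to the consistency term.
    x_all = np.concatenate([x_labeled, x_unlabeled])
    # Gaussian noise is added to the teacher's inputs as the perturbation.
    x_noisy = x_all + rng.normal(0.0, noise_std, size=x_all.shape)
    s_logits = student(x_all)
    t_logits = teacher(x_noisy)
    supervised = cross_entropy(s_logits[: len(x_labeled)], y)
    return supervised + weight * consistency_loss(s_logits, t_logits)

def unweighted_average_recall(y_true, y_pred):
    # Mean of per-class recalls, so each class counts equally.
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 2))           # toy linear "network" weights
    model = lambda x: x @ w               # stands in for both student and teacher
    x_lab = rng.normal(size=(16, 8))
    y_lab = rng.integers(0, 2, size=16)
    x_unl = rng.normal(size=(4, 8))
    print(total_loss(model, model, x_lab, y_lab, x_unl, 0.1, 1.0, rng))
```

In a real system the gradient of this loss would update only the student; how the teacher is guided is not stated in the abstract and is left out of the sketch.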