Augmenting Dysphonia Voice Using Fourier-based Synchrosqueezing Transform for a CNN Classifier

2019 
A persistent challenge in dysphonia voice studies is the small size of the available datasets, which makes it difficult to apply more sophisticated deep learning techniques without overfitting or underfitting. A convolutional neural network (CNN) is a powerful classifier, but it requires a large amount of training data, and data augmentation techniques for voice are limited. The Fourier-based synchrosqueezing transform (FSST) can be used as a data augmentation technique to increase the data size. The results indicated that FSST not only increases the data size but also lets the CNN learn better than it does with the short-time Fourier transform (STFT) power spectrum: the loss function converges when the CNN is trained on FSST but not when it is trained on STFT. FSST is also more stable and provides more accurate results.
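For readers unfamiliar with FSST, the sketch below contrasts it with the STFT power spectrum on a synthetic tone. It is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a Gaussian analysis window whose width `sigma` defaults to `n_fft / 8`, a relative magnitude threshold `gamma`, and a hypothetical helper name `stft_and_fsst`. The abstract does not specify how FSST is used to enlarge the dataset, so the sketch only computes the representation. Each STFT coefficient gets an instantaneous-frequency estimate, `eta - Im(S_g' / S_g) / (2*pi)`, and is then moved ("squeezed") into the frequency bin nearest that estimate. The common `1/g(0)` normalization is omitted.

```python
import numpy as np


def stft_and_fsst(x, n_fft=256, hop=64, sigma=None, gamma=1e-3):
    """Return (S, T): STFT coefficients and their synchrosqueezed version.

    Both arrays have shape (n_fft // 2 + 1, n_frames); rows are rfft bins.
    Assumed choices: Gaussian window, phase referenced to the frame centre.
    """
    x = np.asarray(x, dtype=float)
    if sigma is None:
        sigma = n_fft / 8.0                       # window width in samples (assumed)
    m = np.arange(n_fft) - n_fft // 2             # sample offsets from frame centre
    g = np.exp(-0.5 * (m / sigma) ** 2)           # Gaussian analysis window
    dg = -m / sigma**2 * g                        # exact derivative of the window

    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)], axis=1)

    # ifftshift puts the frame centre at index 0, so the phase is referenced
    # to the window centre (modified STFT) and ridge coefficients add in phase.
    S = np.fft.rfft(np.fft.ifftshift(frames * g[:, None], axes=0), axis=0)
    Sd = np.fft.rfft(np.fft.ifftshift(frames * dg[:, None], axes=0), axis=0)

    n_bins = S.shape[0]
    eta = np.broadcast_to(np.arange(n_bins)[:, None] / n_fft, S.shape)  # cycles/sample
    mask = np.abs(S) > gamma * np.abs(S).max()    # ignore near-zero coefficients

    # Instantaneous-frequency estimate for every retained coefficient
    omega = np.zeros(S.shape)
    omega[mask] = eta[mask] - np.imag(Sd[mask] / S[mask]) / (2 * np.pi)

    # Synchrosqueezing: accumulate each coefficient in the bin nearest omega
    target = np.rint(omega * n_fft).astype(int)
    cols = np.broadcast_to(np.arange(S.shape[1]), S.shape)
    keep = mask & (target >= 0) & (target < n_bins)
    T = np.zeros_like(S)
    np.add.at(T, (target[keep], cols[keep]), S[keep])
    return S, T


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    S, T = stft_and_fsst(np.cos(2 * np.pi * 1010 * t))
    power_stft = (np.abs(S) ** 2).sum(axis=1)     # STFT power spectrum, time-summed
    power_fsst = (np.abs(T) ** 2).sum(axis=1)
    print("STFT share in peak bin:", power_stft.max() / power_stft.sum())
    print("FSST share in peak bin:", power_fsst.max() / power_fsst.sum())
```

The STFT smears the 1010 Hz tone across several neighbouring bins, but FSST concentrates nearly all of its energy in one bin. This sharper time-frequency picture is the kind of input the abstract compares against the STFT power spectrum.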