Augmenting Dysphonia Voice Using Fourier-based Synchrosqueezing Transform for a CNN Classifier

Alice Rueda, Sridhar Krishnan

2019

Small datasets are a persistent challenge in dysphonia voice studies, making it difficult to apply sophisticated deep learning techniques without overfitting or underfitting. A convolutional neural network (CNN) is a powerful classifier, but it requires a large amount of training data, and data augmentation techniques for voice are limited. The Fourier-based synchrosqueezing transform (FSST) can be used as a data augmentation technique to increase the data size. The results indicate that FSST not only increases the data size but also lets the CNN learn better than it does with the Short-Time Fourier Transform (STFT) power spectrum: the loss function converges with FSST but not with STFT. FSST is also more stable and provides more accurate results.
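The abstract contrasts two time-frequency inputs, the STFT power spectrum and the FSST, without giving implementation details. The following is a minimal NumPy sketch of how both representations might be computed from the same signal; it is not the authors' pipeline. The function names (`gaussian_window`, `centered_stft`, `fsst_sketch`), the Gaussian window, and every parameter value (`n_win`, `sigma`, `hop`, `rel_threshold`) are assumptions chosen for illustration. The sketch estimates an instantaneous frequency for each STFT coefficient from a second STFT taken with the window derivative, then moves the coefficient to the nearest frequency bin.

```python
import numpy as np


def gaussian_window(n_win, sigma):
    """Gaussian window and its derivative per sample (hypothetical helper)."""
    u = np.arange(n_win) - n_win // 2          # sample offset from window centre
    g = np.exp(-0.5 * (u / sigma) ** 2)
    dg = -(u / sigma**2) * g                   # analytic derivative of g w.r.t. u
    return g, dg


def centered_stft(x, window, hop):
    """STFT with the phase referenced to the window centre, shape (n_freq, n_frames)."""
    n_win = len(window)
    half = n_win // 2
    padded = np.pad(x, (half, half))           # zero-pad so frames are centred on samples
    starts = np.arange(0, len(x), hop)
    frames = np.stack([padded[s:s + n_win] for s in starts]) * window
    # ifftshift puts the window centre at index 0, so coefficients that belong
    # to the same component are in phase and add up when they are reassigned.
    return np.fft.rfft(np.fft.ifftshift(frames, axes=1), axis=1).T


def fsst_sketch(x, fs, n_win=256, sigma=32.0, hop=8, rel_threshold=1e-6):
    """Return (freqs, STFT power spectrum, synchrosqueezed coefficients)."""
    g, dg = gaussian_window(n_win, sigma)
    V = centered_stft(x, g, hop)               # ordinary STFT
    V_dg = centered_stft(x, dg, hop)           # STFT taken with the window derivative
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)

    # Instantaneous frequency estimate in Hz: f - fs * Im(V_dg / V) / (2 * pi),
    # evaluated only where |V| is large enough for the ratio to be meaningful.
    keep = np.abs(V) > rel_threshold * np.abs(V).max()
    ratio = np.zeros_like(V)
    ratio[keep] = V_dg[keep] / V[keep]
    inst_freq = freqs[:, None] - fs * ratio.imag / (2 * np.pi)

    # Reassign each kept coefficient to the bin nearest its estimated frequency.
    target = np.rint(inst_freq / (fs / n_win)).astype(int)
    keep &= (target >= 0) & (target < len(freqs))
    T = np.zeros_like(V)
    cols = np.broadcast_to(np.arange(V.shape[1]), V.shape)
    np.add.at(T, (target[keep], cols[keep]), V[keep])
    return freqs, np.abs(V) ** 2, T


# Usage on a synthetic tone (not dysphonia data): both outputs are 2-D arrays
# that could be rendered as images for a CNN.
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
tone = np.cos(2 * np.pi * 1000.0 * t)
freqs, stft_power, T = fsst_sketch(tone, fs)
fsst_power = np.abs(T) ** 2
```

For a stationary tone, the squeezed representation concentrates the energy that the STFT spreads over neighbouring bins into a single bin. How the paper turns FSST outputs into additional training samples is not stated in the abstract, so that step is left out here.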

Keywords:

Artificial intelligence
Classifier
Pattern recognition
Computer science
Fourier transform
Spectrogram
Overfitting
Short-time Fourier transform
Convolutional neural network
Convolution
Time–frequency analysis
Deep learning
