Multi-scale Generative Adversarial Networks for Speech Enhancement

2019 
Generative adversarial networks (GANs) can be trained to recognize and remove noise from noisy speech. The most representative model is the Speech Enhancement Generative Adversarial Network (SEGAN). However, removing noise without distorting the speech remains challenging, especially in low signal-to-noise ratio (SNR) environments. To address this problem, this paper proposes the Speech Enhancement Multi-scale Generative Adversarial Network (SEMGAN), whose generator and discriminator are built on fully convolutional neural networks (FCNNs). Compared with SEGAN, the generator produces speech at three different scales and the discriminator makes multiple judgments. In addition, multiple noise types and SNRs are used in training to improve the model's generalization capability. For the testing stage, we further propose pre-SEMGAN, which solves the problem that the last frame of speech data was not processed well. Experimental results indicate that the proposed architectures (SEMGAN and pre-SEMGAN) outperform the optimally modified log-spectral amplitude estimator (OMLSA) and SEGAN under different noise conditions. Notably, at 2.5 dB SNR, SEMGAN's PESQ and STOI scores are about 7% and 3.6% higher than SEGAN's, respectively.
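The last-frame issue mentioned above can be illustrated with a minimal sketch. It assumes the enhancement model consumes fixed-length waveform chunks, so a final chunk shorter than the chunk length must be handled somehow. Zero-padding is shown here only as one common option; it is not claimed to be the pre-SEMGAN procedure, which the abstract does not describe. The function names and the chunk length are hypothetical.

```python
import numpy as np

def split_fixed_chunks(wave, chunk_len):
    """Split a 1-D waveform into fixed-length chunks, zero-padding the tail.

    Assumption: the model accepts only inputs of exactly `chunk_len` samples.
    Returns the chunk matrix and the original length (needed to trim later).
    """
    wave = np.asarray(wave, dtype=np.float32)
    # Ceiling division gives the number of chunks; keep at least one chunk.
    n_chunks = max(1, -(-len(wave) // chunk_len))
    padded = np.zeros(n_chunks * chunk_len, dtype=np.float32)
    padded[:len(wave)] = wave
    return padded.reshape(n_chunks, chunk_len), len(wave)

def merge_chunks(chunks, orig_len):
    """Concatenate processed chunks and drop the samples added by padding."""
    return np.asarray(chunks).reshape(-1)[:orig_len]

# Usage with a toy signal: 10 samples, hypothetical chunk length of 4.
toy = np.arange(10, dtype=np.float32)
chunks, n = split_fixed_chunks(toy, 4)
restored = merge_chunks(chunks, n)
```

In this toy case the third chunk holds only two real samples followed by two zeros, which is the kind of partially filled final frame that a chunk-based enhancer tends to process less reliably than full frames.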