Efficient Adversarial Audio Synthesis via Progressive Upsampling

Youngwoo Cho, Minwook Chang, Sanghyeon Lee, Hyoungwoo Lee, Gerard J. Kim, Jaegul Choo

2021

This paper proposes a novel generative model, the progressive upsampling GAN (PUGAN), which progressively synthesizes high-quality audio as a raw waveform. PUGAN generates higher-resolution output in stages by stacking multiple encoder-decoder architectures. Whereas WaveGAN, an existing state-of-the-art model, uses a single decoder architecture, our model generates audio signals and converts them to a higher resolution in a progressive manner while using significantly fewer parameters than WaveGAN, e.g., 3.17x fewer for 16 kHz output. Our experiments show that the audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human perception.
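The staged idea in the abstract can be illustrated with a minimal sketch. This is not the PUGAN architecture: the learned encoder-decoder stages are replaced here by plain linear interpolation, and the function names (`upsample_2x`, `progressive_upsample`), the 2x factor per stage, and the 4 kHz starting rate are all assumptions made for illustration. Only the 16 kHz target rate comes from the abstract.

```python
def upsample_2x(signal):
    """Double the sample rate by linear interpolation.

    Hypothetical stand-in for one learned upsampling stage; a real model
    would use a trained network here instead.
    """
    out = []
    for i, sample in enumerate(signal):
        # Use the next sample if there is one, otherwise hold the last value.
        nxt = signal[i + 1] if i + 1 < len(signal) else sample
        out.append(sample)
        out.append((sample + nxt) / 2.0)
    return out


def progressive_upsample(signal, start_rate, target_rate):
    """Return a list of (sample_rate, waveform) pairs, one per stage.

    Assumes target_rate equals start_rate times a power of two.
    """
    ratio = target_rate // start_rate
    if target_rate % start_rate != 0 or ratio < 1 or ratio & (ratio - 1) != 0:
        raise ValueError("target_rate must be start_rate times a power of two")

    stages = [(start_rate, list(signal))]
    rate = start_rate
    while rate < target_rate:
        signal = upsample_2x(signal)
        rate *= 2
        stages.append((rate, signal))
    return stages


# Illustrative run: a 2-sample "waveform" at an assumed 4 kHz, raised to 16 kHz.
for rate, wave in progressive_upsample([0.0, 1.0], 4000, 16000):
    print(rate, len(wave))
```

Each stage only has to double the resolution of the previous output, which is the structural contrast with a single decoder that must produce the full-resolution waveform in one pass.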

Keywords:

Speech recognition
Adversarial system
Upsampling
Audio synthesis
Computer science
