Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks

Sercan O. Arik, Heewoo Jun, Gregory Diamos

2019

We propose the multi-head convolutional neural network (MCNN) for waveform synthesis from spectrograms. MCNN performs nonlinear interpolation with transposed convolution layers arranged in parallel heads. It uses modern multi-core processors significantly better than commonly used iterative algorithms such as Griffin–Lim, and it runs very fast (more than 300× real time). To train MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN is a very promising approach to high-quality speech synthesis that requires neither iterative algorithms nor autoregressive computation.
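To make the idea of upsampling with transposed convolutions in parallel heads concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The input is a single-channel sequence rather than a multi-channel spectrogram, the weights are random rather than trained, `tanh` stands in for whatever nonlinearity the model uses, and summing the head outputs is an assumed way of combining them. The function names `transposed_conv1d`, `head`, and `multi_head_upsample` are hypothetical.

```python
import numpy as np


def transposed_conv1d(x, kernel, stride):
    """1-D transposed convolution of a single-channel signal.

    Each input sample scatters a scaled copy of `kernel` into the output,
    and consecutive copies are offset by `stride` samples.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out


def head(x, kernels, stride):
    """One head: stacked transposed convolutions with a nonlinearity between layers.

    tanh is a placeholder nonlinearity (assumption).
    """
    for kernel in kernels[:-1]:
        x = np.tanh(transposed_conv1d(x, kernel, stride))
    return transposed_conv1d(x, kernels[-1], stride)


def multi_head_upsample(x, heads, stride=2):
    """Apply every head to the same input and sum the results.

    Summation is an assumed combination rule. The heads do not depend on
    each other, so they could be evaluated in parallel.
    """
    return sum(head(x, kernels, stride) for kernels in heads)


rng = np.random.default_rng(0)
# Three heads, each with two layers of length-4 kernels (random, untrained).
heads = [[rng.standard_normal(4) for _ in range(2)] for _ in range(3)]
frames = rng.standard_normal(8)            # stand-in for 8 spectrogram frames
waveform = multi_head_upsample(frames, heads)
print(len(frames), "->", len(waveform))    # each layer roughly doubles the length
```

The whole output comes from one feed-forward pass. An iterative method such as Griffin–Lim instead refines its estimate over many sequential iterations, and an autoregressive model generates samples one after another.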

Keywords:

Pattern recognition
Spectrogram
Autoregressive model
Convolution
Speech synthesis
Artificial intelligence
Waveform
Time–frequency analysis
Convolutional neural network
Computer science
Sound quality
