FFTNet: A Real-Time Speaker-Dependent Neural Vocoder

Zeyu Jin, Adam Finkelstein, Gautham J. Mysore, Jingwan Lu

2018

We introduce FFTNet, a deep learning approach for synthesizing audio waveforms. Our approach builds on the recent WaveNet project, which showed that a natural-sounding audio waveform can be synthesized directly by a deep convolutional neural network. FFTNet offers two improvements over WaveNet. First, it is substantially faster, allowing for real-time synthesis of audio waveforms. Second, when used as a vocoder, it produces speech that sounds more natural, as measured via a “mean opinion score” test.
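
As a rough illustration of the “mean opinion score” mentioned above: in a listening test, a mean opinion score is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below assumes that convention; the function name `mean_opinion_score` and the example ratings are hypothetical and are not taken from the paper or its evaluation.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return mean(ratings)


# Hypothetical ratings, invented for illustration only (not data from the paper).
example_ratings = [4, 5, 3, 4, 4]
print(mean_opinion_score(example_ratings))
```

Comparing two systems this way means collecting ratings for each under the same listening conditions and comparing the resulting means.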

Keywords:

Microsoft Windows
Speech recognition
Deep learning
Convolutional neural network
Mean opinion score
Pattern recognition
Waveform
Convolution
Artificial intelligence
Computer science
Speech sounds
Artificial neural network
