Neural Synthesis of Binaural Speech

Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando De la Torre, Yaser Sheikh

2021

We present a neural rendering approach for binaural sound synthesis that can produce realistic and spatially accurate binaural sound in real time. The network takes a single-channel audio source as input and synthesizes two-channel binaural sound as output, conditioned on the relative position and orientation of the listener with respect to the source. We investigate deficiencies of the ℓ2-loss on raw waveforms in a theoretical analysis and introduce an improved loss that overcomes these limitations. In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured against real recordings) and that it outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study. We will release a first-of-its-kind binaural audio dataset as a benchmark for future research.
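
To make the phrase "ℓ2-loss on raw waveforms" concrete, the following is a minimal sketch of how such a loss can be computed for a two-channel signal. It is not the authors' loss, network, or analysis: the use of NumPy, the (2, T) array layout, the function name `l2_waveform_loss`, and the 440 Hz test tone are all assumptions made for illustration only.

```python
import numpy as np


def l2_waveform_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between two binaural waveforms of shape (2, T).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean((pred - target) ** 2))


# Assumed test signal: one second of a 440 Hz tone at 48 kHz in both ears.
sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
target = np.stack([tone, tone])  # row 0: left ear, row 1: right ear

# Same amplitude, but the right channel is shifted by a quarter cycle.
shifted = np.sin(2 * np.pi * 440.0 * t + np.pi / 2)
pred = np.stack([tone, shifted])

loss_identical = l2_waveform_loss(target, target)
loss_shifted = l2_waveform_loss(pred, target)
print(loss_identical, loss_shifted)
```

In this toy setting the loss is zero for identical signals and grows when one channel differs from the reference only in phase, with amplitudes unchanged. Which specific deficiencies the paper's theoretical analysis identifies, and how the improved loss addresses them, is not stated in the abstract.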

Keywords:

Computer science
Rendering (computer graphics)
Speech generation
Binaural recording
Sound spatialization
Speech recognition
Perceptual study
Speech processing
Waveform
