Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen

2021

Time-frequency masking or spectrum prediction computed with short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining the low latency during inference that real-time speech enhancement and assisted hearing applications require. To assess our approach across model types and datasets, we evaluate it with both a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
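
The trade-off described in the abstract can be illustrated with some simple arithmetic. The sketch below is not the authors' implementation. It assumes a sampling rate of 8 kHz and a 32 ms analysis window, neither of which is stated in the abstract, and it adopts the common convention that the algorithmic latency of an overlap-add system equals the synthesis window length. The function names are hypothetical.

```python
# Illustrative arithmetic only. SAMPLE_RATE_HZ and ANALYSIS_MS are assumptions,
# not values taken from the abstract; only the 8 ms latency figure is.

def window_samples(sample_rate_hz: int, window_ms: float) -> int:
    """Number of samples covered by a window of the given duration."""
    return round(sample_rate_hz * window_ms / 1000)


def bin_spacing_hz(sample_rate_hz: int, window_ms: float) -> float:
    """Spacing between DFT bins when the DFT size equals the window length."""
    return sample_rate_hz / window_samples(sample_rate_hz, window_ms)


SAMPLE_RATE_HZ = 8000  # assumed sampling rate
SYNTHESIS_MS = 8.0     # short synthesis window; sets the algorithmic latency
ANALYSIS_MS = 32.0     # assumed longer analysis window

# Symmetric baseline: the same short window for analysis and synthesis.
symmetric_spacing = bin_spacing_hz(SAMPLE_RATE_HZ, SYNTHESIS_MS)

# Asymmetric pair: long analysis window, short synthesis window.
asymmetric_spacing = bin_spacing_hz(SAMPLE_RATE_HZ, ANALYSIS_MS)

print(f"symmetric:  {symmetric_spacing} Hz per bin, latency {SYNTHESIS_MS} ms")
print(f"asymmetric: {asymmetric_spacing} Hz per bin, latency {SYNTHESIS_MS} ms")
```

Under these assumptions the longer analysis window gives four times finer bin spacing, while the latency stays tied to the 8 ms synthesis window.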

Keywords:

Artificial intelligence
Source separation
Latency (engineering)
Cluster analysis
Artificial neural network
Latency (audio)
Computer science
Speech enhancement
Inference
Pattern recognition
Masking (art)
