Deep Attractor with Convolutional Network for Monaural Speech Separation

2020 
The deep attractor network (DANet) is a recent deep learning-based method for monaural speech separation. The idea is to map the time-frequency bins of the spectrogram to an embedding space and form an attractor for each source, from which the masks are estimated. The original DANet uses the true speaker assignments to form attractors during training, but relies on the K-means algorithm or a fixed attractor method to estimate attractors during the test phase. The fixed attractor method does not perform well when the training and test conditions differ. Using the K-means algorithm at test time raises a center mismatch problem, which degrades performance. In this letter, we propose to use convolutional networks to estimate attractors in both the training and test phases. Because the same method generates the attractors in both phases, the center mismatch problem is solved. Results show that the proposed method outperforms DANet using the K-means method and performs comparably to DANet using the ideal binary mask at test time, with limited training data.
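The attractor idea summarized above can be illustrated with a small NumPy sketch. It follows the commonly described training-time DANet formulation (attractors as assignment-weighted means of embeddings, masks as a softmax over embedding-attractor similarities) and is not the convolutional attractor estimator proposed in this letter. The function names, array shapes, and random embeddings are assumptions made purely for illustration; in practice the embeddings would come from a trained network.

```python
import numpy as np

def attractors_from_assignments(V, Y):
    """Form one attractor per source as the mean embedding of its bins.

    V: (N, K) embeddings, one row per time-frequency bin (assumed layout).
    Y: (N, C) one-hot assignments of bins to sources (true labels in training).
    Returns A: (C, K).
    """
    counts = Y.sum(axis=0)                       # bins per source, shape (C,)
    return (Y.T @ V) / np.maximum(counts, 1.0)[:, None]

def masks_from_attractors(V, A):
    """Estimate soft masks as a softmax over embedding-attractor similarity.

    Returns M: (N, C), where each row sums to 1 across sources.
    """
    logits = V @ A.T                             # (N, C) dot-product similarity
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Toy example: 6 time-frequency bins, 4-dim embeddings, 2 sources.
rng = np.random.default_rng(0)
N, K, C = 6, 4, 2
V = rng.standard_normal((N, K))                  # stand-in for network output
labels = np.array([0, 1, 0, 1, 1, 0])            # hypothetical true assignments
Y = np.eye(C)[labels]

A = attractors_from_assignments(V, Y)
M = masks_from_attractors(V, A)
```

At test time the true assignments `Y` are unavailable, so the original DANet replaces them with K-means cluster assignments or fixed attractors. Attractors obtained this way can differ from those seen during training, which is the center mismatch the abstract refers to.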