Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition

2018 
This paper describes a neural network approach to far-field speech separation using multiple microphones. Our proposed approach is speaker-independent and can learn to implicitly figure out the number of speakers constituting an input speech mixture. This is realized by utilizing the permutation invariant training (PIT) framework, which was recently proposed for single-microphone speech separation. In this paper, PIT is extended to effectively leverage multi-microphone input. It is also combined with beamforming for better recognition accuracy. The effectiveness of the proposed approach is investigated by multi-talker speech recognition experiments that use a large quantity of training data and encompass a range of mixing conditions. Our multi-microphone speech separation system significantly outperforms the single-microphone PIT. Several aspects of the proposed approach are experimentally investigated.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    11
    References
    47
    Citations
    NaN
    KQI
    []