Single-channel speech separation with non-negative matrix factorization and factorial conditional random fields

2017 
A new non-negative matrix factorization (NMF) based algorithm is proposed for single-channel speech separation with a priori known speakers. It aims to better model the spectral structure and temporal continuity of speech signals. First, NMF and k-means clustering are employed to obtain, for each speaker, multiple small dictionaries together with a state sequence that describes the temporal dynamics between these dictionaries. Then, a factorial conditional random field (FCRF) model is trained on the state sequences and dictionaries to jointly model the temporal continuity of the two speakers' mixed signal for separation. Experiments show that the proposed algorithm outperforms the baselines on all metrics, improving over sparse NMF by +1.12 dB SDR, +2.37 dB SIR, +0.40 dB SAR and +0.2 MOS, over the non-negative factorial hidden Markov model by +2.04 dB SDR, +4.26 dB SIR, +0.62 dB SAR and +1.0 MOS, and over standard NMF by +2.8 dB SDR, +5.08 dB SIR, +1.06 dB SAR and +1.2 MOS.
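The per-speaker training step (NMF plus k-means yielding several small dictionaries and a state sequence) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say what k-means clusters, which NMF cost is used, or how transitions are estimated. The sketch assumes k-means over loudness-normalised spectrogram frames (each cluster label is a state), Euclidean NMF with Lee-Seung multiplicative updates to learn one small dictionary per state, and add-one-smoothed transition counts. All function names (`nmf`, `kmeans`, `train_speaker_model`) and parameters are hypothetical, and the FCRF stage is not shown.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Euclidean NMF via multiplicative updates: V (F x T) ~ W (F x k) @ H (k x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation (a simple deterministic stand-in for k-means++).
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids

def train_speaker_model(V, n_states, atoms_per_state):
    """Hypothetical helper: cluster frames into states, learn one small NMF
    dictionary per state, and estimate a state-transition matrix."""
    # Normalise each frame so clustering reflects spectral shape, not loudness.
    frames = V.T / (V.sum(axis=0)[:, None] + 1e-9)
    states, _ = kmeans(frames, n_states)
    dictionaries = []
    for s in range(n_states):
        W, _ = nmf(V[:, states == s], atoms_per_state)
        dictionaries.append(W)
    # Count transitions s_t -> s_{t+1} with add-one smoothing, then row-normalise.
    trans = np.ones((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return dictionaries, states, trans

# Usage on a synthetic "spectrogram": 20 low-band frames followed by 20 high-band frames.
rng = np.random.default_rng(1)
F = 16
shape_a = np.zeros(F); shape_a[:8] = 1.0
shape_b = np.zeros(F); shape_b[8:] = 1.0
V = np.hstack([np.outer(shape_a, rng.random(20) + 0.5),
               np.outer(shape_b, rng.random(20) + 0.5)]) + 0.01
dicts, states, trans = train_speaker_model(V, n_states=2, atoms_per_state=3)
```

Normalising frames before clustering is a design choice made here so that states track spectral shape rather than energy; a real system would work on STFT magnitudes of training speech and would typically use many more states and atoms.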