Blind separation of underdetermined convolutive speech mixtures by time–frequency masking with the reduction of musical noise of separated signals

2021 
This paper addresses the blind separation of underdetermined convolutive speech mixtures in a multi-speaker environment. We present a method based on mask prediction in the time-frequency (TF) domain. First, relying on the sparsity of speech signals in the TF domain, we estimate the speakers' masks by clustering the relative absolute and Hermitian angle features extracted from the frequency components of the mixtures. Speech separation algorithms that rely on the sparsity and disjoint orthogonality of speech signals in the TF domain are not efficient when more than one source is active in the same TF unit. Hence, in this paper, the cluster centers are estimated mostly from the TF units that probably have only one active source. The correlations between the estimated masks of adjacent frequency bins are leveraged to solve the permutation problem. To increase accuracy, we set the masks to zero at TF units without any active source. Moreover, in clustering, we employ a weighting function that emphasizes the parts of the masks that probably contain just one active source. Finally, to decrease the musical noise of the separated signals and improve their quality, sparse filters in the time domain are used to re-estimate the separated signals. The performance of the proposed method is evaluated on a number of simulated and real speech signals. The simulated experiments were performed using a public dataset and the Roomsim simulator. Compared with several conventional algorithms, the proposed method separated the sources more accurately.
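The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single frequency bin, uses only the Hermitian angle (not the relative absolute feature), assumes an all-ones reference vector, and replaces the weighted clustering with a plain 1-D k-means. Permutation alignment across bins, the weighting function, and the time-domain sparse filters are omitted. The function names `hermitian_angle`, `kmeans_1d`, and `binary_masks` are hypothetical.

```python
import numpy as np

def hermitian_angle(X, ref=None):
    """Hermitian angle between each mixture vector and a reference vector.

    X   : complex array of shape (T, M), one M-channel vector per TF unit.
    ref : complex reference vector of length M (assumed all ones here).
    Returns angles in [0, pi/2]; invariant to complex scaling of X.
    """
    if ref is None:
        ref = np.ones(X.shape[-1], dtype=complex)
    inner = np.abs(np.sum(np.conj(ref) * X, axis=-1))
    norms = np.linalg.norm(ref) * np.linalg.norm(X, axis=-1)
    cos_h = np.clip(inner / np.maximum(norms, 1e-12), 0.0, 1.0)
    return np.arccos(cos_h)

def kmeans_1d(values, k, n_iter=50):
    """Plain 1-D k-means with a deterministic quantile initialisation."""
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        # Assign each value to its nearest centre.
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of its members (skip empty clusters).
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centres[j] = members.mean()
    return labels, centres

def binary_masks(X_bin, n_sources):
    """Binary masks of shape (n_sources, T) for one frequency bin."""
    theta = hermitian_angle(X_bin)
    labels, _ = kmeans_1d(theta, n_sources)
    return np.stack([(labels == j).astype(float) for j in range(n_sources)])
```

Because the Hermitian angle does not change when a mixture vector is multiplied by a complex scalar, TF units dominated by the same source fall close together regardless of that source's amplitude and phase, which is why a simple clustering can recover the masks when the sources rarely overlap.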