Monaural Speech Segregation Using Signal Phase

2011 
An approach was proposed to segregate target speech from a mixture at low signal-to-noise ratio (SNR). Within the framework of computational auditory scene analysis (CASA), signal phase was used as the segregation cue, and the short-time Fourier transform (STFT) was used to extract the phase of the signal. Binary masking was used to group the time-frequency units belonging to the target speech, based on the phase differences among the mixture, the clean speech and the noise. The mask threshold was nonlinear: it varied with frequency, and its values were obtained from a pretest. Experiments showed SNR improvements of more than 20 dB for babble, m109, white and machinegun noise at input SNRs from -30 dB to -20 dB. The waveform of the segregated signal retained most of the detail of the original signal, and the output had good intelligibility. Phase is therefore a robust cue for monaural speech segregation.
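The abstract does not give the exact masking rule, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions. It extracts STFT phase and builds an oracle binary mask from the phase differences between the mixture, the clean speech and the noise. The function names (`stft`, `wrapped_phase_diff`, `example_thresholds`, `phase_binary_mask`), the frame length of 256 and hop of 128, the decision rule, and the threshold schedule are all hypothetical placeholders. They are not the authors' method or their pretest values.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def wrapped_phase_diff(a, b):
    """Absolute phase difference between two complex spectra, in [0, pi]."""
    return np.abs(np.angle(a * np.conj(b)))


def example_thresholds(n_bins):
    """Hypothetical frequency-dependent thresholds in radians.

    The abstract only says the threshold is nonlinear in frequency and comes
    from a pretest; this exponential decay from pi/2 toward pi/8 is a
    placeholder, not the published values.
    """
    k = np.arange(n_bins)
    return np.pi / 8 + (3 * np.pi / 8) * np.exp(-k / 32)


def phase_binary_mask(mixture, speech, noise, thresholds):
    """Oracle phase-based binary mask (hypothetical decision rule).

    A time-frequency unit is labelled target (1) when the mixture phase lies
    within the per-bin threshold of the clean-speech phase AND is no farther
    from the speech phase than from the noise phase; otherwise it is 0.
    Returns the mask and the masked mixture STFT.
    """
    X, S, N = stft(mixture), stft(speech), stft(noise)
    d_speech = wrapped_phase_diff(X, S)  # mixture vs. clean-speech phase
    d_noise = wrapped_phase_diff(X, N)   # mixture vs. noise phase
    keep = (d_speech < thresholds[np.newaxis, :]) & (d_speech <= d_noise)
    mask = keep.astype(float)
    return mask, mask * X
```

As a quick check, take a pure tone as the "speech" and add weak white noise. The mask should keep the tone's frequency bin in nearly every frame and reject bins that the tone does not occupy, because the mixture phase there follows the noise. The rule needs the clean speech and the noise separately, so it behaves like an ideal (oracle) mask. A practical system would have to estimate these quantities instead.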