Noise RETF Estimation and Removal for Low SNR Speech Enhancement

2021 
A method for offline two-microphone speech enhancement in highly adverse noisy environments, with signal-to-noise ratios (SNRs) of −10 to −20 dB, is proposed. While speech enhancement is a well-researched topic, very few methods have been developed to address such severe noise conditions. Specifically, we are interested in removing noise from unintelligible recordings such that the resulting denoised speech content is understandable to human listeners. We propose exploiting the Relative Transfer Function (ReTF), a spatial feature of the noise source, in a speech enhancement algorithm. We model the noise source ReTF with a time-domain machine learning structure to estimate and subtract the noise signal from the mixture. Both a linear filtering structure and an autoencoder-based structure are proposed. For a single interfering noise source, speech intelligibility is improved to a Short-Time Objective Intelligibility (STOI) score no more than 9% below that of the benchmark oracle Ideal Binary Mask (IBM).
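The linear filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a noise-only segment of the recording is available, that the noise ReTF between the two microphones is well approximated by a short FIR filter, and it uses a plain least-squares fit in place of whatever training procedure the paper uses. The function names `estimate_retf_fir` and `subtract_noise`, the filter `g`, and the synthetic signals are all hypothetical.

```python
import numpy as np

def estimate_retf_fir(ref, tgt, taps):
    """Least-squares FIR filter h such that (ref * h)[n] approximates tgt[n]."""
    n = len(ref)
    # Convolution matrix: column k holds ref delayed by k samples.
    X = np.zeros((n, taps))
    for k in range(taps):
        X[k:, k] = ref[:n - k]
    h, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return h

def subtract_noise(mic1, mic2, h):
    """Predict the noise at mic 2 from mic 1 and subtract it from mic 2."""
    noise_at_mic2 = np.convolve(mic1, h)[:len(mic2)]
    return mic2 - noise_at_mic2

# --- Synthetic example (all signals and the filter g are made up) ---
rng = np.random.default_rng(0)
fs, n = 8000, 4000
t = np.arange(n) / fs

g = np.array([0.8, -0.3, 0.1])            # assumed "true" noise ReTF (mic 1 -> mic 2)
noise1 = rng.standard_normal(n)           # noise as seen at mic 1
noise2 = np.convolve(noise1, g)[:n]       # same noise as seen at mic 2

# Step 1: fit the ReTF on a noise-only segment (assumed to be available).
h = estimate_retf_fir(noise1, noise2, taps=3)

# Step 2: apply it to a mixture where a weak "speech" tone is buried in noise.
s1 = 0.1 * np.sin(2 * np.pi * 200 * t)            # speech surrogate at mic 1
s2 = 0.1 * np.sin(2 * np.pi * 200 * t + 0.7)      # same tone, different path to mic 2
mic1 = noise1 + s1
mic2 = noise2 + s2
enhanced = subtract_noise(mic1, mic2, h)

# With a perfect ReTF estimate the noise cancels and only speech terms remain.
residual_speech = s2 - np.convolve(s1, g)[:n]
```

In this idealized setting the noise cancels exactly, but the output still contains speech from the first microphone filtered through the noise ReTF, which is why the target source must differ spatially from the noise source for the subtraction to leave usable speech.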