Sensing to Hear: Speech Enhancement for Mobile Devices Using Acoustic Signals
2021
Voice interactions and voice messages on mobile phones are rapidly growing in popularity. However, the user experience of these services remains unsatisfactory in noisy environments, especially in multi-talker scenarios, where the phone can capture only low-quality voice recordings. Speech enhancement using audio alone remains a grand challenge in these scenarios. In this paper, we address this challenge with the help of emerging acoustic sensing technology. The key insight is that inaudible acoustic signals emitted by a phone's speaker can capture the subtle lip movements of the person speaking. Rather than using such sensing merely for lip reading to classify a limited set of voice commands, we further unlock its potential and leverage the captured lip information to improve voice recording quality. We propose WaveVoice, a joint audio-sensory deep learning method for end-to-end speech enhancement on mobile phones. WaveVoice is structured as an encoder-decoder network in which audio and acoustic sensing data are processed by two individual CNN branches and then fused in a joint network that generates the enhanced speech. In addition, to improve performance on new users, we develop a self-supervised learning methodology that adapts the model to extract speaker-specific features. We construct a dataset to train and evaluate WaveVoice, and we perform online tests under various noisy conditions to demonstrate the system's applicability in real-world scenarios. Experimental results show that WaveVoice effectively reconstructs the target clean speech from noisy audio signals and yields notably superior performance compared with an audio-only encoder-decoder model and state-of-the-art speech enhancement methods. Given its promising performance, we believe WaveVoice is a substantial step toward better mobile voice input.
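The architecture described in the abstract (two CNN encoder branches, one for the noisy audio and one for the acoustic sensing stream, fused into a joint decoder) can be made concrete with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions: the class names (Branch, WaveVoiceSketch), the 1-D convolution layer sizes, and the channel-concatenation fusion are hypothetical choices for exposition, not the paper's actual configuration.

```python
# Minimal sketch of a dual-branch encoder-decoder for joint
# audio + acoustic-sensing speech enhancement. Layer sizes, names,
# and the concatenation-based fusion are illustrative assumptions.
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One CNN encoder branch (noisy audio or acoustic-sensing input)."""
    def __init__(self, in_ch: int, hid: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hid, kernel_size=15, stride=2, padding=7),
            nn.ReLU(),
            nn.Conv1d(hid, hid, kernel_size=15, stride=2, padding=7),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class WaveVoiceSketch(nn.Module):
    """Encode the two streams separately, fuse them, then decode."""
    def __init__(self, hid: int = 64):
        super().__init__()
        self.audio_enc = Branch(1, hid)
        self.sense_enc = Branch(1, hid)
        # Joint decoder: upsamples the fused representation back to
        # waveform resolution with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(2 * hid, hid, kernel_size=15, stride=2,
                               padding=7, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(hid, 1, kernel_size=15, stride=2,
                               padding=7, output_padding=1),
            nn.Tanh(),
        )

    def forward(self, noisy_audio, sensing):
        a = self.audio_enc(noisy_audio)   # (B, hid, T/4)
        s = self.sense_enc(sensing)       # (B, hid, T/4)
        fused = torch.cat([a, s], dim=1)  # joint representation
        return self.decoder(fused)        # enhanced waveform (B, 1, T)

# Usage: both streams aligned to the same length T (here 1 s at 16 kHz).
noisy = torch.randn(2, 1, 16000)
sense = torch.randn(2, 1, 16000)
enhanced = WaveVoiceSketch()(noisy, sense)
print(enhanced.shape)  # torch.Size([2, 1, 16000])
```

Concatenating the two encoded streams along the channel dimension is one simple fusion choice; attention-based or multiplicative fusion would slot into the same place in the network without changing the overall encoder-decoder structure.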