A Novel Encoder-Decoder Model via NS-LSTM Used for Bone-Conducted Speech Enhancement
2018
Bone-conducted (BC) speech can be used for communication in very noisy environments. In this paper, we present a method for improving the quality of BC speech. The speaker's speech signal is passed through a novel encoder-decoder model built on a dictionary representation. In the encoder, a non-negative and sparse long short-term memory (NS-LSTM) recurrent neural network of our own design generates combination coefficients over a dictionary learned by sparse non-negative matrix factorization. The decoder then enhances this dictionary representation using a local attention mechanism. The model as a whole is trained with two optimizers, and the encoder is pre-trained separately to speed up convergence. In experiments, we compare the proposed method with direct transformations via DNN and LSTM networks, using multiple evaluation criteria. Objective and subjective results show that our method performs better than these baselines and achieves satisfactory performance even in some challenging cases.
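The dictionary representation mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it learns a dictionary with a simplified sparse NMF (Euclidean loss, L1 penalty on the coefficients, multiplicative updates) and then computes non-negative combination coefficients for one frame with an iterative solver, which only stands in for the NS-LSTM encoder. The function names `sparse_nmf` and `encode`, the penalty weight `lam`, and the random demo data are all assumptions made for illustration.

```python
import numpy as np


def sparse_nmf(V, rank, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (features x frames) as V ~ W @ H.

    W is the dictionary (unit-norm columns) and H holds non-negative
    coefficients that the L1 penalty `lam` pushes toward sparsity.
    Simplified heuristic, not the exact algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative update for H; `lam` in the denominator shrinks H.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Multiplicative update for the dictionary W.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # Normalize dictionary atoms and rescale H so W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def encode(x, W, lam=0.1, n_iter=200, eps=1e-9):
    """Non-negative coefficients h for one frame x with x ~ W @ h.

    Hypothetical stand-in for the encoder: the paper predicts these
    coefficients with an NS-LSTM rather than solving for them iteratively.
    """
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ x) / (W.T @ W @ h + lam + eps)
    return h


# Synthetic non-negative "spectrogram": 20 features x 30 frames, rank 3.
demo_rng = np.random.default_rng(1)
V = demo_rng.random((20, 3)) @ demo_rng.random((3, 30))

W, H = sparse_nmf(V, rank=3)
h0 = encode(V[:, 0], W)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Here `W @ h0` approximates the first frame of `V`. In the proposed model, a decoder with local attention would further enhance such a representation; that stage is not sketched here.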