A Novel Encoder-Decoder Model via NS-LSTM Used for Bone-Conducted Speech Enhancement

2018 
Bone-conducted (BC) speech can be used for communication in environments with very high noise levels. In this paper, we present a method for improving the quality of BC speech. The speaker's speech signal is passed through a novel encoder-decoder model based on dictionary representation. In the encoder, a non-negative and sparse long short-term memory (NS-LSTM) recurrent neural network of our design generates combination coefficients over a dictionary learned by sparse non-negative matrix factorization (NMF). The decoder then enhances this dictionary representation using a local attention mechanism. Two optimizers are used when training the model as a whole, and the encoder is pre-trained separately to speed up convergence. In experiments, we compare the proposed method with direct transformations via DNN and LSTM networks, using multiple evaluation criteria. Both objective and subjective results show that our method outperforms these baselines and achieves satisfactory performance even in some challenging cases.
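The dictionary step can be illustrated with a minimal sketch of sparse NMF, assuming the standard objective ||V - WH||_F^2 + lam * sum(H) solved by multiplicative updates with unit-norm dictionary atoms. This is not the paper's implementation: the function names `sparse_nmf` and `encode`, the penalty weight `lam`, the iteration counts, and the synthetic data are all hypothetical. In particular, `encode` computes a non-negative sparse code by iterating the update with W held fixed, as a simple stand-in for the coefficients that the NS-LSTM encoder learns to produce.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def sparse_nmf(v, k, lam=0.1, iters=200, seed=0, eps=1e-9):
    """Hypothetical sketch: learn a non-negative dictionary W (F x K) and sparse
    codes H (K x T) for a non-negative matrix V (F x T)."""
    rng = random.Random(seed)
    f, t = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(f)]
    h = [[rng.random() + eps for _ in range(t)] for _ in range(k)]
    for _ in range(iters):
        # H update: the L1 weight lam enters the denominator, pushing codes toward 0.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + lam + eps) for j in range(t)]
             for i in range(k)]
        # W update: standard multiplicative rule for the squared error.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(f)]
        # Scale each atom to unit L2 norm and rescale H so WH is unchanged;
        # otherwise the L1 penalty could be avoided by shrinking H and growing W.
        for j in range(k):
            norm = sum(w[i][j] ** 2 for i in range(f)) ** 0.5 + eps
            for i in range(f):
                w[i][j] /= norm
            for c in range(t):
                h[j][c] *= norm
    return w, h

def encode(frame, w, lam=0.1, iters=200, eps=1e-9):
    """Hypothetical stand-in for the encoder: sparse non-negative coefficients
    for one spectral frame, with the dictionary W held fixed."""
    k = len(w[0])
    wt = transpose(w)
    wtw = matmul(wt, w)
    num = matmul(wt, [[x] for x in frame])
    h = [[1.0] for _ in range(k)]
    for _ in range(iters):
        den = matmul(wtw, h)
        h = [[h[i][0] * num[i][0] / (den[i][0] + lam + eps)] for i in range(k)]
    return [row[0] for row in h]

# Synthetic non-negative "spectrogram" (8 bins x 20 frames) with rank-4 structure.
rng = random.Random(1)
W0 = [[rng.random() for _ in range(4)] for _ in range(8)]
H0 = [[rng.random() for _ in range(20)] for _ in range(4)]
V = matmul(W0, H0)

W_init, H_init = sparse_nmf(V, k=4, iters=0)  # random initialisation only
W, H = sparse_nmf(V, k=4)
code = encode([row[0] for row in V], W)       # coefficients for the first frame
```

Normalising the atoms matters because the L1 term alone is not scale-invariant: without it, the updates could lower the penalty by inflating W while shrinking H, producing codes that look sparse only because of scaling.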