Speech Enhancement System Using Lip-reading

Kenji Matsui, Kohei Fukuyama, Yoshihisa Nakatoh, Yumiko O. Kato

2020

We have been developing a practical speech enhancement system to support laryngectomees. Through user interviews, we identified essential requirements, such as “utilization of existing devices”, “the appearance needs to be inconspicuous”, and “the device should be easy to use”. Considering these user needs, we plan to develop a speech enhancement application on a smartphone platform, so that users look ordinary while using it and do not need to buy any additional device. To realize such a system, our proposed approach combines lip-reading and speech synthesis. In this study, we examined a lip-reading method that recognizes words the user has registered in advance and that can be optimized for the individual user with a small amount of data. 36 viseme images were compressed into very compact representations using a VAE (Variational Autoencoder), and the training data for the word recognition model was then generated from them. A viseme is a group of phonemes that look identical on the lips. This VAE-based viseme sequence representation was used so that the system can adapt to a user with a very small training data set. A word recognition experiment using the VAE encoder and a CNN was performed with 20 Japanese words. The experiment showed 65% recognition accuracy for the first candidate, and 100% when both the first and second candidates are counted. Lip-reading-based speech enhancement seems suitable for embedding in mobile devices in terms of both usability and small-vocabulary recognition accuracy.
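
The two figures reported above correspond to what is commonly called top-1 and top-2 accuracy. As a minimal sketch of how such figures can be computed from per-word classifier scores, the following Python snippet may help; the function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the study's code or data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for 4 utterances over a 3-word vocabulary
# (made-up values for illustration, not results from the study)
scores = [
    [0.7, 0.2, 0.1],  # true word 0 is ranked first
    [0.4, 0.5, 0.1],  # true word 0 is ranked second
    [0.1, 0.3, 0.6],  # true word 2 is ranked first
    [0.5, 0.3, 0.2],  # true word 1 is ranked second
]
labels = [0, 0, 2, 1]

top1 = top_k_accuracy(scores, labels, 1)  # counts only the first candidate
top2 = top_k_accuracy(scores, labels, 2)  # counts the first and second candidates
print(top1, top2)
```

In an application of the kind described, a high top-2 figure would matter if the user can pick between the two best candidates before synthesis; whether the authors' system offers such a choice is not stated in the abstract.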
