Cross Audio Visual Recognition – Lip Reading

2019 
Lip reading is the task of decoding text from the movement of a speaker's mouth. The task has two stages: designing or learning the visual features, and prediction. The system learns spatiotemporal visual features together with a sequence model. The three dominant models used for lip reading are Convolutional Neural Networks (CNNs), LSTMs, and reinforcement learning. The one-to-many relationship between visemes and phonemes makes predicting words and phrases difficult. A 3-D convolutional model is used for cross audio-visual recognition. Since new technologies are being applied to improve communication with deaf people, this project collects random videos, which may be noisy or have low-quality audio, and maps them to words and sentences. The project draws on applications of 3-D Convolutional Neural Networks and reinforcement learning.
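As a rough illustration of the visual pipeline the abstract describes (a 3-D convolutional front-end that learns spatiotemporal features from mouth-region frames, followed by a sequence model), the sketch below shows one possible arrangement in PyTorch. It is an assumption for clarity, not the project's actual architecture; all layer sizes, the class name LipReadingNet, and NUM_CLASSES are hypothetical.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 40  # assumed output vocabulary (e.g. characters/phonemes)

class LipReadingNet(nn.Module):
    """Hypothetical sketch: 3-D CNN features -> LSTM sequence model."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # 3-D convolutions capture motion across consecutive mouth frames.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the time axis, pool space
        )
        # Bidirectional LSTM models the temporal sequence of visual features.
        self.lstm = nn.LSTM(64 * 4 * 4, 256, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time, height, width) grayscale mouth crops
        feats = self.frontend(x)                     # (B, C, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.lstm(feats)                    # (B, T, 512)
        return self.classifier(out)                  # per-frame class scores

if __name__ == "__main__":
    clip = torch.randn(2, 1, 29, 88, 88)  # two 29-frame mouth-region clips
    print(LipReadingNet()(clip).shape)    # torch.Size([2, 29, 40])
```

The per-frame scores would typically be decoded into words or sentences with a sequence criterion (e.g. CTC or an attention decoder), which is where the viseme-to-phoneme ambiguity mentioned above has to be resolved.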