Cross Audio Visual Recognition – Lip Reading

2019 
Lip reading is the task of decoding text from the movement of a speaker's mouth. The task has two stages: designing or learning the visual features, and prediction. The system learns spatiotemporal visual features together with a sequence model. The three dominant models used for lip reading are Convolutional Neural Networks (CNNs), LSTMs, and reinforcement learning. The one-to-many relationship between visemes and phonemes makes predicting words and phrases difficult. A 3-D convolutional model is used for cross audio-visual recognition. Since new technologies are being applied to improve communication with deaf people, this project collects random videos, which may be noisy or have low-quality audio, and maps them to words and sentences. The project draws on applications of 3-D Convolutional Neural Networks and reinforcement learning.
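As a rough illustration of the visual pipeline the abstract describes (a 3-D convolutional front-end that learns spatiotemporal features from mouth-region frames, followed by a sequence model), the sketch below shows one possible arrangement in PyTorch. It is an assumption for clarity, not the project's actual architecture; all layer sizes, the class name LipReadingNet, and NUM_CLASSES are hypothetical.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 40  # assumed output vocabulary (e.g. characters/phonemes)

class LipReadingNet(nn.Module):
    """Hypothetical sketch: 3-D CNN features -> LSTM sequence model."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # 3-D convolutions capture motion across consecutive mouth frames.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the time axis, pool space
        )
        # Bidirectional LSTM models the temporal sequence of visual features.
        self.lstm = nn.LSTM(64 * 4 * 4, 256, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time, height, width) grayscale mouth crops
        feats = self.frontend(x)                     # (B, C, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.lstm(feats)                    # (B, T, 512)
        return self.classifier(out)                  # per-frame class scores

if __name__ == "__main__":
    clip = torch.randn(2, 1, 29, 88, 88)  # two 29-frame mouth-region clips
    print(LipReadingNet()(clip).shape)    # torch.Size([2, 29, 40])
```

The per-frame scores would typically be decoded into words or sentences with a sequence criterion (e.g. CTC or an attention decoder), which is where the viseme-to-phoneme ambiguity mentioned above has to be resolved.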