Multi-modal Language Models for Lecture Video Retrieval

Huizhong Chen, Matthew Cooper, Dhiraj Joshi, Bernd Girod

2014

We propose Multi-modal Language Models (MLMs), which adapt latent variable techniques from document analysis to explore co-occurrence relationships in multi-modal data. In this paper, we focus on applying MLMs to index text from slides and speech in lecture videos, and then employ a multi-modal probabilistic ranking function for lecture video retrieval. The MLM achieves highly competitive results against well-established retrieval methods such as the Vector Space Model and Probabilistic Latent Semantic Analysis. When noise is present in the data, retrieval performance with MLMs is shown to improve with the quality of the spoken text extracted from the video.
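
To make the retrieval setting concrete, the following is a minimal sketch of a Vector Space Model baseline of the kind the abstract compares against. It is not the MLM and not the authors' ranking function. The function names, the TF-IDF weighting variant, and the linear fusion weight `slide_weight` are all assumptions made for illustration; each video is assumed to be represented by two token lists, one for slide text and one for spoken text.

```python
import math
from collections import Counter


def tfidf_index(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenized docs, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; this particular formula is an illustrative choice.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, slide_docs, speech_docs, slide_weight=0.5):
    """Rank videos by a weighted sum of per-modality cosine scores.

    slide_weight is a hypothetical fusion parameter, not taken from the paper.
    Returns (video_index, score) pairs, best first.
    """
    scores = [0.0] * len(slide_docs)
    modalities = ((slide_docs, slide_weight), (speech_docs, 1.0 - slide_weight))
    for docs, weight in modalities:
        vectors, idf = tfidf_index(docs)
        # Query terms unseen in this modality carry no weight.
        q = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
        for i, vec in enumerate(vectors):
            scores[i] += weight * cosine(q, vec)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy data: two videos, each with slide tokens and speech tokens.
slides = [["markov", "chain", "transition"], ["fourier", "transform", "frequency"]]
speech = [["today", "markov", "models"], ["today", "signal", "fourier"]]
ranking = rank(["fourier", "transform"], slides, speech)
```

In this toy example the second video shares query terms in both modalities while the first shares none, so it ranks first. A latent variable model such as the MLM would differ from this sketch by scoring through learned co-occurrence structure rather than exact term overlap.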

Keywords:

Machine learning
Computer vision
Information retrieval
Vector space model
Latent variable model
Search engine indexing
Language model
Probabilistic logic
Computer science
Artificial intelligence
Divergence-from-randomness model
Latent variable
Probabilistic latent semantic analysis
Natural language processing
