An approach for exploring a video via multimodal feature extraction and user interactions

2018 
Exploring the content of a video is typically inefficient because of the linear, streamed nature of the medium and its lack of interactivity. A video may be seen as a combination of features: the visual track, the audio track, a transcription of the spoken words, and so on. These features may be viewed as a set of temporally bounded parallel modalities. It is our contention that these modalities and their derived features can be presented individually or in discrete combinations to allow deeper and more effective interactive exploration of content within different parts of a video. We propose a novel system for video exploration that offers video content in an alternative representation. The proposed system presents the extracted multimodal features as an automatically generated interactive multimedia webpage. This paper also presents a user study conducted to learn the usage patterns of the proposed system. The learned usage patterns may be used to build a template-driven representation engine that uses the features to offer a multimodal synopsis of a video, which may lead to more efficient exploration of video content.