Beyond caption to narrative: Video captioning with multiple sentences

Andrew Shin, Katsunori Ohnishi, Tatsuya Harada


2016


Recent advances in image captioning have led to increasing interest in video captioning. However, most work on video captioning focuses on generating captions from a single input of aggregated features, which differs little from the image captioning process and does not fully exploit the dynamic content of videos. We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques to produce a story-like caption. We show that our proposed method generates captions with richer content and is competitive with a state-of-the-art method without explicitly using video-level features as input.
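The abstract describes a three-stage pipeline: segment the video in time, caption one frame per segment, and join the captions into a narrative. The sketch below mirrors only that outline and rests on assumptions that are not taken from the paper. Per-frame "actionness" scores stand in for the action-localization output. A fixed threshold cuts the video into segments. The middle frame of each segment is captioned by a caller-supplied `caption_fn`, which is a placeholder for any image captioning model. Captions are then joined with a simple "then" connective. All function names are hypothetical, and the paper's actual localization model and NLP connection techniques are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

def segment_by_actionness(scores: Sequence[float], threshold: float = 0.5,
                          min_len: int = 2) -> List[Tuple[int, int]]:
    """Group consecutive frames with score >= threshold into [start, end) segments.

    Assumption: a thresholded per-frame score stands in for action localization.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a segment opens
        elif s < threshold and start is not None:
            if i - start >= min_len:  # keep only segments of at least min_len frames
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # segment running to the last frame
    return segments

def key_frames(segments: List[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame of each segment (hypothetical selection rule)."""
    return [(s + e) // 2 for s, e in segments]

def connect_captions(captions: List[str]) -> str:
    """Join per-segment captions into one sentence, dropping consecutive repeats.

    Assumption: a plain "then" connective replaces the paper's NLP techniques.
    """
    deduped: List[str] = []
    for c in captions:
        c = c.strip().rstrip(".")
        if not c:
            continue  # skip empty captions
        if not deduped or c.lower() != deduped[-1].lower():
            deduped.append(c)
    if not deduped:
        return ""
    parts = [deduped[0]]
    for c in deduped[1:]:
        parts.append("then " + c[0].lower() + c[1:])  # lowercase the following clauses
    return ", ".join(parts) + "."

def narrate(frames: Sequence[str], scores: Sequence[float],
            caption_fn: Callable[[str], str]) -> str:
    """Segment, caption one key frame per segment, and connect the captions."""
    segs = segment_by_actionness(scores)
    caps = [caption_fn(frames[k]) for k in key_frames(segs)]
    return connect_captions(caps)
```

For example, with scores `[0.1, 0.8, 0.9, 0.2, 0.7, 0.6, 0.9, 0.1]` the sketch finds segments `(1, 3)` and `(4, 7)` and captions frames 2 and 5. Suppose a stub captioner returns "A man opens the door." and "A man walks into the room." for those frames. The joined result is then "A man opens the door, then a man walks into the room." A real system would replace the threshold, the middle-frame rule, and the connective with learned components.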

Keywords:

Speech recognition
Computer vision
Recurrent neural network
Artificial intelligence
Narrative
Closed captioning
Feature extraction
Image segmentation
Computer science
