Boundary Detector Encoder and Decoder with Soft Attention for Video Captioning

Tangming Chen, Qike Zhao, Jingkuan Song

2019

With the rapid development of deep learning, the use of Recurrent Neural Networks and Convolutional Neural Networks for video captioning has received widespread attention. Building on the classical encoder-decoder approach, we modify both the encoding and the decoding networks to improve the performance of the overall model. In this paper, we introduce an encoding scheme that can detect the hierarchical structure of the input video. In addition, we use a soft attention mechanism that learns to automatically select the relevant frames of the input video when generating its description. Extensive experiments are conducted on two datasets: the Microsoft Video Description Corpus and MSR-Video To Text. Three metrics, BLEU@4, METEOR, and CIDEr, are used to evaluate our approach. Experimental results demonstrate the effectiveness of our approach.
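
The soft attention step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic softmax-weighted average over frame features, not necessarily the authors' exact formulation. The function name `soft_attention`, the toy frame features, and the relevance scores are all hypothetical; in a real decoder the scores would come from a learned function of the decoder state and each frame's features.

```python
import math

def soft_attention(frame_features, scores):
    """Return (weights, context) for a softmax-weighted sum of frame features.

    frame_features: list of equal-length feature vectors, one per frame.
    scores: one relevance score per frame (assumed given here; normally learned).
    """
    # Numerically stable softmax over the per-frame relevance scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted average of the frame feature vectors.
    dim = len(frame_features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context

# Toy example: three frames with 2-dimensional features and equal scores.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
weights, context = soft_attention(frames, scores)
```

Because the weights are non-negative and sum to one, every frame contributes to the context vector in proportion to its score. That is what makes the selection "soft" rather than a hard choice of a single frame.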
