Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks

Mengshi Qi,Yunhong Wang,Annan Li,Jiebo Luo

Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks

2018

Sports video captioning is a task of automatically generating a textual description for sports events (e.g. football, basketball or volleyball games). Although previous works have shown promising performance in producing the coarse and general description of a video, it is still quite challenging to caption a sports video with multiple fine-grained player's actions and complex group relationship among players. In this paper, we present a novel hierarchical recurrent neural network (RNN) based framework with an attention mechanism for sports video captioning. A motion representation module is proposed to extract individual pose attribute and group-level trajectory cluster information. Moreover, we introduce a new dataset called Sports Video Captioning Dataset-Volleyball for evaluation. We evaluate our proposed model over two public datasets and our new dataset, and the experimental results demonstrate that our method outperforms the state-of-the-art methods.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations