Sports Video Captioning via Attentive Motion Representation and Group Relationship Modeling

2019 
Sports video captioning is the task of automatically generating a textual description of a sports event (e.g., a football, basketball, or volleyball game). Although a great deal of previous work has shown promising performance in producing coarse, general descriptions of videos, these methods lack professional sports knowledge, and it remains quite challenging to caption a sports video that contains multiple fine-grained player actions and complex group relationships among players. In this study, we present a novel hierarchical recurrent neural network framework with an attention mechanism for sports video captioning. A motion representation module captures individual pose attributes and dynamic trajectory cluster information with extra professional sports knowledge, and a group relationship module builds a scene graph that models players’ interactions with a gated graph convolutional network. Moreover, we introduce a new dataset, Sports Video Captioning Dataset-Volleyball, for evaluation. The proposed model is evaluated on three widely adopted public datasets and on our newly collected dataset, and the results on all four demonstrate the effectiveness of our method.
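To make the idea of a gated graph convolution over a player scene graph more concrete, the following is a minimal sketch of one GRU-style gated update of node features. It is an illustration only and not the formulation used in this work: the function names (`gated_graph_step`, `init_params`), the weight names, the fully connected six-player graph, and the feature size are all assumptions chosen for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, rng):
    # Hypothetical weight matrices, all d x d, with small random values.
    names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
    return {name: rng.normal(scale=0.1, size=(d, d)) for name in names}


def gated_graph_step(h, adj, params):
    """One gated update of node states over a graph.

    h:   (n, d) array of node (player) features.
    adj: (n, n) array, adj[i, j] != 0 if node j sends a message to node i.
    """
    # Average the neighbours' features (row-normalised adjacency).
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1.0)
    m = norm_adj @ h @ params["W_msg"]

    # GRU-style gates decide how much of each node state is overwritten.
    z = sigmoid(m @ params["W_z"] + h @ params["U_z"])  # update gate
    r = sigmoid(m @ params["W_r"] + h @ params["U_r"])  # reset gate
    h_tilde = np.tanh(m @ params["W_h"] + (r * h) @ params["U_h"])

    # Convex combination of the old state and the candidate state.
    return (1.0 - z) * h + z * h_tilde


rng = np.random.default_rng(0)
n_players, d = 6, 8  # assumed sizes for illustration
h0 = rng.normal(size=(n_players, d))
adj = np.ones((n_players, n_players)) - np.eye(n_players)  # every player linked to every other
params = init_params(d, rng)
h1 = gated_graph_step(h0, adj, params)
```

In this sketch the update gate lets a player's representation stay close to its previous value when the messages from other players are uninformative, which is the general motivation for gating in graph networks.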