Video Relation Detection with Spatio-Temporal Graph

2019 
What we perceive from visual content is not only collections of objects but also the interactions between them. Visual relations, denoted by the triplet ⟨subject, predicate, object⟩, convey a wealth of information for visual understanding. Unlike static images, and because of the additional temporal channel, dynamic relations in videos are often correlated in both the spatial and temporal dimensions, which makes relation detection in videos a more complex and challenging task. In this paper, we abstract videos into fully connected spatio-temporal graphs. We pass messages and conduct reasoning in these 3D graphs with a novel VidVRD model based on a graph convolutional network. Our model can take advantage of spatio-temporal contextual cues to make better predictions on objects as well as their dynamic relationships. Furthermore, an online association method with a Siamese network is proposed for accurate association of relation instances. By combining our model (VRD-GCN) with the proposed association method, our framework for video relation detection achieves the best performance on the latest benchmarks. We validate our approach on the benchmark ImageNet-VidVRD dataset. The experimental results show that our framework outperforms the state of the art by a large margin, and a series of ablation studies demonstrates our method's effectiveness.
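To make the core idea concrete, the following is a minimal, hedged sketch of one graph-convolution layer applied to a fully connected spatio-temporal graph, in the spirit of the message passing the abstract describes. It is not the paper's VRD-GCN implementation; the layer form (symmetric adjacency normalization with self-loops, linear map, ReLU) follows the standard GCN formulation, and the toy graph (2 objects tracked over 3 frames, all 6 nodes mutually connected) is an assumed illustration.

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One standard graph-convolution layer: normalize the adjacency
    (with self-loops), aggregate neighbor features, apply a linear
    map followed by ReLU. Illustrative only, not the paper's VRD-GCN."""
    a_hat = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))  # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt                # symmetric normalization
    return np.maximum(a_norm @ node_feats @ weight, 0.0)    # ReLU activation

# Toy spatio-temporal graph: 2 objects tracked over 3 frames = 6 nodes,
# fully connected across space and time, as in the paper's 3D graphs.
num_nodes, feat_dim, out_dim = 6, 4, 4
rng = np.random.default_rng(0)
feats = rng.normal(size=(num_nodes, feat_dim))              # per-node object features
adj = np.ones((num_nodes, num_nodes)) - np.eye(num_nodes)   # fully connected graph
w = rng.normal(size=(feat_dim, out_dim))                    # learnable layer weight

out = gcn_layer(feats, adj, w)
print(out.shape)  # (6, 4): one updated feature vector per node
```

Stacking such layers lets each object node absorb contextual cues from all other objects across frames, which is what allows spatio-temporal context to inform both object and relation predictions.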