One-Shot Example Videos Localization Network for Weakly-Supervised Temporal Action Localization

2021 
This paper tackles the problem of example-driven weakly-supervised temporal action localization. We propose the One-Shot Example Videos Localization Network (OSEVLNet) to precisely localize action instances in untrimmed videos given only one trimmed example video. Since frame-level ground truth is unavailable under the weakly-supervised setting, our approach trains a self-attention module automatically with reconstruction and feature discrepancy restrictions. Specifically, the reconstruction restriction minimizes the discrepancy between the original input features and the features reconstructed by a Variational AutoEncoder (VAE) module, while the feature discrepancy restriction maximizes the distance between the weighted features of highly responsive and slightly responsive regions. Our approach achieves comparable or better results on the THUMOS'14 dataset than other weakly-supervised methods while being trained on far fewer videos. Moreover, it is particularly well suited to extending to newly emerging action categories, meeting the requirements of different application scenarios.
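
To make the two restrictions concrete, below is a minimal PyTorch sketch of how they could be implemented. Everything here is an illustrative assumption rather than the authors' published implementation: the feature and latent dimensions, the single-linear self-attention head, mean pooling over snippets, and the unweighted sum of loss terms are all hypothetical choices.

```python
# Minimal sketch of the reconstruction and feature discrepancy
# restrictions described in the abstract. Shapes, layer sizes, and
# pooling choices are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, LATENT_DIM, T = 1024, 128, 64  # assumed feature/latent sizes, snippet count


class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(FEAT_DIM, 2 * LATENT_DIM)  # outputs mu and log-variance
        self.dec = nn.Linear(LATENT_DIM, FEAT_DIM)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar


attention = nn.Linear(FEAT_DIM, 1)  # self-attention scores over snippets (assumed form)
vae = VAE()

x = torch.randn(T, FEAT_DIM)          # snippet features of one untrimmed video
a = torch.sigmoid(attention(x))       # per-snippet attention in (0, 1)

# Reconstruction restriction: the VAE must reproduce the input features.
recon, mu, logvar = vae(x)
loss_recon = F.mse_loss(recon, x)
loss_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

# Feature discrepancy restriction: push apart the attention-weighted
# features of highly responsive (foreground) and slightly responsive
# (background) regions.
f_high = (a * x).mean(dim=0)          # foreground-weighted feature
f_low = ((1.0 - a) * x).mean(dim=0)   # background-weighted feature
loss_disc = -F.pairwise_distance(f_high.unsqueeze(0), f_low.unsqueeze(0)).mean()

loss = loss_recon + loss_kl + loss_disc  # equal weights are an assumption
loss.backward()
```

Note that the discrepancy term is written as a negative distance, so minimizing the total loss maximizes the separation between the foreground- and background-weighted features, which is what drives the attention module toward action regions without frame-level labels.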