Two-Branch Relational Prototypical Network for Weakly Supervised Temporal Action Localization.

2021 
As a challenging task in high-level video understanding, weakly supervised temporal action localization has attracted increasing attention recently. Given only video-level category labels, the task is to distinguish background from actions frame by frame. This is non-trivial because the background is unconstrained and the actions are complex and multi-label. Observing that these difficulties stem mainly from the large variations within background and actions, we propose to address them from the perspective of modeling variations. Moreover, it is desirable to further reduce these variances, so that background identification can be cast as background rejection and the contradiction between classification and detection can be alleviated. Accordingly, in this paper, we propose a two-branch relational prototypical network. The first branch, the action-branch, adopts class-wise prototypes and mainly acts as an auxiliary that introduces prior knowledge about label dependencies. The second branch, the sub-branch, starts with multiple prototypes, called sub-prototypes, which give it a strong capacity to model variations. As a further benefit, we design a multi-label clustering loss based on the sub-prototypes to learn compact features under the multi-label setting. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method and its superior performance over state-of-the-art methods.
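To make the two-branch idea concrete, the following is a minimal sketch of prototype-based frame scoring, not the authors' implementation. It assumes cosine similarity as the matching function, several sub-prototypes per class with a max taken over them, and a fixed similarity threshold for rejecting background. None of these choices are stated in the abstract, and all function names, shapes, and the threshold value are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def action_branch_scores(frames, class_protos):
    """Assumed action-branch: one prototype per class.
    frames: (T, D), class_protos: (C, D) -> similarities of shape (T, C)."""
    return l2_normalize(frames) @ l2_normalize(class_protos).T

def sub_branch_scores(frames, sub_protos):
    """Assumed sub-branch: K sub-prototypes per class, best match kept.
    frames: (T, D), sub_protos: (C, K, D) -> similarities of shape (T, C)."""
    sims = np.einsum("td,ckd->tck", l2_normalize(frames), l2_normalize(sub_protos))
    return sims.max(axis=-1)

def reject_background(scores, threshold=0.5):
    """Assumed rejection rule: a frame counts as action only if its best
    class similarity reaches the threshold; otherwise it is background."""
    return scores.max(axis=1) >= threshold

# Toy data: T frames, D feature dims, C classes, K sub-prototypes per class.
rng = np.random.default_rng(0)
T, D, C, K = 6, 4, 3, 2
frames = rng.normal(size=(T, D))
class_protos = rng.normal(size=(C, D))
sub_protos = rng.normal(size=(C, K, D))

action_scores = action_branch_scores(frames, class_protos)  # (T, C)
sub_scores = sub_branch_scores(frames, sub_protos)          # (T, C)
is_action = reject_background(sub_scores, threshold=0.5)    # (T,) boolean mask
```

Under these assumptions, a frame that lies far from every sub-prototype is rejected as background without needing an explicit background class, which is one way to read "rejecting background" in the abstract.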