Action Coherence Network for Weakly Supervised Temporal Action Localization

2019 
Most prominent temporal action localization methods are fully supervised and rely heavily on frame-level labels, which can be prohibitively expensive to annotate. Weakly-supervised Temporal Action Localization (W-TAL) is an alternative paradigm that requires only video-level labels in training, alleviating this annotation effort. We present the Action Coherence Network (ACN) for W-TAL, which features a new coherence loss that better supervises action boundary learning and facilitates proposal regression. In addition, a purpose-built fusion module is proposed for localization inference based on features extracted by two streams of convolutional neural networks. Overall, the proposed ACN achieves state-of-the-art W-TAL performance on two challenging datasets, THUMOS14 and ActivityNet1.2; in particular, ACN attains an mAP of 24.2% on THUMOS14 at an IoU threshold of 0.5, which approaches some recent fully-supervised TAL methods.
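The reported mAP is measured at an IoU threshold of 0.5, which in temporal localization refers to the overlap between a predicted segment and a ground-truth segment along the time axis. The following is a minimal sketch of that overlap test, not code from the paper; the function names and the (start, end) representation in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two 1-D segments given as (start, end)."""
    (ps, pe), (gs, ge) = pred, gt
    # Overlap length is zero when the segments are disjoint.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred, gt, threshold=0.5):
    """Hypothetical helper: does a prediction overlap the ground truth enough?"""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
    print(matches_at_threshold((0.0, 10.0), (0.0, 5.0)))
```

A full mAP computation would additionally rank predictions by confidence and match each ground-truth segment at most once per class; that part is omitted here.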