DBAM: Dense Boundary and Actionness Map for Action Localization in Videos via Sentence Query.

2021 
Action localization in videos via sentence query remains a very challenging problem because of semantic misalignment and structural misalignment between the two modalities. Based on the observation that activities should be localized using both the local keywords of the query sentence and the global information of the whole video, we propose a novel method named Dense Boundary and Actionness Map (DBAM). The method trains a self-attention model to evaluate the importance of each word in the query sentence. After video encoding, it constructs a two-dimensional visual feature map in which each entry corresponds to one candidate moment. The visual feature map is concatenated with the semantic feature across modalities, and DBAM then performs convolution directly over the resulting map to predict three two-dimensional maps for the candidate moments: an actionness map, a starting map, and an ending map. The three maps are fused to generate proposals. We evaluate DBAM on two challenging public benchmarks, Charades-STA and TACoS, where it outperforms the state of the art by a large margin.
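The data flow described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify how a candidate moment is pooled, how the three maps are fused, or the network architecture, so mean pooling over clips, an element-wise product for fusion, and all function names below are hypothetical. The convolutional predictors are not reproduced; the three maps are taken as given inputs.

```python
import numpy as np

def attention_pool(word_feats, w):
    """Weight each word by a softmax over scores, then sum the word features.
    `w` stands in for learned attention parameters (assumption)."""
    scores = word_feats @ w                      # (num_words,)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ word_feats, weights         # (dim,), (num_words,)

def build_2d_map(clip_feats):
    """Cell (i, j) holds the mean feature of clips i..j (assumed pooling).
    Cells with j < i are not valid moments and stay zero."""
    n, d = clip_feats.shape
    prefix = np.vstack([np.zeros((1, d)), np.cumsum(clip_feats, axis=0)])
    fmap = np.zeros((n, n, d))
    valid = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i, n):
            fmap[i, j] = (prefix[j + 1] - prefix[i]) / (j - i + 1)
            valid[i, j] = True
    return fmap, valid

def fuse_with_query(fmap, sent_feat):
    """Concatenate the sentence feature onto every cell of the visual map."""
    n = fmap.shape[0]
    tiled = np.broadcast_to(sent_feat, (n, n, sent_feat.shape[0]))
    return np.concatenate([fmap, tiled], axis=-1)

def fuse_maps(actionness, start, end, valid):
    """Combine the three predicted maps by element-wise product (assumption)
    and zero out cells that are not valid moments."""
    return np.where(valid, actionness * start * end, 0.0)

def top_proposal(score):
    """Return the (start_clip, end_clip) indices of the best-scoring cell."""
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j)
```

In a full model, the output of `fuse_with_query` would be passed through convolutional layers to produce the actionness, starting, and ending maps that `fuse_maps` consumes; ranking more than one proposal would replace the single `argmax` with a sort followed by some form of overlap suppression.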