Nocal-Siam: Refining Visual Features and Response with Advanced Non-local Blocks for Real-time Siamese Tracking.

2021 
Siamese trackers contain two core stages: first learning the features of both the target and search inputs, and then calculating response maps via the cross-correlation operation, which can also be used for regression and classification to construct a typical one-shot detection tracking framework. Although they have drawn continuous interest from the visual tracking community owing to their favorable trade-off between accuracy and speed, both stages are sensitive to distractors in the search branch, which leads to unreliable response positions. To address this problem, we advance Siamese trackers with two novel non-local blocks in a tracker named Nocal-Siam, which leverages the long-range dependency property of non-local attention in a supervised fashion from two aspects. First, a target-aware non-local block (T-Nocal) is proposed for learning target-guided feature weights, which serve to refine the visual features of both the target and search branches and thus effectively suppress noisy distractors. This block reinforces the interplay between the target and search branches in the first stage. Second, we further develop a location-aware non-local block (L-Nocal) to associate multiple response maps, which prevents them from producing divergent candidate target positions in the upcoming frame. Experiments on five popular benchmarks show that Nocal-Siam performs favorably against well-performing counterparts both quantitatively and qualitatively.
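As a rough illustration of the second stage described above, the following plain-Python sketch slides a target feature patch over a search feature map and takes the peak of the resulting response map as the predicted position. It is not the paper's implementation: the function names `cross_correlate` and `peak_position` and the toy single-channel arrays are assumptions made for this example, whereas real Siamese trackers correlate learned multi-channel features.

```python
def cross_correlate(search, target):
    """Slide `target` over `search` (2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(target), len(target[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the target and the window at (i, j).
            score = sum(
                search[i + di][j + dj] * target[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_position(response):
    """Return (row, col) of the largest value in the response map."""
    best = max(
        (value, i, j)
        for i, row in enumerate(response)
        for j, value in enumerate(row)
    )
    return best[1], best[2]


# Toy inputs (hypothetical): the target pattern is embedded at row 1, col 2.
target = [[1, 2],
          [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

response = cross_correlate(search, target)
print(peak_position(response))
```

In this toy case the response peaks where the search window matches the target. A distractor with a similar pattern elsewhere in the search map would raise a competing peak, which is the failure mode the abstract attributes to both stages.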