Multi attention module for visual tracking

2019 
Abstract We propose a new visual tracking algorithm that leverages multi-level visual attention to make full use of the information available during tracking. Visual attention has been widely applied in many visual tasks, such as image captioning and question answering. However, most existing attention models focus on only one or two aspects, ignoring other cues that are useful for visual tracking. We identify four main attentional aspects in the tracking task and propose a unified network that leverages multi-level visual attention: layer-wise attention, temporal attention, spatial attention, and channel-wise attention. Since deep features of different levels may suit different scenarios, we train an attention network in the off-line stage to facilitate feature selection during online tracking. To better exploit the temporal consistency assumption of visual tracking, we implement the attention network with long short-term memory (LSTM) units, which capture historical context information to perform more reliable inference at the current time step. Unlike in image classification, background clutter in tracking is more complicated. We therefore purify the features with spatial attention and channel-wise attention to effectively suppress background noise and highlight the target region. In addition, we enforce deep feature sharing across target candidates using Region of Interest (RoI) pooling, so that the features of all candidates are extracted in a single forward pass of the deep network. To further improve tracking accuracy, we propose a strategy that promotes the tracker using detection results from a generic object detector, reducing the risk of tracking drift. The proposed tracking algorithm compares favorably against state-of-the-art methods on three popular benchmark datasets, and extensive experimental evaluations demonstrate the effectiveness of the proposed techniques.
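The abstract describes two concrete mechanisms: feature purification via channel-wise and spatial attention, and feature sharing across candidates via RoI pooling. The following PyTorch sketch illustrates how these pieces could fit together; the module names, layer sizes, and the `shared_candidate_features` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: channel-wise + spatial attention for feature purification,
# followed by RoI pooling so all candidates share one backbone forward pass.
# All names and hyper-parameters here are hypothetical, for illustration only.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool


class ChannelAttention(nn.Module):
    """Re-weights feature channels with a squeeze-and-excitation style gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # global average pool -> (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)


class SpatialAttention(nn.Module):
    """Predicts a spatial mask that suppresses background and highlights the target."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        mask = torch.sigmoid(self.conv(x))      # (B, 1, H, W)
        return x * mask


def shared_candidate_features(feature_map, candidate_boxes,
                              output_size=7, spatial_scale=1.0 / 16):
    """Extract per-candidate features from one shared feature map via RoI pooling.

    feature_map:     (1, C, H, W) backbone features of the whole search frame
    candidate_boxes: (N, 4) boxes in image coordinates (x1, y1, x2, y2)
    """
    return roi_pool(feature_map, [candidate_boxes],
                    output_size=output_size, spatial_scale=spatial_scale)


if __name__ == "__main__":
    feat = torch.randn(1, 512, 32, 32)          # pretend backbone output (stride 16)
    feat = SpatialAttention(512)(ChannelAttention(512)(feat))   # purify features
    cands = torch.tensor([[10., 10., 120., 120.], [200., 50., 300., 180.]])
    pooled = shared_candidate_features(feat, cands)
    print(pooled.shape)                         # torch.Size([2, 512, 7, 7])
```

Pooling candidate features from a single shared feature map is what allows all candidates to be scored with only one forward pass of the backbone, which is the efficiency point the abstract makes.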