Learning Siamese Network with Top-Down Modulation for Visual Tracking

2018 
The performance of visual object tracking depends largely on the target appearance model. Benefiting from the success of CNNs in feature extraction, recent studies have paid much attention to CNN representation learning and feature fusion models. However, existing feature fusion models ignore the relations between features of different layers. In this paper, we propose a deep feature fusion model based on the siamese network that considers the connections between CNN feature maps. To tackle the limitation of differing feature-map sizes in a CNN, we fuse feature maps of different resolutions by introducing de-convolutional layers in the offline training stage. Specifically, a top-down modulation is adopted for feature fusion. In the tracking stage, a simple matching operation between the fused features of the exemplar and the search region is conducted with the learned model, which maintains real-time tracking speed. Experimental results show that the proposed method achieves favorable tracking accuracy against state-of-the-art trackers while running in real time.
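The top-down modulation described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: nearest-neighbour upsampling stands in for the learned de-convolutional layer, a 1x1 projection stands in for the lateral connection, and the matching step is a plain sliding inner product between the fused exemplar and search features. All function names and shapes are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling; a stand-in for the learned
    # de-convolutional layer used in the offline training stage.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_1x1(x, w):
    # 1x1 convolution: a per-pixel linear map over channels.
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def top_down_fuse(deep, shallow, w_lat):
    # Top-down modulation: upsample the coarse deep feature map and
    # add it to the laterally projected shallow feature map.
    return upsample2x(deep) + lateral_1x1(shallow, w_lat)

def xcorr(z, x):
    # Siamese matching as a sliding inner product between the fused
    # exemplar feature z (C, Hz, Wz) and search feature x (C, Hx, Wx),
    # producing a response map of shape (Hx-Hz+1, Wx-Wz+1).
    c, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + hz, j:j + wz])
    return out
```

For example, fusing a deep map of shape (64, 4, 4) with a shallow map of shape (128, 8, 8) via a (64, 128) lateral projection yields a fused map of shape (64, 8, 8), which can then be matched against a larger fused search-region feature with `xcorr`.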