Hyper-Siamese network for robust visual tracking

2019 
Matching-based tracking has drawn increasing interest in the object tracking field, and among such methods the SiamFC tracker shows great potential for achieving high accuracy and efficiency. However, the feature representations of the target in SiamFC are extracted from the last layer of a convolutional neural network and mainly capture semantic information, which makes SiamFC drift easily in the presence of similar distractors. Considering that the different layers of a convolutional neural network characterize the target from different perspectives, and that the lower-level feature maps of SiamFC are already computed, in this paper we design a skip-layer connection network named Hyper-Siamese to aggregate the hierarchical feature maps of SiamFC and constitute the hyper-feature representations of the target. The Hyper-Siamese network is trained end-to-end offline on the ILSVRC2015 dataset and later used for online tracking. By visualizing the outputs of different layers and comparing the tracking results under various layer concatenation modes, we show that the different convolutional layers are all useful for object tracking. Experimental results on the OTB100 and TC128 benchmarks demonstrate that our proposed algorithm performs favorably against not only the baseline tracker SiamFC (2.9% gain in OS rate and 2.8% gain in DP rate on OTB100) but also many state-of-the-art trackers. Meanwhile, our proposed tracker achieves a real-time tracking speed (25 fps).
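The idea of aggregating hierarchical feature maps before matching can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how feature maps of different sizes are aligned, so center cropping to the smallest map is an assumption made here, as are the helper names `center_crop`, `hyper_feature`, and `cross_correlate` and the `(channels, height, width)` layout. The sketch also assumes the chosen layers share the same stride, so that a center crop keeps them spatially aligned.

```python
import numpy as np


def center_crop(fmap, size):
    """Crop a (C, H, W) feature map to (C, size, size) around its center."""
    _, h, w = fmap.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return fmap[:, top:top + size, left:left + size]


def hyper_feature(fmaps):
    """Stack feature maps from several layers along the channel axis.

    Assumption: every map is center-cropped to the smallest spatial size
    before concatenation (hypothetical alignment rule, not from the paper).
    """
    size = min(min(f.shape[1], f.shape[2]) for f in fmaps)
    return np.concatenate([center_crop(f, size) for f in fmaps], axis=0)


def cross_correlate(exemplar, search):
    """Slide the exemplar (C, h, w) over the search features (C, H, W).

    Returns a response map of shape (H - h + 1, W - w + 1); its peak marks
    the best-matching position, as in matching-based tracking.
    """
    _, h, w = exemplar.shape
    _, H, W = search.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            response[i, j] = np.sum(exemplar * search[:, i:i + h, j:j + w])
    return response
```

In use, the same list of layers would be extracted from both the exemplar branch and the search branch, each list passed through `hyper_feature`, and the two results passed to `cross_correlate`; the location of the maximum in the response map then gives the estimated target position.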