Complementary Siamese Networks for Robust Visual Tracking

2019 
In this paper, we propose the novel complementary Siamese networks (CoSNet) for visual tracking, which exploit complementary global and local representations to learn a matching function. Specifically, the proposed CoSNet is two-fold: a global Siamese network (GSNet) and a local Siamese network (LSNet). The GSNet matches the target with candidates using a holistic representation. By contrast, the LSNet explores partial object representations for matching. Instead of simply decomposing the object into regular patches, the LSNet employs a novel attentional local part network that automatically generates salient object parts for local representation and adaptively weights each part according to its importance in matching. In CoSNet, the GSNet and LSNet are jointly trained in an end-to-end manner. By coupling the two complementary Siamese networks, CoSNet learns a robust matching function that effectively handles various appearance changes in visual tracking. Extensive experiments on a large-scale benchmark with 100 sequences show that CoSNet outperforms other state-of-the-art trackers.
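To make the two-branch design concrete, the following PyTorch sketch illustrates the general idea under stated assumptions. It is not the authors' implementation: the tiny backbones, the number of parts, the soft 1x1-convolution part assignment, and the additive fusion of the global and local scores are all hypothetical stand-ins for the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GSNet(nn.Module):
    """Global branch: matches template and candidate via holistic embeddings."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # Toy backbone standing in for the paper's (unspecified here) CNN.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, z, x):
        fz = self.backbone(z).flatten(1)      # (B, D) template embedding
        fx = self.backbone(x).flatten(1)      # (B, D) candidate embedding
        return F.cosine_similarity(fz, fx)    # global match score, (B,)


class LSNet(nn.Module):
    """Local branch: attention-weighted matching over K learned parts."""
    def __init__(self, embed_dim=256, num_parts=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Softly assigns each feature location to one of K parts
        # (a hypothetical realization of the attentional local part network).
        self.part_attn = nn.Conv2d(embed_dim, num_parts, 1)
        # Produces a scalar importance weight per part.
        self.part_weight = nn.Linear(embed_dim, 1)

    def embed_parts(self, img):
        f = self.backbone(img)                                    # (B, D, H, W)
        a = torch.softmax(self.part_attn(f).flatten(2), dim=-1)   # (B, K, HW)
        f = f.flatten(2)                                          # (B, D, HW)
        return torch.einsum('bkn,bdn->bkd', a, f)                 # (B, K, D)

    def forward(self, z, x):
        pz, px = self.embed_parts(z), self.embed_parts(x)
        sims = F.cosine_similarity(pz, px, dim=-1)                # (B, K)
        w = torch.softmax(self.part_weight(pz).squeeze(-1), -1)   # (B, K)
        return (w * sims).sum(dim=-1)          # importance-weighted local score


class CoSNet(nn.Module):
    """Couples the global and local branches into one matching function."""
    def __init__(self):
        super().__init__()
        self.gsnet, self.lsnet = GSNet(), LSNet()

    def forward(self, z, x):
        # Additive fusion is an assumption; both branches train end-to-end.
        return self.gsnet(z, x) + self.lsnet(z, x)


if __name__ == "__main__":
    z = torch.randn(2, 3, 127, 127)   # template crops
    x = torch.randn(2, 3, 127, 127)   # candidate crops
    print(CoSNet()(z, x))             # joint match scores, shape (2,)
```

Because the part-attention maps and per-part weights are learned jointly with both matching branches, gradients from a single matching loss can shape which object parts are deemed salient, which is the end-to-end coupling the abstract describes.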