Multi-Task Hierarchical Feature Learning for Real-Time Visual Tracking

2019 
Recently, the tracking community has moved toward end-to-end feature learning with convolutional neural networks (CNNs) for visual object tracking. Traditional trackers extract feature maps only from the last convolutional layer of a CNN for feature representation. This single-layer representation ignores target information captured in the earlier convolutional layers. In this paper, we propose a novel hierarchical feature learning framework that captures both high-level semantics and low-level spatial details using multi-task learning. Specifically, feature maps extracted from a shallow layer and a deep layer are fed into a correlation filter layer to encode fine-grained geometric cues and coarse-grained semantic cues, respectively. Our network performs these two feature learning tasks jointly with a multi-task learning strategy. We conduct extensive experiments on three popular tracking benchmarks: OTB, UAV123, and VOT2016. Experimental results show that our method achieves remarkable performance improvements while running in real time.
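
The abstract describes a two-branch design: features from a shallow and a deep CNN layer are each passed through a correlation filter (CF) layer, and the two branches are trained jointly with a multi-task loss. The sketch below is a minimal, assumption-based illustration of that idea, not the authors' released implementation; the backbone layers, the regularization constant, the Gaussian target response, and the equal loss weights are all illustrative choices.

```python
# Minimal sketch of a hierarchical CF-layer tracker (assumptions noted above).
import torch
import torch.nn as nn
import torch.fft


class CorrelationFilterLayer(nn.Module):
    """Closed-form linear correlation filter solved in the Fourier domain."""

    def __init__(self, reg_lambda: float = 1e-4):
        super().__init__()
        self.reg_lambda = reg_lambda  # ridge regularization (assumed value)

    def forward(self, x, z, y):
        # x: template features (B, C, H, W); z: search features (B, C, H, W)
        # y: desired Gaussian response map (B, H, W)
        X = torch.fft.fft2(x)
        Z = torch.fft.fft2(z)
        Y = torch.fft.fft2(y)
        # Linear-kernel auto- and cross-correlations, summed over channels.
        k_xx = (X * X.conj()).sum(dim=1).real          # (B, H, W)
        k_zx = (Z * X.conj()).sum(dim=1)               # (B, H, W), complex
        alpha = Y / (k_xx + self.reg_lambda)           # dual CF coefficients
        response = torch.fft.ifft2(alpha * k_zx).real  # tracking response map
        return response


class HierarchicalCFTracker(nn.Module):
    """Shallow features keep spatial detail; deep features carry semantics.
    Each branch has its own CF layer; training combines the branch losses."""

    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.deep = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.cf_shallow = CorrelationFilterLayer()
        self.cf_deep = CorrelationFilterLayer()

    def forward(self, template, search, y):
        xs, zs = self.shallow(template), self.shallow(search)
        xd, zd = self.deep(xs), self.deep(zs)
        r_shallow = self.cf_shallow(xs, zs, y)  # fine-grained geometric cues
        r_deep = self.cf_deep(xd, zd, y)        # coarse-grained semantic cues
        return r_shallow, r_deep


def multi_task_loss(r_shallow, r_deep, y, w_shallow=0.5, w_deep=0.5):
    # Weighted sum of per-branch regression losses (equal weights assumed).
    mse = nn.functional.mse_loss
    return w_shallow * mse(r_shallow, y) + w_deep * mse(r_deep, y)


if __name__ == "__main__":
    model = HierarchicalCFTracker()
    template = torch.randn(2, 3, 64, 64)
    search = torch.randn(2, 3, 64, 64)
    # Gaussian-shaped target response centred on the object location.
    yy, xx = torch.meshgrid(torch.arange(64), torch.arange(64), indexing="ij")
    y = torch.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))
    y = y.unsqueeze(0).expand(2, -1, -1)
    loss = multi_task_loss(*model(template, search, y), y)
    print(loss.item())
```

In this sketch, both CF layers regress toward the same Gaussian response, so the shallow branch learns localization-oriented features while the deep branch learns semantically robust ones; at test time the two response maps would typically be fused to locate the target.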