Cascaded Generative and Discriminative Learning for Visual Tracking
Keywords: Discriminative model, Generative model, Active appearance model, Visual tracking
Visual object tracking is a challenging task in computer vision applications. The two basic statistical appearance modeling techniques are discriminative and generative. In both cases, online learning is essential to compensate for errors caused by large pose changes, illumination variations, and appearance changes during tracking. This paper briefly introduces the challenges and applications of visual tracking and discusses state-of-the-art online-learning-based tracking methods by category. Existing statistical schemes for tracking-by-detection are reviewed according to their appearance model creation mechanism: generative or discriminative.
Recently, great progress has been made in using discriminative classifiers for object tracking. In particular, correlation filters (CFs) for visual tracking have attracted attention due to their competitive accuracy and robustness. In this paper, the latest and most representative CF-based trackers are presented in detail. In addition, trackers that use deep convolutional features are introduced, and several well-known tracking methods that fine-tune a pretrained deep network are presented. To evaluate the performance of different trackers, the evaluation methodology and datasets are described in detail, and all introduced trackers are compared on those datasets. Finally, several promising directions are drawn as conclusions.
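To make the correlation-filter idea concrete, here is a minimal MOSSE-style sketch in Python/NumPy: a filter is learned in closed form in the Fourier domain, the tracking peak is found by correlating a new patch, and the filter is updated online. The function names, the regularization term `lam`, and the learning rate `lr` are illustrative assumptions, not code from any of the surveyed trackers.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak centred on the patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, target, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain (MOSSE-style)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    # H* = (G . conj(F)) / (F . conj(F) + lambda)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H_conj, patch):
    """Correlate a new patch with the learned filter; the peak gives the shift."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dy, dx)

def update_filter(H_conj, patch, target, lam=1e-2, lr=0.125):
    """Online update: blend the new closed-form solution into the running filter."""
    return (1 - lr) * H_conj + lr * train_filter(patch, target, lam)
```

In use, the tracker would crop a patch around the previous target position, call `detect` to locate the new peak, then call `update_filter` on the re-centred patch; the trackers surveyed above differ mainly in the features they feed to this pipeline and in how the filter is regularized.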
Maintaining high efficiency and high precision are two fundamental challenges in UAV tracking due to constraints on computing resources, battery capacity, and UAV maximum load. Discriminative correlation filter (DCF)-based trackers can yield high efficiency on a single CPU but with inferior precision. Lightweight deep learning (DL)-based trackers can achieve a good balance between efficiency and precision, but performance gains are limited by the compression rate; a high compression rate often leads to poor discriminative representations. To this end, this paper aims to enhance the discriminative power of feature representations from a new feature-learning perspective. Specifically, we attempt to learn more discriminative representations with contrastive instances for UAV tracking in a simple yet effective manner, which requires no manual annotations and allows a lightweight model to be developed and deployed. We are the first to explore contrastive learning for UAV tracking. Extensive experiments on four UAV benchmarks, including UAV123@10fps, DTB70, UAVDT, and VisDrone2018, show that the proposed DRCI tracker significantly outperforms state-of-the-art UAV tracking methods.
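As a rough illustration of the contrastive-instance idea (the abstract does not give DRCI's exact loss, so this is a generic InfoNCE-style sketch; the array shapes and the `temperature` value are assumptions):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """
    Generic InfoNCE contrastive loss over feature vectors.
    anchors, positives: arrays of shape (N, D); row i of each forms a positive
    pair, every other row serves as a negative, so no manual labels are needed.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching (diagonal) pairs should dominate their rows.
    return -np.mean(np.diag(log_prob))
```

Minimizing such a loss pulls the two views of the same instance together and pushes other instances apart, which is the general mechanism by which contrastive training sharpens feature discriminability in a lightweight backbone.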
Discriminative correlation filter (DCF)-based approaches have recently achieved competitive performance in visual tracking. However, conventional DCF-based trackers often lack discriminative ability due to their shallow architecture. As a result, they can hardly handle drastic appearance variations and easily drift when the target suffers heavy occlusion. To address this issue, a novel densely connected DCF framework is proposed for visual tracking. We incorporate multiple nested DCFs into a deep learning architecture and train the compact network on data specific to the target. Specifically, feature maps and interim response maps are shared and reused throughout the whole network. By doing so, the implicit information carried by each DCF is fully exploited to enhance the model's representation ability during tracking. Moreover, a multiscale estimation scheme is developed to account for scale variations. Experimental results on the benchmarks demonstrate that the proposed approach achieves outstanding performance compared to existing state-of-the-art trackers.
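A schematic sketch of how nested correlation filters could share and reuse interim response maps follows; the paper's actual network is not specified in the abstract, so the simple additive fusion and both function names below are assumptions.

```python
import numpy as np

def cf_response(feat, H_conj):
    """Single-channel correlation filter response computed in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(feat) * H_conj))

def dense_cf_responses(features, filters):
    """
    Densely connected sketch: each stage correlates its own feature map and adds
    the reused responses of all earlier stages before passing the sum forward.
    features: list of 2D feature maps (same spatial size).
    filters:  list of conjugate Fourier-domain filters, one per stage.
    """
    fused = np.zeros_like(features[0])
    responses = []
    for feat, H_conj in zip(features, filters):
        fused = fused + cf_response(feat, H_conj)  # reuse interim response maps
        responses.append(fused.copy())
    return responses  # the last entry aggregates every stage
```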
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels alone. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings, which are fed to a generative model to conditionally recover the noised input image and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) to obtain both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale, high-quality attention maps, which facilitate localizing the full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), respectively, setting a solid baseline for WSOL with generative models. Code is available at https://github.com/callsys/GenPromp.
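In its simplest hypothetical form, the inference-time fusion of representative and discriminative embeddings could be a weighted combination like the sketch below; GenPromp's actual fusion may differ, and the `weight` parameter and function name are illustrative only.

```python
import numpy as np

def combine_embeddings(representative, discriminative, weight=0.5):
    """
    Hypothetical fusion of a learned (representative) prompt embedding with a
    vision-language (discriminative) embedding; `weight` balances the two.
    Both inputs are 1D vectors of the same dimensionality.
    """
    rep = representative / np.linalg.norm(representative)
    dis = discriminative / np.linalg.norm(discriminative)
    return weight * rep + (1.0 - weight) * dis
```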