Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
Citations: 52 | References: 41 | Related papers: 10
Abstract:
Discriminative localization is essential for fine-grained image classification, which aims to recognize hundreds of subcategories within the same basic-level category. The key differences among subcategories are subtle and local, and they are reflected in the discriminative regions of objects. Existing methods generally adopt a two-stage learning framework: the first stage localizes the discriminative regions of objects, and the second encodes the discriminative features for training classifiers. However, these methods generally have two limitations: (1) Separating the two learning stages is time-consuming. (2) Dependence on object and part annotations for discriminative localization learning requires labor-intensive labeling. Addressing both limitations simultaneously is highly challenging, and existing methods focus on only one of them. This paper therefore proposes a discriminative localization approach via saliency-guided Faster R-CNN that addresses both limitations at the same time. Its main novelties and advantages are: (1) An end-to-end network based on Faster R-CNN is designed to simultaneously localize discriminative regions and encode discriminative features, which accelerates classification. (2) Saliency-guided localization learning is proposed to localize the discriminative region automatically, avoiding labor-intensive labeling. The two are jointly employed to accelerate classification and eliminate dependence on object and part annotations. Compared with state-of-the-art methods on the widely used CUB-200-2011 dataset, our approach achieves both the best classification accuracy and the best efficiency.
Keywords:
Discriminative model
ENCODE
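The abstract above says that saliency guides localization so that no object or part annotations are needed, but it does not spell out how a saliency map becomes a training box. One plausible way to picture it, offered only as a minimal sketch and not as the authors' method, is to threshold the saliency map and take the tight box around the surviving pixels as a pseudo-annotation. The function name `saliency_to_box`, the relative threshold of 0.5, and the toy map are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def saliency_to_box(saliency: List[List[float]],
                    rel_threshold: float = 0.5) -> Optional[Box]:
    """Hypothetical helper: tight box around salient pixels.

    A pixel counts as salient when its value is at least
    rel_threshold * (maximum value in the map). Returns None when the
    map is empty or contains no positive response.
    """
    if not saliency or not saliency[0]:
        return None
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return None
    cutoff = rel_threshold * peak
    xs, ys = [], []
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 saliency map (assumed values, not taken from the paper).
toy_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.2, 0.0],
]
pseudo_box = saliency_to_box(toy_map)
print(pseudo_box)
```

A box produced this way could stand in for a human-drawn bounding box when training a detector, which is the sense in which saliency removes the labeling cost. How the paper actually couples the saliency signal with Faster R-CNN training is not stated in the abstract.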
Currently, most top-performing Weakly supervised Fine-grained Image Classification (WFGIC) schemes pick out discriminative patches. However, those patches usually contain considerable noise, which degrades classification accuracy. In addition, these schemes rely on a large number of candidate patches to discover the discriminative ones, which leads to high computational cost. To address these problems, we propose a novel end-to-end Self-regressive Localization with Discriminative Prior Network (SDN) model, which learns more accurate sizes for discriminative patches and can classify images in real time. Specifically, we design a multi-task discriminative learning network consisting of a self-regressive localization sub-network and a discriminative prior sub-network, trained with a guided loss and a consistent loss to simultaneously learn self-regressive coefficients and discriminative prior maps. The self-regressive coefficients reduce the noise in discriminative patches, and the discriminative prior maps, by learning discriminative probability values, filter thousands of candidate patches down to single digits. Extensive experiments demonstrate that the proposed SDN model achieves state-of-the-art results in both accuracy and efficiency.
Discriminative model
Contextual image classification
Citations (3)
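The SDN abstract above describes discriminative probability values being used to filter thousands of candidate patches down to a handful. As a rough illustration of that filtering step only, and assuming each candidate patch already carries a scalar probability, the selection can be sketched as a top-k ranking. The name `top_k_patches`, the box format, and the sample scores are hypothetical; the abstract does not describe the network's actual selection rule.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) of a candidate patch
Patch = Tuple[int, int, int, int]


def top_k_patches(patches: Sequence[Patch],
                  scores: Sequence[float],
                  k: int = 5) -> List[Patch]:
    """Hypothetical helper: keep the k patches with the highest scores.

    `scores` is assumed to hold one discriminative probability per patch.
    """
    if len(patches) != len(scores):
        raise ValueError("patches and scores must have the same length")
    # Sort by score only, highest first; ties keep their original order.
    ranked = sorted(zip(scores, patches), key=lambda pair: pair[0], reverse=True)
    return [patch for _, patch in ranked[:k]]


# Assumed toy candidates and probabilities, for illustration only.
candidates = [(0, 0, 32, 32), (16, 16, 48, 48), (32, 32, 64, 64), (8, 8, 40, 40)]
probabilities = [0.10, 0.85, 0.40, 0.65]
kept = top_k_patches(candidates, probabilities, k=2)
print(kept)
```

Only the kept patches would then need to be passed to the comparatively expensive classification stage, which is where the efficiency gain described in the abstract would come from.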
When shown the names of two objects, subjects determine which object is larger more slowly as the difference in the sizes of the objects decreases. This might result from variations in the time taken to access sufficient information to perform the task: information that crudely specifies size is accessed first and can be used when the sizes differ greatly, whereas information that specifies size on a more finely graded scale must be accessed when they do not. This hypothesis was tested. Subjects shown the names of three objects determined which object was intermediate in size. Immediately thereafter the name of another object was shown, and the task was then to decide whether the object previously judged intermediate was larger than this object. In this second task, reaction times increased with decreasing differences in size between the two objects; this increase was smaller, however, when the sizes of the objects in the first task were similar. The results were predicted from the assumption that when the specification of an object's size in terms of fine discrimination is accessed for comparison in the first task, it remains available for use in the second task; thus the time normally required for accessing that information in the second task is reduced. Some implications of the results are discussed.
Citations (0)
This paper presents a task teaching scheme for a daily life support robot. The features of our task teaching system are: 1) the object model is composed of a list of task models that can be applied to the object category, which we call the Object Template Model (OTM); 2) a user identifies the target object, recognizes the object's situation in the environment, and instructs the robot to execute the picking task by selecting an OTM and a task model described in the OTM according to the object type and situation. The user can instruct the robot with just a few mouse clicks.
Object model
Citations (0)
The ENCODE (Encyclopedia of DNA Elements) project, started in 2003, is a consortium of 442 scientists from around the world working together to assign a function to the DNA that does not encode genes. ENCODE used 147 different cell types and many different research techniques to achieve their goal. On September 5, 2012, ENCODE released the initial results of their study. The purpose of this review is to summarize a fraction of ENCODE’s results.
ENCODE
Encyclopedia
Citations (0)
Advances in generative modeling, particularly the advent of diffusion models, have sparked a fundamental question: how can these models be effectively used for discriminative tasks? In this work, we find that generative models can be great test-time adapters for discriminative models. Our method, Diffusion-TTA, adapts pre-trained discriminative models, such as image classifiers, segmenters, and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model. We achieve this by modulating the conditioning of the diffusion model using the output of the discriminative model. We then maximize the image likelihood objective by backpropagating the gradients to the discriminative model's parameters. We show that Diffusion-TTA significantly enhances the accuracy of various large-scale pre-trained discriminative models, such as ImageNet classifiers, CLIP models, image pixel labellers, and image depth predictors. Diffusion-TTA outperforms existing test-time adaptation methods, including TTT-MAE and TENT, and particularly shines in online adaptation setups, where the discriminative model is continually adapted to each example in the test set. We provide access to code, results, and visualizations on our website: https://diffusion-tta.github.io/.
Discriminative model
Generative model
Citations (0)
This study investigated the relation between prevocational preference, as measured by the client's selection of a task object, and the work that followed that choice. After selecting a task object, the clients worked a task previously assessed to be more or less preferred than the one indicated by the object. The results indicated that when the selection represented a task that was less preferred than the one actually worked, choices for that object increased on subsequent trials. Conversely, when the selection represented a task that was more preferred than the task the subject actually worked, choices for the object decreased on subsequent trials. The work that followed object choices reinforced or punished subsequent selections. These findings indicated that the clients' object choices were valid indicators of their preference for working different tasks. They were also consistent with Premack's principle that one class of responses may reinforce or punish a different class of responses for the same individual.
Citations (76)
A new encode/decode scheme for OCDMA, the films encoder/decoder, is proposed. The principle of the films encoder/decoder is analyzed, and its system structure is given. The proposed system can realize all-optical CDMA encoding and decoding. At the same time, the OCDMA encoder/decoder can be integrated easily, and it also makes access easy to control.
ENCODE
Citations (0)
Discriminative learning of sparse-code based dictionaries tends to be inherently unstable. We show that using a discriminative version of the deviation function to learn such dictionaries leads to a more stable formulation that can handle the reconstruction/discrimination trade-off in a principled manner. Results on Graz02 and UCF Sports datasets validate the proposed formulation.
Discriminative model
Dictionary Learning
Code (set theory)
Citations (6)
ENCODE
Gene density
Citations (2)