Diffusion Models Beat GANs on Image Classification
Soumik MukhopadhyayMatthew GwilliamVatsal AgarwalNamitha PadmanabhanArchana SwaminathanSrinidhi HegdeTianyi ZhouAbhinav Shrivastava
9
Citation
0
Reference
10
Related Paper
Citation Trend
Abstract:
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high fidelity, diverse, novel images. The U-Net architecture, as a convolution-based architecture, generates a diverse set of feature representations in the form of intermediate feature maps. We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification. We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task. We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods such as BigBiGAN for classification tasks. We investigate diffusion models in the transfer learning regime, examining their performance on several fine-grained visual classification datasets. We compare these embeddings to those generated by competing architectures and pre-trainings for classification tasks.Keywords:
Discriminative model
Pooling
Contextual image classification
Feature (linguistics)
Generative model
Feature Learning
Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of phase space with little data. We propose to use Schroedinger Bridges and diffusion models to create SBUnfold, an unfolding approach that combines the strengths of both discriminative and generative models. The key feature of SBUnfold is that its generative model maps one set of events into another without having to go through a known probability density as is the case for normalizing flows and standard diffusion models. We show that SBUnfold achieves excellent performance compared to state of the art methods on a synthetic Z+jets dataset.
Discriminative model
Generative model
Feature (linguistics)
Cite
Citations (2)
We present a general framework for dependency parsing of Italian sentences based on a combination of discriminative and generative models. We use a state-of-the-art discriminative model to obtain a k-best list of candidate structures for the test sentences, and use the generative model to compute the probability of each candidate, and select the most probable one. We present the details of the specific generative model we have employed for the EVALITA'09 task. Results show that by using the generative model we gain around 1% in labeled accuracy (around 7% error reduction) over the discriminative model.
Discriminative model
Generative model
Dependency grammar
Cite
Citations (0)
An interesting question surrounding semi-supervised learning for NLP is: should we use discriminative models or generative models? Despite the fact that generative models have been frequently employed in a semi-supervised setting since the early days of the statistical revolution in NLP, we advocate the use of discriminative models. The ability of discriminative models to handle complex, high-dimensional feature spaces and their strong theoretical guarantees have made them a very appealing alternative to their generative counterparts. Perhaps more importantly, discriminative models have been shown to offer competitive performance on a variety of sequential and structured learning tasks in NLP that are traditionally tackled via generative models, such as letter-to-phoneme conversion (Jiampojamarn et al., 2008), semantic role labeling (Toutanova et al., 2005), syntactic parsing (Taskar et al., 2004), language modeling (Roark et al., 2004), and machine translation (Liang et al., 2006). While generative models allow the seamless integration of prior knowledge, discriminative models seem to outperform generative models in a "no prior", agnostic learning setting. See Ng and Jordan (2002) and Toutanova (2006) for insightful comparisons of generative and discriminative models.
Discriminative model
Generative model
Feature (linguistics)
Cite
Citations (1)
This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
Discriminative model
Generative model
Active appearance model
Generative Design
Cite
Citations (81)
Discriminative model
Generative model
Cite
Citations (0)
When labelled training data is plentiful, discriminative techniques are widely used since they give excellent generalization performance. However, for large-scale applications such as object recognition, hand labelling of data is expensive, and there is much interest in semi-supervised techniques based on generative models in which the majority of the training data is unlabelled. Although the generalization performance of generative models can often be improved by 'training them discriminatively', they can then no longer make use of unlabelled data. In an attempt to gain the benefit of both generative and discriminative approaches, heuristic procedure have been proposed [2, 3] which interpolate between these two extremes by taking a convex combination of the generative and discriminative objective functions. In this paper we adopt a new perspective which says that there is only one correct way to train a given model, and that a 'discriminatively trained' generative model is fundamentally a new model [7]. From this viewpoint, generative and discriminative models correspond to specific choices for the prior over parameters. As well as giving a principled interpretation of 'discriminative training', this approach opens door to very general ways of interpolating between generative and discriminative extremes through alternative choices of prior. We illustrate this framework using both synthetic data and a practical example in the domain of multi-class object recognition. Our results show that, when the supply of labelled training data is limited, the optimum performance corresponds to a balance between the purely generative and the purely discriminative.
Discriminative model
Generative model
Cite
Citations (343)
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, enPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale high-quality attention maps, which facilitate localizing full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), setting a solid baseline for WSOL with the generative model. Code is available at https://github.com/callsys/GenPromp.
Discriminative model
Generative model
Code (set theory)
Cite
Citations (0)
In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, making the generative model directly applicable to discriminative tasks. Using SelfEval, we repurpose standard datasets created for evaluating multimodal text-image discriminative models to evaluate generative models in a fine-grained manner: assessing their performance on attribute binding, color recognition, counting, shape recognition, spatial understanding. To the best of our knowledge SelfEval is the first automated metric to show a high degree of agreement for measuring text-faithfulness with the gold-standard human evaluations across multiple models and benchmarks. Moreover, SelfEval enables us to evaluate generative models on challenging tasks such as Winoground image-score where they demonstrate competitive performance to discriminative models. We also show severe drawbacks of standard automated metrics such as CLIP-score to measure text faithfulness on benchmarks such as DrawBench, and how SelfEval sidesteps these issues. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
Discriminative model
Generative model
Cite
Citations (0)
Here we explore a discriminative learning method on underlying generative models for the purpose of discriminating between object categories. Visual recognition algorithms learn models from a set of training examples. Generative models learn their representations by considering data from a single class. Generative models are popular in computer vision for many reasons, including their ability to elegantly incorporate prior knowledge and to handle correspondences between object parts and detected features. However, generative models are often inferior to discriminative models during classification tasks. We study a discriminative approach to learning object categories which maintains the representational power of generative learning, but trains the generative models in a discriminative manner. The discriminatively trained models perform better during classification tasks as a result of selecting discriminative sets of features. We conclude by proposing a multi-class object recognition system which initially trains object classes in a generative manner, identifies subsets of similar classes with high confusion, and finally trains models for these subsets in a discriminative manner to realize gains in classification performance.
Discriminative model
Generative model
Generative Design
Cite
Citations (56)
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale high-quality attention maps, which facilitate localizing full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), setting a solid baseline for WSOL with the generative model. Code is available at https://github.com/callsys/GenPromp.
Discriminative model
Generative model
Code (set theory)
Cite
Citations (11)