Unsupervised Disentanglement Learning by Intervention
Citations: 0 | References: 0 | Related Papers: 20
Abstract:
Recently there has been increased interest in the unsupervised learning of disentangled representations from data generated by factors of variation. Existing works rely on the assumption that the generative factors are independent, even though this assumption is often violated in real-world scenarios. In this paper, we focus on the unsupervised learning of disentanglement in a general setting in which the generative factors may be correlated. We propose an intervention-based framework to tackle this problem. In particular, we first apply a random intervention operation to a selected feature of the learnt image representation; we then propose a novel metric that measures disentanglement through a downstream image translation task and show experimentally that it is consistent with existing metrics that require ground truth; finally, we design an end-to-end model that learns disentangled representations from the self-supervision signal of the downstream translation task. We evaluate our method quantitatively on benchmark datasets and give qualitative comparisons on a real-world dataset. Experiments show that our algorithm outperforms baselines on benchmark datasets when faced with correlated data and, unlike the baselines, can disentangle semantic factors on the real-world dataset.
Keywords: Benchmark, Generative model, Representation, Feature Learning, Ground truth, Feature
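As a minimal illustration of the random intervention operation described in the abstract, the sketch below overwrites one coordinate of a learnt code with the value of that coordinate taken from another sample in the batch and hands the result to a decoder. The encoder, decoder, and latent size are hypothetical placeholders, not the paper's actual interface.

import torch

def intervene(z: torch.Tensor, dim: int) -> torch.Tensor:
    """Replace feature `dim` of every code in the batch with the same feature
    taken from a randomly chosen donor sample, leaving all other features intact."""
    perm = torch.randperm(z.size(0))          # random donor for each code
    z_new = z.clone()
    z_new[:, dim] = z[perm, dim]              # intervene on the selected factor only
    return z_new

# Hypothetical usage with placeholder encoder/decoder modules:
# z = encoder(x)                              # (batch, latent_dim)
# d = torch.randint(0, z.size(1), (1,)).item()
# x_translated = decoder(intervene(z, d))     # input to the downstream translation task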
Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time. Previous continual learning methods are mainly designed to preserve knowledge from previous tasks, without much emphasis on how to well generalize models to new tasks. In this work, we propose an information disentanglement based regularization method for continual learning on text classification. Our proposed method first disentangles text hidden spaces into representations that are generic to all tasks and representations specific to each individual task, and further regularizes these representations differently to better constrain the knowledge required to generalize. We also introduce two simple auxiliary tasks: next sentence prediction and task-id prediction, for learning better generic and specific representation spaces. Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method in continual text classification tasks with various sequences and lengths over state-of-the-art baselines. We have publicly released our code at https://github.com/GT-SALT/IDBR.
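As an informal sketch (not the released IDBR code at the repository linked above), the snippet below splits a shared text encoding into a task-generic and a task-specific part and regularizes the two parts with different strengths; the module names, hidden sizes, and weightings are assumptions made for illustration.

import torch
import torch.nn as nn

class DisentangledHead(nn.Module):
    def __init__(self, hidden: int = 768, dim: int = 128):
        super().__init__()
        self.generic = nn.Linear(hidden, dim)    # shared across all tasks
        self.specific = nn.Linear(hidden, dim)   # task-specific in practice

    def forward(self, h: torch.Tensor):
        return self.generic(h), self.specific(h)

def regularize(g, g_old, s, s_old, lam_g: float = 1.0, lam_s: float = 0.1):
    """Constrain the generic space more strongly than the specific space, so that
    generic knowledge is preserved while task-specific knowledge remains free to move."""
    return lam_g * (g - g_old).pow(2).mean() + lam_s * (s - s_old).pow(2).mean()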
The effective application of representation learning to real-world problems requires both techniques for learning useful representations and robust ways to evaluate properties of those representations. Recent work in disentangled representation learning has shown that unsupervised representation learning approaches rely on fully supervised disentanglement metrics, which assume access to labels for the ground-truth factors of variation. In many real-world cases ground-truth factors are expensive to collect or difficult to model, as in perception. Here we empirically show that a weakly-supervised downstream task based on odd-one-out observations is suitable for model selection, as it correlates strongly with performance on a difficult downstream abstract visual reasoning task. We also show that a bespoke metric-learning VAE that performs well on this task also outperforms other standard unsupervised models and a weakly-supervised disentanglement model across several metrics.
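A hedged illustration of an odd-one-out probe of the kind described above: given the embeddings of a triplet in which two items share a factor, the item not involved in the closest pair is predicted to be the odd one, and the probe's accuracy can serve as a label-free proxy for ranking candidate representation models. The distance choice and triplet construction are assumptions, not the paper's exact protocol.

import torch

def odd_one_out(z: torch.Tensor) -> int:
    """z: (3, d) embeddings of a triplet; returns the index of the predicted odd item."""
    d01 = (z[0] - z[1]).norm()
    d02 = (z[0] - z[2]).norm()
    d12 = (z[1] - z[2]).norm()
    # The item outside the closest pair is predicted to be the odd one.
    pair = torch.argmin(torch.stack([d01, d02, d12])).item()
    return [2, 1, 0][pair]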
Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies have used meta-learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the various expressions in the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such a generalized class-wise representation, by innovatively leveraging the dynamic routing algorithm in meta-learning. In this way, we find the model is able to induce and generalize better. We evaluate the proposed model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experiment results show that on both datasets, the proposed model significantly outperforms the existing state-of-the-art approaches, proving the effectiveness of class-wise generalization in few-shot text classification.
Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies often use meta-learning to simulate the few-shot task, in which new queries are compared to a small support set on a sample-wise level. However, this sample-wise comparison may be severely disturbed by the various expressions in the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such generalized class-wise representations, innovatively combining the dynamic routing algorithm with the typical meta-learning framework. In this way, our model is able to induce from particularity to universality, which is a more human-like learning approach. We evaluate our model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experiment results show that, on both datasets, our model significantly outperforms existing state-of-the-art models and improves the average accuracy by more than 3%, which proves the effectiveness of class-wise generalization in few-shot text classification.
Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge. However, when applied to real-world videos, contrastive learning may unknowingly lead to the separation of instances that contain semantically similar events. In our work, we introduce a cooperative variant of contrastive learning to utilize complementary information across views and address this issue. We use data-driven sampling to leverage implicit relationships between multiple input video views, whether observed (e.g. RGB) or inferred (e.g. flow, segmentation masks, poses). We are among the first to explore exploiting inter-instance relationships to drive learning. We experimentally evaluate our representations on the downstream task of action recognition. Our method achieves competitive performance on standard benchmarks (UCF101, HMDB51, Kinetics400). Furthermore, qualitative experiments illustrate that our models can capture higher-order class relationships.
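A simplified sketch of the cooperative idea in that abstract: neighbours found in one view (here assumed to be optical flow) are treated as positives for a contrastive loss computed in another view (RGB). The function name, temperature, and neighbour rule are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def coop_contrastive(z_rgb: torch.Tensor, z_flow: torch.Tensor, tau: float = 0.1):
    """z_rgb, z_flow: (n, d) embeddings of the same n clips in two views."""
    z_rgb = F.normalize(z_rgb, dim=1)
    z_flow = F.normalize(z_flow, dim=1)
    n = z_rgb.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    sim_rgb = (z_rgb @ z_rgb.t() / tau).masked_fill(eye, float('-inf'))   # RGB logits, self excluded
    flow_sim = (z_flow @ z_flow.t()).masked_fill(eye, -1.0)
    pos = flow_sim.argmax(dim=1)          # nearest neighbour in the flow view is the positive
    return F.cross_entropy(sim_rgb, pos)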
As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning. Many proposed approaches for self-supervised learning naturally follow a multi-view perspective, where the input (e.g., original images) and the self-supervised signals (e.g., augmented images) can be seen as two redundant views of the data. Building from this multi-view perspective, this paper provides an information-theoretical framework to better understand the properties that encourage successful self-supervised learning. Specifically, we demonstrate that self-supervised learned representations can extract task-relevant information and discard task-irrelevant information. Our theoretical framework paves the way to a larger space of self-supervised learning objective design. In particular, we propose a composite objective that bridges the gap between prior contrastive and predictive learning objectives, and introduce an additional objective term to discard task-irrelevant information. To verify our analysis, we conduct controlled experiments to evaluate the impact of the composite objectives. We also explore our framework's empirical generalization beyond the multi-view perspective, where the cross-view redundancy may not be clearly observed.
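A hedged sketch of a composite objective in the spirit described above: an InfoNCE (contrastive) term between two view embeddings plus a predictive (reconstruction) term from one view's representation to the other view. The encoder/predictor outputs, temperature, and weightings are assumptions made for illustration.

import torch
import torch.nn.functional as F

def composite_loss(z1, z2, x2_pred, x2, alpha: float = 1.0, beta: float = 1.0):
    """z1, z2: (n, d) view embeddings; x2_pred: prediction of view 2 produced from z1."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / 0.1
    targets = torch.arange(z1.size(0))
    contrastive = F.cross_entropy(logits, targets)   # keep information shared across views
    predictive = F.mse_loss(x2_pred, x2)             # predict the other view from this one
    return alpha * contrastive + beta * predictive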
Sensory input from multiple sources is crucial for robust and coherent human perception. Different sources contribute complementary explanatory factors and get combined based on factors they share. This principle has motivated the design of powerful unsupervised representation-learning algorithms. In this paper, we unify recent work on multimodal self-supervised learning under a single framework. Observing that most self-supervised methods optimize similarity metrics between a set of model components, we propose a taxonomy of all reasonable ways to organize this process. We empirically show on two versions of multimodal MNIST and a multimodal brain imaging dataset that (1) multimodal contrastive learning has significant benefits over its unimodal counterpart, (2) the specific composition of multiple contrastive objectives is critical to performance on a downstream task, and (3) maximization of the similarity between representations has a regularizing effect on a neural network, which sometimes can lead to reduced downstream performance but still can reveal multimodal relations. Consequently, we outperform previous unsupervised encoder-decoder methods based on CCA or the variational mixture MMVAE on various datasets under a linear evaluation protocol.
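Purely as an illustration of composing several pairwise similarity objectives between model components, the sketch below sums a contrastive term over every pair of components (here assumed to be two modality encoders and a fused representation). The specific pairing and loss are assumptions, not the composition the paper recommends.

import itertools
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.1):
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    return F.cross_entropy(a @ b.t() / tau, torch.arange(a.size(0)))

def multimodal_loss(components: dict) -> torch.Tensor:
    """components: name -> (n, d) tensor, e.g. {'image': zi, 'audio': za, 'fused': zf}."""
    pairs = itertools.combinations(components.values(), 2)
    return sum(info_nce(a, b) + info_nce(b, a) for a, b in pairs)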
We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to the image domain with low inter-task variance, we consider a more realistic setting where tasks are diverse. However, this diversity imposes tremendous difficulties on existing state-of-the-art metric-based algorithms, since a single metric is insufficient to capture complex task variations in the natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained on meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few-shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study.
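A rough sketch of the combination step described above: distances from queries to support items under several meta-trained metrics are mixed with task-specific weights. In the paper the weights are determined automatically per task; here they are simply passed in, and the shapes and softmax mixing are assumptions for illustration.

import torch

def combine_metrics(dists: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """
    dists: (m, q, k) distances from q queries to k support samples under m metrics.
    weights: (m,) task-specific mixing logits (assumed given for this sketch).
    Returns the (q, k) distances under the weighted combination of metrics.
    """
    w = torch.softmax(weights, dim=0)            # convex combination of the m metrics
    return torch.einsum('m,mqk->qk', w, dists)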
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. In this paper, we first study how biases in the dataset affect existing methods. Our results show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains with minor modifications. We show that learning additional invariances -- through the use of multi-scale cropping, stronger augmentations and nearest neighbors -- improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy. The representations can be used for semantic segment retrieval and video instance segmentation without finetuning. Moreover, the results are on par with specialized models. We hope this work will serve as a useful study for other researchers. The code and models will be available at this https URL.
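A minimal multi-crop augmentation sketch in the spirit of the "learning additional invariances" point above: each image yields two large crops and several smaller crops, all treated as views of the same instance. The crop sizes and scale ranges are illustrative assumptions, not the paper's exact configuration.

from torchvision import transforms

large = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),   # global views
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
small = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),    # local views
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_crop(img, n_small: int = 4):
    """Return two global views plus `n_small` local views of one PIL image."""
    return [large(img), large(img)] + [small(img) for _ in range(n_small)]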