    Abstract:
    State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample. Without sufficient diversity in the transformations used to create views, however, it can be difficult to overcome nuisance variables in the data and build rich representations. This motivates the use of the dataset itself to find similar, yet distinct, samples to serve as views for one another. In this paper, we introduce Mine Your Own vieW (MYOW), a new approach for self-supervised learning that looks within the dataset to define diverse targets for prediction. The idea behind our approach is to actively mine views, finding samples that are neighbors in the representation space of the network, and then predict, from one sample's latent representation, the representation of a nearby sample. After showing the promise of MYOW on benchmarks used in computer vision, we highlight the power of this idea in a novel application in neuroscience where SSL has yet to be applied. When tested on multi-unit neural recordings, we find that MYOW outperforms other self-supervised approaches in all examples (in some cases by more than 10%), and often surpasses the supervised baseline. With MYOW, we show that it is possible to harness the diversity of the data to build rich views and leverage self-supervision in new domains where augmentations are limited or unknown.
    Keywords:
    Leverage (statistics)
    Sample (statistics)
    Representation
    Data Representation
    Feature Learning
    Similarity (geometry)
    Supervised Learning
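    The mining-and-predicting loop described in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: MYOW uses online and target encoder networks with a learned predictor head, whereas `mine_views` and `cross_sample_loss` (both hypothetical names) only show the core idea of treating nearest neighbors in representation space as prediction targets for one another.

```python
import numpy as np

rng = np.random.default_rng(0)

def mine_views(embeddings, k=1):
    """For each sample, return the indices of its k nearest neighbors
    (excluding itself) in representation space -- the 'mined views'."""
    sq = np.sum(embeddings ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    np.fill_diagonal(d2, np.inf)          # a sample cannot be its own view
    return np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per sample

def cross_sample_loss(online, target, neighbors):
    """BYOL-style predictive loss, but across mined neighbor pairs:
    predict the target embedding of a neighbor from the online embedding."""
    def normalize(z):
        return z / np.linalg.norm(z, axis=1, keepdims=True)
    z = normalize(online)
    t = normalize(target[neighbors[:, 0]])
    return float(np.mean(np.sum((z - t) ** 2, axis=1)))

emb = rng.normal(size=(8, 4))   # stand-in for encoder outputs
nbrs = mine_views(emb, k=1)
loss = cross_sample_loss(emb, emb, nbrs)
```

    In the actual method the two embedding arguments would come from separate online and target networks, and the loss would be backpropagated through a predictor; here both are the same array purely to keep the sketch self-contained.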
    Data representation has received much attention in machine learning and pattern recognition, and it is becoming an indispensable tool for many learning tasks across all paradigms: unsupervised, semi-supervised, and supervised. In this paper, we present a graph-based, deep, and flexible data representation method that uses feature propagation as an internal filtering step. The presented framework provides several desirable features: graph-based regularization, a flexible projection model, graph-based feature aggregation, and a deep architecture. The model can be learned layer by layer; in each layer, the nonlinear data representation and the unknown linear model are jointly estimated with a closed-form solution. We evaluate the proposed method on semi-supervised classification tasks using six public image datasets. These experiments demonstrate the effectiveness of the presented scheme, which compares favorably to many competing semi-supervised approaches.
    Feature Learning
    Representation
    Regularization
    Supervised Learning
    Feature (machine learning)
    Data Representation
    Labeled data
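    The feature-propagation filtering step and the layer-wise closed-form estimation mentioned above can be sketched as follows. This is a hedged illustration, not the paper's exact model: `propagate` performs simple row-normalized neighbor averaging over a given adjacency matrix, and `closed_form_layer` solves a plain ridge system standing in for the paper's joint graph-regularized estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(X, A):
    """One feature-propagation (graph filtering) step:
    each node's features are averaged with its neighbors'."""
    A_hat = A + np.eye(len(A))              # self-loops keep each row nonzero
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return D_inv @ A_hat @ X                # row-normalized smoothing

def closed_form_layer(X, Y, lam=1e-2):
    """Fit a linear map from smoothed features to targets with a
    regularized closed-form (ridge) solution."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

X = rng.normal(size=(6, 3))                       # toy node features
A = (rng.random((6, 6)) > 0.5).astype(float)      # toy symmetric graph
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
H = propagate(X, A)                               # filtered features
W = closed_form_layer(H, rng.normal(size=(6, 2))) # one layer's linear model
```

    Stacking such layers, each refiltering the previous layer's output and solving a new closed-form system, gives the layer-by-layer training scheme the abstract describes.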
    Semi-supervised learning is one of the hottest research topics in machine learning in recent years, and it has expanded from the original semi-supervised classification and semi-supervised clustering to semi-supervised regression, semi-supervised dimensionality reduction, and other areas. At present, there are already excellent surveys of semi-supervised classification, semi-supervised clustering, and semi-supervised regression, e.g. Zhu's semi-supervised learning literature survey. Dimensionality reduction has long been a key research topic in machine learning, pattern recognition, and related fields, and recently much work has integrated the idea of semi-supervised learning into dimensionality reduction, i.e. semi-supervised dimensionality reduction. In this paper, the current semi-supervised dimensionality reduction methods are reviewed, and their performance is evaluated through extensive experiments on a large number of benchmark datasets, from which some empirical insights are obtained.
    Supervised Learning
    Benchmark (computing)
    Weakly Supervised Learning: What Could It Do and What Could Not?
    Jinhui Yuan (Tsinghua University), Bo Zhang (Tsinghua University)
    Abstract:
    Weakly supervised learning is not only a typical way of human concept learning, but also has wide real-world applications. Of particular interest in this paper is the theoretical aspect of weakly supervised learning: (a) could weakly supervised learning learn the same target concept as fully supervised learning? (b) If so, under what conditions, and how can it be achieved? In other words, this paper investigates what weakly supervised learning could do and what it could not. The basic idea is that weakly supervised learning can be transformed into an equivalent supervised learning problem, in which form it can be understood with the tools of supervised learning. The major results of the paper are: (a) the hardness of weakly supervised learning depends on the properties of the training data and the adopted feature representation; (b) although there is no theoretical guarantee of a unique identification of the relevant variables, incorporating the minimum description length principle may help infer the target concept; (c) weakly supervised learning can be solved by an EM-style algorithm; this is not a novel idea, but the theoretical analysis suggests that the E-step and M-step should adopt feature representations with distinct properties rather than using the same features.
    Supervised Learning
    Feature (machine learning)
    Feature Learning
    Representation
    Identification
    Instance-based learning
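    The EM-style approach mentioned in result (c) can be sketched for the bag-level (multi-instance) flavor of weak supervision. This is an illustrative toy, not the paper's algorithm: it alternates nearest-centroid assignment (E-step) and centroid re-estimation (M-step) under the constraint that every positive bag keeps at least one positive instance, and it does not model the paper's point about using distinct feature representations in the two steps.

```python
import numpy as np

rng = np.random.default_rng(2)

def em_weak_labels(X, bag_ids, bag_labels, n_iter=10):
    """EM-style sketch for bag-level (weak) supervision: every instance
    starts with its bag's label; the M-step re-estimates class centroids,
    the E-step re-assigns instances to the nearest centroid."""
    y = np.array([bag_labels[b] for b in bag_ids])  # init: inherit bag label
    for _ in range(n_iter):
        centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])  # M-step
        d = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
        y_new = d.argmin(axis=1)                                        # E-step
        # constraint: a positive bag must contain at least one positive
        for b, lab in bag_labels.items():
            idx = np.where(bag_ids == b)[0]
            if lab == 1 and y_new[idx].sum() == 0:
                y_new[idx[d[idx, 1].argmin()]] = 1
        if np.array_equal(y, y_new):
            break
        y = y_new
    return y

# toy data: bag 0 is all negative, bag 1 mixes negatives and positives
X = np.concatenate([rng.normal(0, 0.1, (4, 2)),   # bag 0: negatives
                    rng.normal(0, 0.1, (2, 2)),   # bag 1: negatives
                    rng.normal(5, 0.1, (2, 2))])  # bag 1: positives
bags = np.array([0] * 4 + [1] * 4)
y = em_weak_labels(X, bags, {0: 0, 1: 1})
```

    On this toy data, the two negative instances that initially inherit bag 1's positive label are re-assigned to the negative class once the centroids separate, which is the kind of recovery behavior the theoretical analysis concerns.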
    In traditional machine learning, only labeled data are used to train the model, but labeled data are not easily available, and manual labeling requires a considerable amount of time, effort, and expense. This is why much potential research goes unexplored, a problem addressed by the technique of semi-supervised learning. Semi-supervised learning limits the need for labeled data by efficiently using unlabeled data, which is relatively easy to obtain compared to labeled data. In this paper, our idea is to apply semi-supervised learning to regression. We propose a model that predicts labels for unlabeled data from few labeled examples using an ensemble approach to semi-supervised learning. Furthermore, we compare the results obtained with and without semi-supervised learning and show how semi-supervised learning increases predictive performance.
    Supervised Learning
    Ensemble Learning
    Labeled data
    Online machine learning
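    The ensemble approach to semi-supervised regression described above can be sketched as bootstrap-averaged pseudo-labeling. The details here (ridge base learners, the number of rounds, the name `ensemble_self_train`) are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

def ensemble_self_train(X_lab, y_lab, X_unlab, n_models=5, n_rounds=3):
    """Self-training regression sketch: an ensemble of ridge models, each
    fit on a bootstrap resample, pseudo-labels the unlabeled pool with its
    mean prediction; the model is then refit on labeled + pseudo-labeled data."""
    def fit_ridge(X, y, lam=1e-2):
        Xb = np.column_stack([X, np.ones(len(X))])  # append a bias column
        return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

    def predict(w, X):
        return np.column_stack([X, np.ones(len(X))]) @ w

    X_tr, y_tr = X_lab, y_lab
    for _ in range(n_rounds):
        preds = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap resample
            preds.append(predict(fit_ridge(X_tr[idx], y_tr[idx]), X_unlab))
        pseudo = np.mean(preds, axis=0)                  # ensemble pseudo-labels
        X_tr = np.concatenate([X_lab, X_unlab])
        y_tr = np.concatenate([y_lab, pseudo])
    return fit_ridge(X_tr, y_tr)

# toy problem: y = 2x + 1, with 10 labeled and 30 unlabeled points
X_lab = np.linspace(0, 1, 10)[:, None]
y_lab = 2 * X_lab[:, 0] + 1
X_unlab = rng.uniform(0, 1, (30, 1))
w = ensemble_self_train(X_lab, y_lab, X_unlab)
pred = np.array([0.5, 1.0]) @ w   # predict at x = 0.5
```

    Averaging the bootstrap models' predictions before pseudo-labeling is what makes the scheme an ensemble method: it reduces the variance of the pseudo-labels that get fed back into training.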
    Semi-supervised learning has received much attention in the machine learning field over the past decades, and a number of algorithms have been proposed to improve performance by exploiting unlabeled data. However, unlabeled data may hurt the performance of semi-supervised learning in some cases, so it is natural to design a strategy that exploits unlabeled data safely. To address this problem, we introduce a safe semi-supervised learning method by analyzing the different characteristics of unlabeled data in supervised and semi-supervised learning. Our intuition is that unlabeled data may be risky in the semi-supervised setting, and that the degree of risk differs across samples. Hence, we assign different weights to unlabeled data: unlabeled data with high risk are handled by supervised learning, and the rest are used for semi-supervised learning. In particular, we utilize Kernel Minimum Squared Error (KMSE) and Laplacian-regularized KMSE (LapKMSE) for the supervised and semi-supervised components, respectively. Experimental results on several benchmark datasets illustrate the effectiveness of our algorithm.
    Supervised Learning
    Kernel method
    Labeled data
    Co-training
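    The two building blocks named in the abstract, KMSE and LapKMSE, both admit closed-form solutions, and a risk-based weighting can be sketched on top of them. The regularizer forms below and the disagreement-based `risk_weights` heuristic are assumptions for illustration; the paper's actual risk estimate and routing rule may differ.

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kmse(K, y, lam=1e-2):
    """Kernel Minimum Squared Error: closed-form ridge-style coefficients."""
    return np.linalg.solve(K + lam * np.eye(len(K)), y)

def lap_kmse(K, y, L, lam=1e-2, mu=1e-1):
    """Laplacian-regularized KMSE: adds a graph-smoothness penalty via L."""
    return np.linalg.solve(K + lam * np.eye(len(K)) + mu * L @ K, y)

X = rng.normal(size=(10, 2))
y = X[:, 0]                       # toy regression target
K = rbf_kernel(X)
W_graph = rbf_kernel(X, gamma=0.5)
L = np.diag(W_graph.sum(axis=1)) - W_graph   # graph Laplacian

f_sup = K @ kmse(K, y)            # supervised predictions
f_ssl = K @ lap_kmse(K, y, L)     # semi-supervised predictions

# heuristic risk score: the more the two predictions disagree, the
# riskier the point, and the more weight goes to the supervised model
gap = np.abs(f_sup - f_ssl)
w = gap / (gap.max() + 1e-12)
f_safe = w * f_sup + (1 - w) * f_ssl
```

    The weighted combination mirrors the abstract's idea that high-risk unlabeled points should fall back to the supervised predictor rather than being trusted to the semi-supervised one.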
    Recent years have witnessed the significant success of representation learning and deep learning in various prediction and recognition applications. Most previous studies adopt a two-phase procedure: a first step of representation learning followed by a second step of supervised learning. In this process, the initial model weights, which inherit good properties from the representation learning step, are changed in the second step to fit the training data. In other words, the second step learns better classification models at the cost of a possible deterioration of the learned representation. Motivated by this observation, we propose a joint framework of representation and supervised learning. It aims to learn a model that not only preserves the "semantics" of the original data captured by representation learning but also fits the training data well via supervised learning. Along this line, we develop a semi-supervised autoencoder in the spirit of the joint learning framework. Experiments on various classification datasets show the significant effectiveness of the proposed model.
    Autoencoder
    Representation
    Feature Learning
    Supervised Learning
    Data Representation
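    The joint objective of such a semi-supervised autoencoder, reconstruction on all samples plus a supervised term on the labeled subset, can be sketched as a single loss function. The tanh encoder, squared-error decoder, and softmax classification head below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(5)

def joint_loss(X, y, mask, W_enc, W_dec, W_cls, alpha=0.5):
    """Joint objective of a semi-supervised autoencoder: reconstruction on
    all samples keeps the data 'semantics', while a cross-entropy term on
    the labeled subset fits the training labels through the same encoder."""
    H = np.tanh(X @ W_enc)                     # shared encoding
    recon = np.mean((X - H @ W_dec) ** 2)      # unsupervised term (all samples)
    logits = H[mask] @ W_cls                   # supervised head, labeled only
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
    ce = -np.mean(np.log(p[np.arange(mask.sum()), y[mask]] + 1e-12))
    return recon + alpha * ce                  # alpha trades off the two goals

# toy setup: 10 samples, 4 of them labeled
X = rng.normal(size=(10, 4))
y = rng.integers(0, 2, 10)
mask = np.zeros(10, dtype=bool)
mask[:4] = True
W_enc = 0.1 * rng.normal(size=(4, 3))
W_dec = 0.1 * rng.normal(size=(3, 4))
W_cls = 0.1 * rng.normal(size=(3, 2))
loss = joint_loss(X, y, mask, W_enc, W_dec, W_cls)
```

    Training would minimize this single objective over all weight matrices at once, which is what distinguishes the joint framework from the two-phase pretrain-then-finetune procedure the abstract criticizes.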