Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

2019 
Omics data are typically high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. This "curse of dimensionality" makes it challenging to train a machine learning model on high-dimensional omics data such as DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low-dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classifier to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE consists of an unsupervised phase and a supervised phase. During the unsupervised phase, a hierarchical cluster structure of samples forms automatically without the need for labels. In the supervised phase, OmiVAE achieved an average accuracy of 97.49% under 10-fold cross-validation across 33 tumour types and normal samples, outperforming existing methods. The integrated model learned from multi-omics datasets outperformed models trained on only one type of omics data, which indicates that the complementary information from different omics data types provides useful insights for biomedical tasks such as cancer classification.
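
The two-phase objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss, so the binary cross-entropy reconstruction term, the standard-normal prior, the weighting factor `alpha`, and all function names here are assumptions chosen for illustration. The unsupervised phase uses only the variational autoencoder terms; the supervised phase adds a classification term.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I), one sample."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Reconstruction error for features scaled to [0, 1] (an assumption)."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def softmax_cross_entropy(logits, label):
    """Classification error for one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def combined_loss(x, x_hat, mu, log_var, logits=None, label=None, alpha=1.0):
    """Unsupervised phase: reconstruction + KL.
    Supervised phase (logits and label given): adds alpha * classification loss.
    alpha is a hypothetical weighting, not a value taken from the paper."""
    loss = binary_cross_entropy(x, x_hat) + kl_to_standard_normal(mu, log_var)
    if logits is not None and label is not None:
        loss += alpha * softmax_cross_entropy(logits, label)
    return loss

# Toy sample: 3 input features, 2 latent dimensions, 34 classes
# (33 tumour types plus normal, as in the abstract).
x, x_hat = [0.9, 0.1, 0.5], [0.8, 0.2, 0.5]
mu, log_var = [0.3, -0.2], [0.0, 0.1]
logits = [0.0] * 34
unsupervised = combined_loss(x, x_hat, mu, log_var)
supervised = combined_loss(x, x_hat, mu, log_var, logits=logits, label=5)
```

In this sketch the same latent code feeds both the decoder and the classifier, so adding the classification term in the second phase pushes the learned features toward the labelled task, which is one way to read "task-oriented feature extraction".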