Integrating TARA Oceans Datasets Using Unsupervised Multiple Kernel Learning

2017 
In metagenomic analysis, the integration of various sources of information is a difficult task since produced datasets are often of heterogeneous types. These datasets can be composed of species counts, which need to be analysed with distances, but also species abundances, interaction networks or phylogenetic information which have been shown relevant to provide a better comparison between communities. Standard integration methods can take advantage of external information but do not allow to analyse heterogenous multi-omics datasets in a generic way. We propose a multiple kernel framework that allows to integrate multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. This kernel is subsequently used in kernel PCA to provide a fast and accurate visualisation of similarities between samples, in a non linear space and from the multiple source point of view. A generic procedure is also proposed to improve the interpretability of the kernel PCA in regards with the original data. We applied our framework to the multiple metagenomic datasets collected during the TARA Oceans expedition. We demonstrate that our method is able to retrieve previous findings in a single analysis as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    4
    Citations
    NaN
    KQI
    []