New gene association measures by joint network embedding of multiple gene expression datasets

2020 
A large number of samples is required to construct a reliable gene co-expression network, and a single gene expression dataset rarely provides enough. Combining datasets is not straightforward, however, because batch effects are common among datasets generated under different experimental conditions. We propose JEBIN (Joint Embedding of multiple BIpartite Networks), an algorithm that learns a low-dimensional representation vector for each gene by integrating multiple bipartite networks, where each network corresponds to one dataset. JEBIN has several inherent advantages: it is a nonlinear, global model; its time complexity is linear in the number of genes, datasets, or samples; and it can integrate datasets with different distributions. We verified the effectiveness and scalability of JEBIN through a series of simulation experiments, and it showed better performance on real biological data than commonly used integration algorithms. In addition, we conducted a differential co-expression analysis of hepatocellular carcinoma between single-cell and bulk RNA-seq data, as well as a comparison between hepatocellular carcinoma and its adjacent samples using the bulk RNA-seq data. The results indicate that JEBIN can obtain comprehensive and stable gene co-expression networks by integrating multiple datasets, and that it has broad prospects for the functional annotation of unknown genes and for inferring the regulatory mechanisms of target genes.
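The general idea of a shared per-gene embedding learned from several datasets can be illustrated with a small sketch. This is not JEBIN itself: the abstract does not give the objective or the construction of the bipartite networks, so the code below swaps in a simple linear stand-in (a truncated SVD of per-dataset standardised, concatenated expression matrices) and uses cosine similarity between gene vectors as a hypothetical association measure. All function names, the per-dataset standardisation used to damp batch offsets and scales, and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def standardise_rows(X):
    """Z-score each gene (row) within one dataset to remove offset and scale."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / (sd + 1e-8)


def joint_gene_embedding(datasets, dim=2):
    """Return one (n_genes x dim) embedding shared across all datasets.

    Assumption: every dataset is a genes x samples array with the same gene
    order. Each block is divided by sqrt(n_samples) so that datasets with
    more samples do not dominate the factorisation.
    """
    blocks = [standardise_rows(X) / np.sqrt(np.asarray(X).shape[1])
              for X in datasets]
    stacked = np.hstack(blocks)  # genes x (total number of samples)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :dim] * s[:dim]  # linear stand-in, not JEBIN's nonlinear model


def cosine_association(E):
    """Pairwise cosine similarity between gene embedding vectors."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.maximum(norms, 1e-12)
    return unit @ unit.T


def simulate_two_datasets(seed=0):
    """Two toy datasets with a batch effect; genes 0-4 and 5-9 form two modules."""
    rng = np.random.default_rng(seed)
    datasets = []
    for n_samples, offset, scale in [(30, 0.0, 1.0), (40, 5.0, 3.0)]:
        factors = rng.normal(size=(2, n_samples))   # one latent factor per module
        signal = np.repeat(factors, 5, axis=0)      # 10 genes x n_samples
        noise = 0.1 * rng.normal(size=signal.shape)
        datasets.append(offset + scale * (signal + noise))
    return datasets


E = joint_gene_embedding(simulate_two_datasets(), dim=2)
A = cosine_association(E)
print("same module:", A[0, 1], "different modules:", A[0, 5])
```

In this toy setting, genes driven by the same latent factor should end up with nearly parallel embedding vectors even though the second dataset has a different offset and scale. A nonlinear, network-based model such as the one the abstract describes would replace the SVD step.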