Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization

2015 
Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and linked to many dysregulations. Over the past decade, researchers have reported a large number of human disease associations for lncRNAs, both intergenic lncRNAs (lincRNAs) and non-intergenic lncRNAs. Thanks to the next-generation sequencing platform RNA-seq, researchers have also been able to quantify the expression profile of each lncRNA in human tissue samples. In this article, we adapted the non-negative matrix factorization (NMF) method to develop a low-rank computational model that describes the existing knowledge about both non-intergenic and intergenic lncRNA-disease associations, represented as a two-dimensional association matrix, and that also provides a way of ranking disease-causing lncRNAs. We proposed several NMF formulations for the problem and found that sparsity-constrained NMF yielded the best model among them. By exploiting the inherent bi-clustering ability of NMF models, we extracted several biologically significant lncRNA groups and disease groups. Moreover, we proposed an integrative NMF formulation that combines the coding gene and lincRNA disease association data with prior knowledge about the relationship networks among coding genes and lincRNAs and with RNA-seq expression profiles. With it we identified potential lincRNA-coding gene co-modules, which we used to further enhance the lincRNA-disease associations and to shed light on the functional chemistry of intergenic lncRNAs. Experimental results show the superiority of our proposed method over two state-of-the-art clustering algorithms, k-means and hierarchical clustering.
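To make the low-rank idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the sparsity constraint can be modelled as an L1 penalty on both factors, i.e. it minimises 0.5·||A − WH||²_F + β(ΣW + ΣH) with W, H ≥ 0, solved by Lee-Seung style multiplicative updates; the abstract does not state the paper's exact formulation. The helper names `sparse_nmf`, `rank_candidates` and `biclusters`, the parameter values, and the toy matrix are all hypothetical. Ranking uses the reconstructed scores WH for unobserved pairs, and bi-clustering assigns each lncRNA (row) and disease (column) to its dominant latent factor. The integrative formulation is not covered here.

```python
import numpy as np


def sparse_nmf(A, rank, beta=0.01, n_iter=1000, n_restarts=5, seed=0):
    """Hypothetical L1-penalised NMF via multiplicative updates (sketch).

    Minimises 0.5 * ||A - W H||_F^2 + beta * (sum(W) + sum(H)), W, H >= 0.
    beta > 0 keeps every update denominator strictly positive.
    Returns (W, H, losses) from the restart with the lowest final loss.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Strictly positive random start; several restarts reduce the
        # risk of ending in a poor local minimum.
        W = rng.uniform(0.1, 1.0, size=(n, rank))
        H = rng.uniform(0.1, 1.0, size=(rank, m))
        losses = []
        for _ in range(n_iter):
            # Multiplicative updates; the L1 weight enters the denominator,
            # shrinking small entries toward zero (sparsity).
            H *= (W.T @ A) / (W.T @ W @ H + beta)
            W *= (A @ H.T) / (W @ H @ H.T + beta)
            R = A - W @ H
            losses.append(0.5 * np.sum(R * R) + beta * (W.sum() + H.sum()))
        if best is None or losses[-1] < best[2][-1]:
            best = (W, H, losses)
    return best


def rank_candidates(A, W, H, disease):
    """Rank lncRNAs with no known link to `disease` by reconstructed score."""
    scores = (W @ H)[:, disease]
    unknown = np.flatnonzero(np.asarray(A)[:, disease] == 0)
    return unknown[np.argsort(-scores[unknown], kind="stable")].tolist()


def biclusters(W, H):
    """Hard bi-clustering: each row/column joins its dominant latent factor."""
    return W.argmax(axis=1).tolist(), H.argmax(axis=0).tolist()


# Synthetic lncRNA x disease matrix (not data from the paper): two blocks,
# with the pair (lncRNA 2, disease 1) left unobserved.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

W, H, losses = sparse_nmf(A, rank=2)
lnc_groups, disease_groups = biclusters(W, H)
print("lncRNA groups:", lnc_groups)
print("disease groups:", disease_groups)
print("candidates for disease 1:", rank_candidates(A, W, H, disease=1))
```

On this toy matrix one would expect lncRNA 2 to head the candidate list for disease 1, since it falls in the same bi-cluster as that disease's known lncRNAs. The multiple restarts are a design choice for robustness, because NMF objectives are non-convex.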