Constrained Covariance Matrices With a Biologically Realistic Structure: Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models

2019 
High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This fact is particularly well known for gene expression data, because a sufficient number of large-scale data sets are available that are amenable to a sensible statistical analysis confirming this assertion. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they allow one to define Gaussian graphical models (GGMs) for the simulation of realistic data, including their correlation structure. We study local and global characteristics of these covariance matrices and of the derived concentration/partial correlation matrices. Second, we connect these results, obtained from a probabilistic perspective, to statistical results of studies aiming to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulties in inferring molecular interactions between highly connected genes.
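To make the link between a network, a constrained concentration matrix, and simulated data concrete, here is a minimal sketch in plain Python. It is not one of the three methods compared in the paper: it uses a simple diagonal-dominance construction as a stand-in, and the four-node chain graph, the weight range, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def constrained_precision(adjacency, rng, low=0.2, high=0.8):
    """Random symmetric concentration matrix with zeros wherever the graph has no edge."""
    p = len(adjacency)
    omega = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if adjacency[i][j]:
                weight = rng.choice([-1.0, 1.0]) * rng.uniform(low, high)
                omega[i][j] = omega[j][i] = weight
    # Strict diagonal dominance with a positive diagonal implies positive definiteness.
    for i in range(p):
        omega[i][i] = sum(abs(v) for v in omega[i]) + 1.0
    return omega


def cholesky(a):
    """Lower-triangular L with a = L L^T; raises if a is not positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def partial_correlations(omega):
    """Partial correlation between i and j is -omega_ij / sqrt(omega_ii * omega_jj)."""
    p = len(omega)
    return [
        [1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
         for j in range(p)]
        for i in range(p)
    ]


def sample(chol, rng):
    """Draw x ~ N(0, omega^{-1}) by solving L^T x = z with standard normal z."""
    p = len(chol)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    x = [0.0] * p
    for i in reversed(range(p)):
        s = z[i] - sum(chol[k][i] * x[k] for k in range(i + 1, p))
        x[i] = s / chol[i][i]
    return x


rng = random.Random(0)

# Hypothetical network: a chain 0 - 1 - 2 - 3.
p = 4
adjacency = [[0] * p for _ in range(p)]
for i in range(p - 1):
    adjacency[i][i + 1] = adjacency[i + 1][i] = 1

omega = constrained_precision(adjacency, rng)
chol = cholesky(omega)
pcor = partial_correlations(omega)
samples = [sample(chol, rng) for _ in range(500)]
```

In this sketch the constraint is placed on the concentration matrix, so a missing edge corresponds to a zero partial correlation, while the implied covariance matrix (the inverse of `omega`) is generally dense. Sampling through the Cholesky factor avoids forming that inverse explicitly.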