Abstract. Current efforts in systems genetics have focused on the development of statistical approaches that aim to disentangle causal relationships among molecular phenotypes in segregating populations. Reverse engineering of transcriptional networks plays a key role in the understanding of gene regulation. However, transcriptional regulation is only one possible mechanism, as methylation, phosphorylation, direct protein–protein interaction, transcription factor binding, etc., can also contribute to gene regulation. These additional modes of regulation can be interpreted as unobserved variables in the transcriptional gene network and can potentially affect its reconstruction accuracy. We develop tests of causal direction for a pair of phenotypes that may be embedded in a more complicated but unobserved network by extending Vuong’s selection tests for misspecified models. Our tests provide a significance level, which is unavailable for the widely used AIC and BIC criteria. We evaluate the performance of our tests against the AIC, BIC, and a recently published causality inference test in simulation studies. We compare the precision of causal calls using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Our model selection tests are more precise, showing greatly reduced false-positive rates compared to the alternative approaches. In practice, this is a useful feature since follow-up studies tend to be time-consuming and expensive and, hence, it is important for the experimentalist to have causal predictions with low false-positive rates.
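As an illustration of the kind of test described in the abstract above, the following is a minimal sketch of the classical pairwise Vuong test applied to two candidate causal orderings. It is not the extended set of tests the authors develop. It assumes a single QTL genotype `q`, Gaussian linear models fitted by least squares, and simulated data; the function names `gaussian_loglik_per_obs` and `vuong_causal_direction` and all simulation settings are hypothetical.

```python
import math
import numpy as np


def gaussian_loglik_per_obs(y, x):
    """Per-observation Gaussian log-likelihood of y regressed on x.

    Coefficients come from least squares and the variance is the
    maximum-likelihood estimate (mean squared residual).
    """
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = np.mean(resid ** 2)
    return -0.5 * (np.log(2.0 * np.pi * sigma2) + resid ** 2 / sigma2)


def vuong_causal_direction(q, y1, y2):
    """Classical Vuong z statistic comparing q -> y1 -> y2 with q -> y2 -> y1.

    Returns (z, p), where p is the one-sided p-value for the first model
    being closer to the data-generating process than the second.
    """
    # Model A: q -> y1 -> y2, factorised as f(y1 | q) * f(y2 | y1)
    ll_a = gaussian_loglik_per_obs(y1, q) + gaussian_loglik_per_obs(y2, y1)
    # Model B: q -> y2 -> y1, factorised as f(y2 | q) * f(y1 | y2)
    ll_b = gaussian_loglik_per_obs(y2, q) + gaussian_loglik_per_obs(y1, y2)
    d = ll_a - ll_b                      # per-observation log-likelihood ratios
    n = len(d)
    z = d.sum() / (math.sqrt(n) * d.std(ddof=0))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return z, p


# Simulated example (assumed settings): the true ordering is q -> y1 -> y2.
rng = np.random.default_rng(0)
n = 2000
q = rng.integers(0, 3, size=n).astype(float)   # genotype coded 0, 1, 2
y1 = 2.0 * q + rng.normal(size=n)
y2 = y1 + rng.normal(size=n)

z, p = vuong_causal_direction(q, y1, y2)
print(f"z = {z:.2f}, one-sided p = {p:.3g}")
```

Both candidate models have the same number of parameters, so no complexity penalty enters the statistic in this sketch. A large positive `z` with a small p-value favours q -> y1 -> y2, whereas comparing AIC or BIC scores alone would indicate only which model ranks higher.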
Although numerous quantitative trait loci (QTL) influencing disease-related phenotypes have been detected through gene mapping and positional cloning, identification of the individual gene(s) and molecular pathways leading to those phenotypes is often elusive. One way to improve understanding of genetic architecture is to classify phenotypes in greater depth by including transcriptional and metabolic profiling. In the current study, we have generated and analyzed mRNA expression and metabolic profiles in liver samples obtained in an F2 intercross between the diabetes-resistant C57BL/6 leptin ob/ob and the diabetes-susceptible BTBR leptin ob/ob mouse strains. This cross, which segregates for genotype and physiological traits, was previously used to identify several diabetes-related QTL. Our current investigation includes microarray analysis of over 40,000 probe sets, plus quantitative mass spectrometry-based measurements of sixty-seven intermediary metabolites in three different classes (amino acids, organic acids, and acyl-carnitines). We show that liver metabolites map to distinct genetic regions, thereby indicating that tissue metabolites are heritable. We also demonstrate that genomic analysis can be integrated with liver mRNA expression and metabolite profiling data to construct causal networks for control of specific metabolic processes in liver. As a proof of principle of the practical significance of this integrative approach, we illustrate the construction of a specific causal network that links gene expression and metabolic changes in the context of glutamate metabolism, and demonstrate its validity by showing that genes in the network respond to changes in glutamine and glutamate availability. Thus, the methods described here have the potential to reveal regulatory networks that contribute to chronic, complex, and highly prevalent diseases and conditions such as obesity and diabetes.
Summary. A semiparametric mixed effects regression model is proposed for the analysis of clustered or longitudinal data with continuous, ordinal, or binary outcome. The common assumption of Gaussian random effects is relaxed by using a predictive recursion method (Newton and Zhang, 1999) to provide a nonparametric smooth density estimate. A new strategy is introduced to accelerate the algorithm. Parameter estimates are obtained by maximizing the marginal profile likelihood by Powell's conjugate direction search method. Monte Carlo results are presented to show that the method can improve the mean squared error of the fixed effects estimators when the random effects distribution is not Gaussian. The usefulness of visualizing the random effects density itself is illustrated in the analysis of data from the Wisconsin Sleep Survey. The proposed estimation procedure is computationally feasible for quite large data sets.
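To make the predictive recursion step mentioned above concrete, here is a minimal sketch of the basic recursion for estimating a mixing density on a grid. It is neither the accelerated algorithm nor the mixed effects model of the paper. It assumes a simple normal location mixture with known kernel standard deviation, a flat starting density, and weights w_i = (i + 1)^(-gamma); the function name `predictive_recursion`, the grid, and the simulated data are illustrative assumptions.

```python
import math
import numpy as np


def predictive_recursion(x, grid, sigma=1.0, gamma=0.67):
    """Estimate a mixing density on an equally spaced grid.

    Each observation x_i is assumed to be drawn from N(u_i, sigma^2), with the
    latent u_i following an unknown density f. Starting from a flat guess,
    every observation moves the estimate a fraction w_i of the way toward the
    posterior density of u given that observation.
    """
    du = grid[1] - grid[0]
    f = np.ones_like(grid) / (grid.size * du)     # flat start, Riemann sum = 1
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)                 # decreasing weight sequence
        kernel = np.exp(-0.5 * ((xi - grid) / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi)
        )
        posterior = kernel * f
        posterior /= posterior.sum() * du         # normalise on the grid
        f = (1.0 - w) * f + w * posterior
    return f


# Simulated example (assumed settings): a two-point, non-Gaussian mixing law.
rng = np.random.default_rng(1)
u = rng.choice([-2.0, 2.0], size=500)
x = u + rng.normal(size=500)

grid = np.linspace(-6.0, 6.0, 241)
du = grid[1] - grid[0]
f_hat = predictive_recursion(x, grid)
print("grid integral of the estimate:", f_hat.sum() * du)
```

The estimate depends on the order in which observations are processed, so a sketch like this is often run over several random permutations of the data and averaged; that refinement is omitted here.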
This chapter focuses on computing strategies and software for gene mapping. We address software strategies for experimental crosses, known as quantitative trait loci (QTL) mapping, separately from those used in natural populations for association analysis. Both of these approaches look for correlations between genotypes and phenotypes. For most of the development in this chapter, we focus on a single phenotype, but we briefly note strategies that can examine multiple correlated phenotypes.