The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurately modeling living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high-dimensional and sparse, and the regulation is often obscured by indirect connections. We introduce ENNET, a gene regulatory network inference algorithm that reverse-engineers networks of transcriptional regulation from a variety of expression profiles with superior accuracy compared to state-of-the-art methods. The proposed method relies on boosting regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. We then propose a technique that uses the distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods. The superior accuracy achieved on three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked out in which experiment when it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.
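To make the initial scoring step more concrete, the following is a minimal sketch of boosting regression stumps for one target gene and reading off a relative importance for each candidate regulator. It assumes squared-error gradient boosting with a fixed learning rate, no subsampling, and importance defined as the accumulated split gain normalized to sum to 1. It is not the ENNET implementation, and the function names `fit_stump` and `stump_boost_importance` are hypothetical.

```python
from typing import List, Tuple

# (feature index, threshold, left prediction, right prediction, gain)
Stump = Tuple[int, float, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Find the single split that most reduces the squared error of residuals r."""
    n, p = len(X), len(X[0])
    total = sum(r)
    # Default: no split (every sample goes left), zero gain.
    best: Stump = (0, float("inf"), total / n, total / n, 0.0)
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold can separate tied values
            n_left, n_right = k + 1, n - k - 1
            right_sum = total - left_sum
            # Drop in sum of squared errors when each side predicts its own mean.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
            if gain > best[4]:
                best = (j, (a + b) / 2.0, left_sum / n_left, right_sum / n_right, gain)
    return best


def stump_boost_importance(X: List[List[float]], y: List[float],
                           n_rounds: int = 50, learning_rate: float = 0.1) -> List[float]:
    """Boost stumps on y (one target gene) using columns of X (candidate regulators).
    Returns a relative importance per regulator (hypothetical scoring scheme)."""
    n, p = len(X), len(X[0])
    pred = [sum(y) / n] * n  # start from the mean expression
    importance = [0.0] * p
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left_val, right_val, gain = fit_stump(X, residuals)
        if gain <= 0.0:
            break  # nothing left to explain
        importance[j] += gain  # credit the regulator used for this split
        for i in range(n):
            pred[i] += learning_rate * (left_val if X[i][j] <= thr else right_val)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Toy data (illustrative only): regulator 0 drives the target, regulator 1 is a distractor.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(stump_boost_importance(X, y))
```

Running this per target gene and stacking the importance vectors gives a regulator-by-gene score matrix; any refinement using score distributions or knockout data would be a separate step on top of it.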
<p>PDF file - 55K, Scatterplots of RPPA measurements for total and phosphorylated forms of ERK1/2 (A), AKT (B) and ERBB3 (C). Normalized relative intensity values for the total protein are plotted on the x-axis and values for the phosphorylated protein on the y-axis. Regression lines and r² goodness-of-fit values are also shown.</p>
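For reference, a regression line and its r² value for one such scatterplot can be computed as below. This is a minimal sketch assuming an ordinary least-squares fit of phosphorylated (y) on total (x) intensity, with r² = 1 − SS_res / SS_tot; the caption does not state the fitting procedure used, and `linear_fit_r2` is a hypothetical helper.

```python
from typing import List, Tuple


def linear_fit_r2(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares line y = intercept + slope * x, plus its r^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the goodness of fit.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy values (not RPPA data): total vs. phosphorylated relative intensity.
print(linear_fit_r2([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```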
<p>PDF file - 1038K, Systems networks of AKT (A), apoptosis (B), IGF-1R (C) and mTOR (D) pathway activation. Node color: Violet - node representing the Pathway Score; Blue - phosphoproteins (linked to the Pathway Score or to other proteins/genes at the protein level); Yellow - genes (linked to the Pathway Score or to other proteins/genes at the mRNA level); Orange - microRNA; Dark green - drug; Light green - metabolite; Gray - other. Node shape: Diamond - Pathway Score node, or phosphoprotein used in calculating the Pathway Score; Circle - other nodes. Edge color & label: Brown - relationship inferred from phosphoprotein level (either with the level of another phosphoprotein or with the Pathway Score); Gray - relationship inferred from gene mRNA expression (either with the mRNA of another gene or with the Pathway Score); Dark green - phosphoprotein level or gene expression (mRNA) is significantly correlated with drug response measured as -log(GI50); Light green - phosphoprotein level or gene expression (mRNA) is significantly correlated with metabolite concentration; Red - phosphoprotein level or gene expression (mRNA) is significantly correlated with mutation of another gene, and the arrow points from the mutated gene to the mRNA gene; Pink - gene expression (mRNA) is significantly correlated with methylation of another gene, and the arrow points from the methylated gene to the mRNA gene; Dark blue - gene expression (mRNA) is significantly correlated with the copy number of another gene, and the arrow points from the copy-number gene to the mRNA gene; Orange - gene expression (mRNA) is significantly correlated with microRNA expression. Edge line style: Solid - positive correlation; Dashed - negative correlation.</p>
The NCI-60 cell line set is likely the most molecularly profiled set of human tumor cell lines in the world. However, a critical missing component of previous analyses has been the inability to place the massive amounts of "-omic" data in the context of functional protein signaling networks, which often contain many of the drug targets for new targeted therapeutics. We used reverse-phase protein array (RPPA) analysis to measure the activation/phosphorylation state of 135 proteins, for a total of nearly 200 key protein isoforms involved in cell proliferation, survival, migration, adhesion and other processes, in all 60 cell lines. We aggregated the signaling data into biochemical modules of interconnected kinase substrates for six key cancer signaling pathways: AKT, mTOR, EGF receptor (EGFR), insulin-like growth factor-1 receptor (IGF-1R), integrin, and apoptosis signaling. The net activation state of these protein network modules was correlated with available individual protein, phosphoprotein, mutational, metabolomic, miRNA, transcriptional, and drug sensitivity data. Pathway activation mapping identified reproducible and distinct signaling cohorts that transcended organ-type distinctions. Direct correlations with the protein network modules mostly involved protein phosphorylation data, but we also identified direct correlations of signaling networks with metabolite, miRNA, and DNA data. The integration of protein activation measurements into biochemically interconnected modules provided a novel means to align the functional protein architecture with multiple "-omic" data sets and therapeutic response correlations. This approach may provide a deeper understanding of how cellular biochemistry defines therapeutic response. Such "-omic" portraits could inform rational anticancer agent screening and drive personalized therapeutic approaches.
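The abstract does not specify how a module's net activation state is computed. The sketch below shows one plausible scheme, purely as an assumption: each endpoint's relative intensities are z-scored across cell lines, inhibitory phosphorylations enter with a negative sign, the signed z-scores are averaged into a per-cell-line pathway score, and that score is correlated with drug response (-log(GI50)) by Pearson correlation. The endpoint names, values, and helper functions are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Standardize one endpoint across cell lines (assumes non-constant values)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pathway_scores(intensities: Dict[str, List[float]], signs: Dict[str, int]) -> List[float]:
    """Hypothetical module score: mean of signed z-scores per cell line.
    signs[e] is +1 for an activating and -1 for an inhibitory phosphorylation."""
    endpoints = list(intensities)
    z = {e: zscore(intensities[e]) for e in endpoints}
    n_lines = len(intensities[endpoints[0]])
    return [sum(signs[e] * z[e][i] for e in endpoints) / len(endpoints)
            for i in range(n_lines)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


# Hypothetical module with one activating and one inhibitory endpoint, four cell lines.
intensities = {"endpoint_a": [1.0, 2.0, 3.0, 4.0], "endpoint_b": [4.0, 3.0, 2.5, 1.0]}
signs = {"endpoint_a": +1, "endpoint_b": -1}
scores = pathway_scores(intensities, signs)
neg_log_gi50 = [5.1, 5.4, 5.9, 6.3]  # hypothetical drug-response values
print(scores, pearson(scores, neg_log_gi50))
```

In a real analysis the correlation would also need a significance test and correction for the many module-by-drug comparisons; the sketch omits both.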
<p>PDF file - 150K, Identification and characterization of activated pathway modules by signal transduction representation: IGF-1R (A), apoptosis (C), mTOR (E), AKT (G) and EGFR (I), and by unsupervised hierarchical clustering: IGF-1R (B), apoptosis (D), mTOR (F), AKT (H) and EGFR (J). In the pathway representations, the inhibitory phosphorylations considered in the analysis are shown in red and the activating phosphorylations in green. In the unsupervised hierarchical clustering, the complete NCI-60 panel is shown on the vertical axis. Relative intensity values of the specific endpoints were used to create the heatmaps. After calculating the overall scores, we highlighted the 10 cell lines with the highest pathway scores (red) and the 10 cell lines with the lowest pathway activation scores (green).</p>
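Selecting the cell lines to highlight is a simple ranking step. The sketch below assumes the pathway scores are held in a dictionary keyed by cell line name; `top_and_bottom` and the placeholder names `line_00` to `line_59` are hypothetical and do not correspond to actual NCI-60 identifiers.

```python
from typing import Dict, List, Tuple


def top_and_bottom(scores: Dict[str, float], k: int = 10) -> Tuple[List[str], List[str]]:
    """Return the k cell lines with the highest scores and the k with the lowest."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    return ranked[:k], ranked[::-1][:k]  # bottom list starts at the lowest score


# Placeholder scores for a 60-line panel (illustrative only).
scores = {f"line_{i:02d}": float(i) for i in range(60)}
highlight_red, highlight_green = top_and_bottom(scores)
print(highlight_red, highlight_green)
```

Ties at the cutoff are broken by insertion order here; a real analysis might instead report all tied lines.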
<p>PDF file - 160K, Unsupervised hierarchical clustering of the NCI-60 cell lines (horizontal axis) and the receptor tyrosine kinases (RTKs) analyzed (vertical axis). The NCI-60 sample names are colored according to tissue of origin: orange - kidney, blue - lung, pink - breast, red - blood, green - colon, brown - prostate, purple - ovary, black - skin and gray - central nervous system.</p>
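The captions do not state the distance metric or linkage rule behind the clustering. As an illustration only, the sketch below implements naive agglomerative clustering with Euclidean distance and average linkage, grouping cell lines by their RTK intensity profiles; both choices, the function `average_linkage`, and the toy profiles are assumptions.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple


def average_linkage(profiles: Dict[str, List[float]]) -> List[Tuple[tuple, tuple, float]]:
    """Naive agglomerative clustering (Euclidean distance, average linkage).
    Returns the merge history as (cluster_a, cluster_b, linkage distance)."""
    clusters = {i: [name] for i, name in enumerate(profiles)}
    next_id = len(clusters)
    merges = []

    def linkage(a: int, b: int) -> float:
        # Mean pairwise distance between members of clusters a and b.
        pairs = [dist(profiles[x], profiles[y]) for x in clusters[a] for y in clusters[b]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((tuple(clusters[a]), tuple(clusters[b]), linkage(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # merge into a new cluster
        next_id += 1
    return merges


# Toy RTK profiles for four hypothetical cell lines (illustrative only).
toy = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]}
for merge in average_linkage(toy):
    print(merge)
```

This version recomputes linkages on every pass, which is fine for a 60-line panel; library routines such as SciPy's hierarchical clustering are the usual choice in practice.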