We studied the global relationship between gene expression and neuroanatomical connectivity in the adult rodent brain. We used a large data set of the rat brain "connectome" from the Brain Architecture Management System (942 brain regions and over 5000 connections) and applied statistical approaches to relate it to the gene expression signatures of 17,530 genes in 142 anatomical regions from the Allen Brain Atlas. Our analysis shows that adult gene expression signatures have a statistically significant relationship to connectivity. In particular, brain regions that have similar expression profiles tend to have similar connectivity profiles, and this effect is not entirely attributable to spatial correlations. In addition, brain regions that are connected have more similar expression patterns. Using a simple optimization approach, we identified a set of genes most correlated with neuroanatomical connectivity and found that this set is enriched for genes involved in neuronal development and axon guidance. A number of these genes have been implicated in neurodevelopmental disorders such as autism spectrum disorder. Our results may shed light on the role of gene expression patterns in influencing neuronal activity and connectivity, with potential applications to our understanding of brain disorders. Supplementary data are available at http://www.chibi.ubc.ca/ABAMS.
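The sketch below shows one way to carry out the matrix-level comparison summarized above. It assumes a regions × genes expression matrix and a binary regions × regions connectivity matrix in which each row is a region's connectivity profile. The sketch correlates the off-diagonal entries of the two region-by-region similarity matrices and then uses a Mantel-style permutation of region labels to assess significance. The function names, the use of Pearson correlation, and the permutation scheme are illustrative assumptions, not the study's statistical approach. The sketch also does not control for spatial correlation.

```python
import numpy as np


def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, as a flat vector."""
    rows, cols = np.triu_indices(m.shape[0], k=1)
    return m[rows, cols]


def similarity_correlation(expr, conn):
    """Correlate region-pair expression similarity with connectivity similarity.

    expr: regions x genes expression matrix.
    conn: regions x regions binary connectivity matrix; row i is region i's profile (assumed).
    """
    expr_sim = np.corrcoef(expr)  # Pearson correlation between regions' expression profiles
    conn_sim = np.corrcoef(conn)  # Pearson correlation between regions' connectivity profiles
    return np.corrcoef(upper_triangle(expr_sim), upper_triangle(conn_sim))[0, 1]


def mantel_test(expr, conn, n_perm=1000, seed=0):
    """Observed correlation and a one-sided, Mantel-style permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = similarity_correlation(expr, conn)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(conn.shape[0])
        # Relabel regions in the connectivity matrix (rows and columns together).
        shuffled = conn[np.ix_(perm, perm)]
        if similarity_correlation(expr, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting rows and columns together preserves the internal structure of the connectivity matrix and breaks only its alignment with the expression data. This is the null hypothesis that a label-permutation test of this kind evaluates.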
We describe a simple software tool, 'matrix2png', for creating color images of matrix data. Although it was originally designed to display microarray data sets, it is a general tool for making simple visualizations of matrices for use in figures, web pages, slide presentations and the like. It can also be used to generate images 'on the fly' in web applications. Both continuous-valued and discrete-valued (categorical) data sets can be displayed. Many options are available to the user, including the colors used, the display of row and column labels, and scale bars. In this note we describe some of matrix2png's features and some places where it has been useful in the authors' work. A simple web interface is available, and Unix binaries are available from http://microarray.cpmc.columbia.edu/matrix2png. Source code is available on request.
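The core idea is to map each matrix value onto a color and write the result out as an image. The sketch below does this with the Python standard library alone. It is not matrix2png's code or interface. The green-black-red scale, the cell size, and the function names are assumptions chosen for illustration, and features such as labels and scale bars are left out.

```python
import struct
import zlib


def value_to_rgb(v, vmin, vmax):
    """Map v linearly onto a green-black-red scale (an assumed, microarray-style palette)."""
    t = 0.5 if vmax == vmin else min(max((v - vmin) / (vmax - vmin), 0.0), 1.0)
    if t < 0.5:
        return (0, round(255 * (1 - 2 * t)), 0)  # low values: green fading to black
    return (round(255 * (2 * t - 1)), 0, 0)      # high values: black rising to red


def _chunk(tag, data):
    """One PNG chunk: length, type, data, CRC over type + data."""
    body = tag + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def matrix_to_png(matrix, cell=10):
    """Render a list-of-lists numeric matrix as PNG bytes, one cell x cell square per value."""
    flat = [v for row in matrix for v in row]
    vmin, vmax = min(flat), max(flat)
    width, height = len(matrix[0]) * cell, len(matrix) * cell
    raw = bytearray()
    for row in matrix:
        line = bytearray([0])  # filter type 0 (None) at the start of each scanline
        for v in row:
            line.extend(bytes(value_to_rgb(v, vmin, vmax)) * cell)
        raw.extend(bytes(line) * cell)  # repeat the scanline so cells are square
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 8-bit truecolor RGB
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(bytes(raw))) + _chunk(b"IEND", b""))
```

To save the image, write the returned bytes to a file opened in binary mode, for example `open("matrix.png", "wb").write(matrix_to_png(data))`.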
Protein interactions shape proteome function and thus biology. Identification of protein interactions is a major goal in molecular biology, but biochemical methods, although improving, remain limited in coverage and accuracy. Although computational predictions can guide biochemical experiments, low validation rates of predictions remain a major limitation. Here, we investigated computational methods for predicting a specific type of interaction: the inhibitory interactions between proteases and their inhibitors. Proteases generate thousands of proteoforms that dynamically shape the functional state of proteomes. Despite the important regulatory role of proteases, knowledge of their inhibitors remains largely incomplete, with the vast majority of proteases lacking an annotated inhibitor. To link inhibitors to their target proteases on a large scale, we applied computational methods to predict inhibitory interactions between proteases and their inhibitors based on complementary data, including coexpression, phylogenetic similarity, structural information, co-annotation, and colocalization, and also surveyed general protein interaction networks for potential inhibitory interactions. In testing nine predicted interactions biochemically, we validated the inhibition of kallikrein 5 by serpin B12. Despite the use of a wide array of complementary data, we found a high false positive rate of computational predictions in biochemical follow-up. Using a protease-specific definition of true negatives derived from the biochemical classification of proteases and inhibitors, we analyzed the prediction accuracy of individual features and identified feature-specific limitations that also affected general protein interaction prediction methods. Interestingly, proteases were often not coexpressed with most of their functional inhibitors, contrary to what is commonly assumed and extrapolated predominantly from cell culture experiments. 
Predictions of inhibitory interactions were indeed more challenging than predictions of nonproteolytic and noninhibitory interactions. In summary, we describe a novel and well-defined but difficult protein interaction prediction task and thereby highlight limitations of computational interaction prediction methods.
Identification of protein interactions is an important goal in molecular biology, yet it remains difficult. Approaches such as yeast two-hybrid, coimmunoprecipitation, and newer experimental methods (1. Kristensen A.R. Gsponer J. Foster L.J. A high-throughput approach for measuring temporal changes in the interactome. Nat. Methods. 2012; 9: 907-909, 2. Weisbrod C.R. Chavez J.D. Eng J.K. Yang L. Zheng C. Bruce J.E. In vivo protein interaction network identified with a novel real-time cross-linked peptide identification strategy. J. Proteome Res. 2013; 12: 1569-1579) are highly productive and scalable. However, limited accuracy due to false positives and context-dependent coverage remain problematic (3. von Mering C. Krause R. Snel B. Cornell M. Oliver S.G. Fields S. Bork P. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002; 417: 399-403, 4. Braun P. Tasan M. Dreze M. Barrios-Rodiles M. Lemmens I. Yu H. Sahalie J.M. Murray R.R. Roncari L. de Smet A.-S. Venkatesan K. Rual J.-F. Vandenhaute J. 
Cusick M.E. Pawson T. Hill D.E. Tavernier J. Wrana J.L. Roth F.P. Vidal M. An experimentally derived confidence score for binary protein–protein interactions. Nat. Methods. 2009; 6: 91-97). Computational methods have been developed to predict protein–protein interactions. They commonly link proteins on the basis of shared features such as patterns of conservation, expression, or annotations (5. Jansen R. Yu H. Greenbaum D. Kluger Y. Krogan N.J. Chung S. Emili A. Snyder M. Greenblatt J.F. Gerstein M. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science. 2003; 302: 449-453, 6. Rhodes D.R. Tomlins S.A. Varambally S. Mahavisno V. Barrette T. Kalyana-Sundaram S. Ghosh D. Pandey A. Chinnaiyan A.M. Probabilistic model of the human protein–protein interaction network. Nat. Biotechnol. 2005; 23: 951-959, 7. Bhardwaj N. Lu H. Correlation between gene expression profiles and protein–protein interactions within and across genomes. Bioinformatics. 2005; 21: 2730-2738, 8. Franceschini A. Szklarczyk D. Frankild S. Kuhn M. Simonovic M. Roth A. Lin J. Minguez P. Bork P. von Mering C. Jensen L.J. STRING v9.1: Protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41: D808-D815), which is a version of guilt by association. A second class of approaches uses protein structural features to identify potential physical interaction interfaces (9. Zhang Q.C. Petrey D. Deng L. Qiang L. Shi Y. Thu C.A. Bisikirska B. Lefebvre C. Accili D. Hunter T. Maniatis T. Califano A. Honig B. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature. 2012; 490: 556-560). These approaches can be combined. 
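The sketch below illustrates one common way to combine guilt-by-association features. It sums per-feature log likelihood ratios estimated from binned gold-standard counts. This is a naive Bayes combination, so it assumes the features are conditionally independent. The feature names, bins, counts, and pseudocount are hypothetical, and the sketch is not the model of any specific cited study.

```python
import math


def likelihood_ratio(value, pos_counts, neg_counts, pseudo=1.0):
    """P(value | interacting) / P(value | non-interacting), from binned gold-standard counts.

    A pseudocount is added to every bin so unseen values give neither zero nor infinite ratios.
    """
    bins = set(pos_counts) | set(neg_counts)
    n_pos = sum(pos_counts.values()) + pseudo * len(bins)
    n_neg = sum(neg_counts.values()) + pseudo * len(bins)
    p_pos = (pos_counts.get(value, 0) + pseudo) / n_pos
    p_neg = (neg_counts.get(value, 0) + pseudo) / n_neg
    return p_pos / p_neg


def combined_log_lr(pair_features, training):
    """Sum of per-feature log likelihood ratios (naive Bayes: features assumed independent)."""
    return sum(
        math.log(likelihood_ratio(value, *training[name]))
        for name, value in pair_features.items()
    )


# Hypothetical training counts: (positive-pair counts per bin, negative-pair counts per bin).
training = {
    "coexpressed": ({"yes": 8, "no": 2}, {"yes": 20, "no": 80}),
    "shared_go_term": ({"yes": 6, "no": 4}, {"yes": 10, "no": 90}),
}
score = combined_log_lr({"coexpressed": "yes", "shared_go_term": "no"}, training)
```

A positive score means the combined evidence favors interaction. Correlated features, such as two coexpression measures derived from overlapping data, violate the independence assumption and would be overcounted.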
However, their practical utility remains unclear. In the methods cited above, accuracy was estimated by cross-validation or by validating a small number of hand-picked test cases (5. Jansen R. Yu H. Greenbaum D. Kluger Y. Krogan N.J. Chung S. Emili A. Snyder M. Greenblatt J.F. Gerstein M. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science. 2003; 302: 449-453, 6. Rhodes D.R. Tomlins S.A. Varambally S. Mahavisno V. Barrette T. Kalyana-Sundaram S. Ghosh D. Pandey A. Chinnaiyan A.M. Probabilistic model of the human protein–protein interaction network. Nat. Biotechnol. 2005; 23: 951-959). Structured evaluations exist for function prediction (critical assessment of protein function annotation algorithms (10. Radivojac P. Clark W.T. Oron T.R. Schnoes A.M. Wittkop T. Sokolov A. Graim K. Funk C. Verspoor K. Ben-Hur A. Pandey G. Yunes J.M. Talwalkar A.S. Repo S. Souza M.L. Piovesan D. Casadio R. Wang Z. Cheng J. Fang H. Gough J. Koskinen P. Törönen P. Nokso-Koivisto J. Holm L. Cozzetto D. Buchan D.W. Bryson K. Jones D.T. Limaye B. Inamdar H. Datta A. Manjari S.K. Joshi R. Chitale M. Kihara D. Lisewski A.M. Erdin S. Venner E. Lichtarge O. Rentzsch R. Yang H. Romero A.E. Bhat P. Paccanaro A. Hamp T. Kaßner R. Seemayer S. Vicedo E. Schaefer C. Achten D. Auer F. Boehm A. Braun T. Hecht M. Heron M. Hönigschmid P. Hopf T.A. Kaufmann S. Kiening M. Krompass D. Landerer C. Mahlich Y. Roos M. Björne J. Salakoski T. Wong A. Shatkay H. Gatzmann F. Sommer I. Wass M.N. Sternberg M.J. Škunca N. Supek F. Bošnjak M. Panov P. Džeroski S. Šmuc, Kourmpetis Y.A. van Dijk A.D.J. ter Braak C.J. Zhou Y. Gong Q. Dong X. Tian W. Falda M. Fontana P. Lavezzo E. Di Camillo B. Toppo S. Lan L. Djuric N. Guo Y. Vucetic S. Bairoch A. Linial M. Babbitt P.C. Brenner S.E. Orengo C. Rost B. Mooney S.D. 
Friedberg I. A large-scale evaluation of computational protein function prediction. Nat. Methods. 2013; 10: 221-227)), for structure prediction (critical assessment of protein structure prediction (11. Moult J. Fidelis K. Kryshtafovych A. Schwede T. Tramontano A. Critical assessment of methods of protein structure prediction (CASP)—Round x. Proteins Struct. Funct. Bioinform. 2014; 82: 1-6)), and for structural docking (critical assessment of prediction of interactions (12. Janin J. Welcome to CAPRI: A critical assessment of predicted interactions. Proteins Struct. Funct. Bioinforma. 2002; 47: 257)). Comparable estimates of the true efficacy of protein interaction prediction methods are lacking. If computational predictions of interactions were sufficiently accurate, biochemical assays could be targeted more efficiently by focusing on predicted pairs (9. Zhang Q.C. Petrey D. Deng L. Qiang L. Shi Y. Thu C.A. Bisikirska B. Lefebvre C. Accili D. Hunter T. Maniatis T. Califano A. Honig B. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature. 2012; 490: 556-560). To date, however, computational predictions do not appear to have played a major role in interaction discovery or prioritization (13. Pavlidis P. Gillis J. Progress and challenges in the computational prediction of gene function using networks: 2012–2013 update. F1000Research. 2013; 2: 230). We hypothesized that studying a specific subset of protein interactions, and combining computational prediction with biochemical validation, would give deeper insight into the pitfalls and state of the art of general protein interaction prediction. 
We focused on predicting interactions between protease inhibitors and proteases. To our knowledge, this problem has not received specific attention. This is despite the fact that these interactions are covalent or noncovalent with low KD (low nM or pM), which should, in principle, make them easier to identify than general noncovalent protein–protein interactions with high KD. Previous cell culture and transcript analyses have suggested that known protease–inhibitor pairs are often coexpressed and coregulated (14. Breckon J.J. Papaioannou S. Kon L.W. Tumber A. Hembry R.M. Murphy G. Reynolds J.J. Meikle M.C. Stromelysin (MMP-3) synthesis is up-regulated in estrogen-deficient mouse osteoblasts in vivo and in vitro. J. Bone Miner. Res. 1999; 14: 1880-1890, 15. Nuttall R.K. Pennington C.J. Taplin J. Wheal A. Yong V.W. Forsyth P.A. Edwards D.R. Elevated membrane-type matrix metalloproteinases in gliomas revealed by profiling proteases and inhibitors in human cancer cells. Mol. Cancer Res. 2003; 1: 333-345). It is therefore hypothesized that protease–inhibitor coexpression plays a major role in regulating the detrimental activities of proteases. Inverse protease–inhibitor coexpression is thought to amplify protease activity but has been observed for only relatively few protease–inhibitor pairs (16. Overall C.M. Wrana J.L. Sodek J. Independent regulation of collagenase, 72-kDa progelatinase, and metalloendoproteinase inhibitor expression in human fibroblasts by transforming growth factor-beta. J. Biol. Chem. 1989; 264: 1860-1869, 17. Overall C.M. Sodek J. Concanavalin A produces a matrix-degradative phenotype in human fibroblasts. 
Induction and endogenous activation of collagenase, 72-kDa gelatinase, and pump-1 is accompanied by the suppression of the tissue inhibitor of matrix metalloproteinases. J. Biol. Chem. 1990; 265: 21141-21151). Overall, it is commonly assumed that protease–inhibitor coexpression is evidence for an inhibitory interaction, but this assumption has not been tested comprehensively. Proteases are a critical component of the posttranslational regulatory machinery in cells and are therefore promising drug targets. However, drug targeting of proteases has been hampered by complex protease biology that is often poorly understood. One aspect of this complexity is the organization of proteases into dense networks of protease cleavage and interaction (18. Fortelny N. Cox J.H. Kappelhoff R. Starr A.E. Lange P.F. Pavlidis P. Overall C.M. Network analyses reveal pervasive functional regulation between proteases in the human protease web. PLoS Biol. 2014; 12: e1001869). Proteases regulate the activity of other proteases either by direct cleavage or by cleaving their endogenous inhibitors, which in turn influences additional distal cleavage events. Thus, proteases can indirectly influence the cleavage of substrates other than their direct substrates. We recently established a graph model of protease web interactions, based on existing biochemical data, that can be used to predict proteolytic pathways (19. Fortelny N. Yang S. Pavlidis P. Lange P.F. Overall C.M. Proteome TopFIND 3.0 with TopFINDer and PathFINDer: Database and analysis tools for the association of protein termini to pre- and post-translational events. Nucleic Acids Res. 2015; 43: D290-D297). However, the network is far from reaching its full potential because the cleavage and inhibition data underlying the model are incomplete. 
This incompleteness stems mainly from a lack of studies of proteases and inhibitors, but also from existing data not being deposited in community databases. Computational prediction could accelerate the addition of interactions to this network. However, large-scale computational prediction efforts in protease interaction biology have been limited to using molecular features of proteases and their substrates to predict protease cleavage (20. Song J. Matthews A.Y. Reboul C.F. Kaiserman D. Pike R.N. Bird P.I. Whisstock J.C. Predicting serpin/protease interactions. Methods Enzymol. 2011; 501: 237-273), and they have largely ignored protease inhibition. As a result, the whole realm of protease inhibition is underexplored. In the MEROPS protease database (21. Rawlings N.D. Barrett A.J. Bateman A. MEROPS: The database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012; 40: D343-D350), 354 (∼80%) of 444 human proteases lack annotated inhibitors, and 13 (∼14%) of 94 inhibitors have no annotated targets (orphan inhibitors). Besides inhibition, proteases are regulated by multiple other mechanisms, such as autodegradation, reversible activation, substrate-induced activation, and allosteric activators. However, protease inhibitors are often present in adjacent compartments to block and clear excess proteases that could otherwise rapidly and irreversibly cleave a large number of proteins. Protease inhibitors are therefore often secreted into the plasma or distal tissues to block proteases delivered by diffusion, by secretion, or by leakage from tissues into the circulation. Given the key role of proteases in cell signaling pathways, identifying additional, physiologically relevant protease–inhibitor pairs would greatly benefit our understanding of protease biology. 
Two important questions for interaction prediction methods are which input data to use and how to evaluate performance. In contrast, the choice of prediction algorithm plays a relatively minor role (22. Gillis J. Pavlidis P. The role of indirect connections in gene networks in predicting function. Bioinformatics. 2011; 27: 1860-1866). A predictor's performance is evaluated by measuring how well it separates predefined true positive (TP)¹ and true negative (TN) examples. For example, in interaction prediction, if most truly interacting proteins are coexpressed and noninteracting proteins are not, then coexpression is a good predictor of interaction. The better the separation of the two groups, the better the predictive performance. ¹The abbreviations used are: TP, true positives; TN, true negatives; PPI, protein-protein interaction; MMP, matrix metalloproteinase; AUC, area under the curve; ROC, receiver operating characteristic; RPKM, reads per kilobase of transcript per million mapped reads; GO, Gene Ontology; EXP, Inferred from Experiment; IDA, Inferred from Direct Assay; IPI, Inferred from Physical Interaction; IGI, Inferred from Genetic Interaction; IMP, Inferred from Mutant Phenotype; IEP, Inferred from Expression Pattern; TAS, Traceable Author Statement.
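The separation between TP and TN examples is commonly summarized as the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen TP pair receives a higher predictor score than a randomly chosen TN pair. The sketch below computes it directly from that definition. It assumes each predictor assigns a numeric score to every pair; the scores in the checks are hypothetical.

```python
def auc(tp_scores, tn_scores):
    """ROC AUC: probability that a random TP outscores a random TN (ties count one half)."""
    wins = 0.0
    for p in tp_scores:
        for n in tn_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(tp_scores) * len(tn_scores))
```

For example, `auc([0.9, 0.8], [0.1, 0.2])` gives 1.0, which is perfect separation, while identical score distributions give 0.5, which is chance level. The pairwise loop is quadratic in the number of examples. That is manageable for a gold standard of a few hundred TPs and a few thousand TNs; for larger sets, a rank-based formula scales better.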
In general, TPs are readily found in biological databases, but defining TNs is a challenge. This is especially true for weak interactions with KDs in the low mM range, and more practically because a lack of interaction is rarely established and documented. Common approaches therefore use unlikely interactions as TNs. Examples include random pairs, based on the assumption that true interactions are a small subset of all possible interactions, and pairs of proteins that annotation places in different cellular compartments (4. Braun P. Tasan M. Dreze M. Barrios-Rodiles M. Lemmens I. Yu H. Sahalie J.M. Murray R.R. Roncari L. de Smet A.-S. Venkatesan K. Rual J.-F. Vandenhaute J. Cusick M.E. Pawson T. Hill D.E. Tavernier J. Wrana J.L. Roth F.P. Vidal M. An experimentally derived confidence score for binary protein–protein interactions. Nat. Methods. 2009; 6: 91-97). An advantage of the protease–inhibitor prediction task is that TP and TN inhibitions can be defined more accurately. Protease inhibitors are characterized by tight interactions with their cognate proteases, which provides a clear separation between true and false interactors. Further, proteases and their inhibitors are organized into families based on their primary sequence and into clans based on the structure of their active site and reactive site, respectively (21. Rawlings N.D. Barrett A.J. Bateman A. MEROPS: The database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012; 40: D343-D350). Families and clans of inhibitors can mostly be assigned specifically to one or two target protease classes. It is therefore possible to define TN pairs in which known chemical and structural constraints prevent the inhibitor from inhibiting the protease. 
For example, a serpin will not inhibit a metalloprotease, and a tissue inhibitor of metalloproteinases will not inhibit serine, aspartate, threonine, or cysteine proteases. However, matrix metalloproteinases (MMPs) cleave and inactivate many serpins. They are therefore also transient interactors before peptide bond scission, albeit with a moderate KD (∼Km) (18. Fortelny N. Cox J.H. Kappelhoff R. Starr A.E. Lange P.F. Pavlidis P. Overall C.M. Network analyses reveal pervasive functional regulation between proteases in the human protease web. PLoS Biol. 2014; 12: e1001869, 23. auf dem Keller U. Prudova A. Eckhard U. Fingleton B. Overall C.M. Systems-level analysis of proteolytic events in increased vascular permeability and complement activation in skin inflammation. Sci. Signal. 2013; 6: rs2). A further advantage of this group of proteins for analyzing prediction methods is that predictions can be tested accurately by measuring inhibition of the protease's catalytic activity. Here, we defined TP inhibitions (n = 294) as those annotated in MEROPS (21. Rawlings N.D. Barrett A.J. Bateman A. MEROPS: The database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012; 40: D343-D350). We defined TN inhibitions (n = 6,990) as enzymatically implausible inhibitor–protease pairs that are known not to be inhibitory. Using this gold standard, we evaluated how well common interaction prediction methodology predicts protease–inhibitor pairs in the protease web. The predictions used protein–protein interaction data, coannotation, coexpression, phylogenetic similarity, and colocalization as input data. We report that coexpression is surprisingly low for many functional protease–inhibitor pairs, contrary to what is commonly assumed. 
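The class-based gold standard described above can be sketched as follows. Every inhibitor–protease pair annotated as inhibitory is a TP. A pair is a TN when the protease's catalytic class lies outside the classes that the inhibitor's family can target. All other pairs remain unlabeled candidates. The family-to-class table is a hypothetical, heavily abbreviated stand-in for the MEROPS-derived classification, and the data structures and function name are assumptions.

```python
from itertools import product

# Hypothetical, abbreviated table: inhibitor family -> protease classes it can target.
# (Serpins include cross-class inhibitors of some cysteine proteases.)
INHIBITOR_TARGET_CLASSES = {
    "serpin": {"serine", "cysteine"},
    "TIMP": {"metallo"},
}


def gold_standard(proteases, inhibitors, annotated_inhibitions):
    """Split all inhibitor-protease pairs into TP, TN, and unlabeled sets.

    proteases: dict protease -> catalytic class
    inhibitors: dict inhibitor -> inhibitor family
    annotated_inhibitions: set of (inhibitor, protease) pairs annotated as inhibitory
    """
    tp, tn, unlabeled = set(), set(), set()
    for (inh, fam), (prot, cls) in product(inhibitors.items(), proteases.items()):
        pair = (inh, prot)
        targets = INHIBITOR_TARGET_CLASSES.get(fam)  # None if the family is not classified
        if pair in annotated_inhibitions:
            tp.add(pair)
        elif targets is not None and cls not in targets:
            tn.add(pair)  # chemically or structurally implausible inhibition
        else:
            unlabeled.add(pair)  # plausible but unannotated: a candidate for prediction
    return tp, tn, unlabeled


tp, tn, unlabeled = gold_standard(
    {"KLK5": "serine", "MMP2": "metallo"},
    {"SERPINB12": "serpin", "TIMP1": "TIMP"},
    {("TIMP1", "MMP2")},
)
```

Under this scheme, a plausible but unannotated pair such as serpin B12 with KLK5 stays unlabeled rather than negative. It is exactly such pairs that a predictor would rank for biochemical follow-up.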
In particular, we employed 40 interaction predictors based on coexpression values derived from different input data and correlation metrics, and we found that all of them had weak predictive power. Nonetheless, we predicted 270 protease–inhibitor pairs, examined 9 of these predicted inhibitions biochemically, and validated the novel inhibition of kallikrein 5 (KLK5) by serpin B12 (SERPINB12), previously an orphan inhibitor. Protease and protease inhibitor data and the coexpression matrices used throughout the analyses are available for download at http://hdl.handle.net/11272/10472. Protease and inhibitor class, family, cleavage, and inhibitor information were extracted from the MEROPS database (http://merops.sanger.ac.uk/) (21. Rawlings N.D. Barrett A.J. Bateman A. MEROPS: The database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012; 40: D343-D350), version 9.9, on September 30, 2013. MEROPS identifiers were used to classify proteases and inhibitors into classes and families as described on the MEROPS website. Protein–protein interaction (PPI) data from the Human Integrated Protein-Protein Interaction Reference (HIPPIE) (24. Schaefer M.H. Fontaine J.-F. Vinayagam A. Porras P. Wanker E.E. Andrade-Navarro M.A. HIPPIE: Integrating protein interaction networks with experiment based quality scores. PLoS ONE. 2012; 7: e31826), version 1.5, were downloaded on June 12, 2013. PPI data from high-throughput experiments were downloaded from BioGRID (25. Chatr-Aryamontri A. Breitkreutz B.-J. Oughtred R. Boucher L. Heinicke S. Chen D. Stark C. Breitkreutz A. Kolas N. O'Donnell L. Reguly T. Nixon J. Ramage L. Winter A. Sellam A. Chang C. Hirschman J. Theesfeld C. Rust J. Livstone M.S. Dolinski K. Tyers M. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2015; 43: D470-D478) on October 11, 2013. PPI data from (26. Bossi A. 
Lehner B. Tissue specificity and the human protein interaction network. Mol. Syst. Biol. 2009; 5: 260) were downloaded on October 11, 2013. Experiments with up to 100 identified PPIs were considered low throughput, those with 100–1,000 PPIs medium throughput, and those with more than 1,000 PPIs high throughput. Protein localization information was downloaded from three sources. The first was LocDB (27. Rastogi S. Rost B. LocDB: Experimental annotations of localization for Homo sapiens and Arabidopsis thaliana. Nucleic Acids Res. 2011; 39: D230-D234), downloaded November 19, 2013. The second was the Human Protein Atlas (28. Uhlen M. Oksvold P. Fagerberg L. Lundberg E. Jonasson K. Forsberg M. Zwahlen M. Kampf C. Wester K. Hober S. Wernerus H. Björling L. Ponten F. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 2010; 28: 1248-1250), downloaded November 12, 2013. The third was Gene Ontology (GO) annotation obtained with the hgu95av2.db package in R (29. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013), downloaded August 8, 2013. For each dataset, annotations were mapped to GO terms, and annotation trees for each protein were generated using the GOstats package in R (29. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013). For LocDB, primary and secondary localization information was combined for each protein. Main and other localization data from the Human Protein Atlas were used if the reliability was annotated as High, Medium, or Supportive. GO annotations were retained if the evidence code was one of EXP, IDA, IPI, IGI, IMP, IEP, or TAS. Genotype-Tissue Expression (GTEx) data (30. Lonsdale J. Thomas J. Salvatore M. Phillips R. Lo E. Shad S. Hasz R. Walters G. 
Garcia F. Young N. Foster B. Moser M. Karasik E. Gillard B. Ramsey K. Sullivan S. Bridge J. Magazine H. Syron J. Fleming J. Siminoff L. Traino H. Mosavel M. Barker L. Jewell S. Rohrer D. Maxim D. Filkins D. Harbach P. Cortadillo E. Berghuis B. Turner L. Hudson E. Feenstra K. Sobin L. Robb J. Branton P. Korzeniewski G. Shive C. Tabor D. Qi L. Groch K. Nampally S. Buia S. Zimmerman A. Smith A. Burges R. Robinson K. Valentino K. Bradbury D. Cosentino M. Diaz-Mayoral N. Kennedy M. Engel T. Williams P. Erickson K. Ardlie K. Winckler W. Getz G. DeLuca D. MacArthur D. Kellis M. Thomson A. Young T. Gelfand E. Donovan M. Meng Y. Grant G. Mash D. Marcus Y. Basile M. Liu J. Zhu J. Tu Z. Cox N.J. Nicolae D.L. Gamazon E.R. Im H.K. Konkashbaev A. Pritchard J. Stevens M. Flutre T. Wen X. Dermitzakis E.T. Lappalainen T. Guigo R. Monlong J. Sammeth M. Koller D. Battle A. Mostafavi S. McCarthy M. Rivas M. Maller J. Rusyn I. Nobel A. Wright F. Shabalin A. Feolo M. Sharopova N. Sturcke A. Paschal J. Anderson J.M. Wilder E.L. Derr L.K. Green E.D. Struewing J.P. Temple G. Volpi S. Boyer J.T. Thomson E.J. Guyer M.S. Ng C. Abdallah A. Colantuoni D. Insel T.R. Koester S.E. Little A.R. Bender P.K. Lehner T. Yao Y. Compton C.C. Vaught J.B. Sawyer S. Lockhart N.C. Demchok J. Moore H.F. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013; 45: 580-585) were downloaded on January 31, 2013. Gene Expression Omnibus Series 7307 expression data were downloaded from the Gemma database (31. Zoubarev A. Hamer K.M. Keshav K.D. McCarthy E.L. Santos J.R.C. Van Rossum T. McDonald C. Hall A. Wan X. Lim R. Gillis J. Pavlidis P. Gemma: A resource for the reuse, sharing and meta-analysis of expression profiling data. Bioinformatics. 2012; 28: 2272-2273) on June 26, 2013. 
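The throughput categories and annotation filters described above reduce to simple rules. The sketch below assumes that GO annotations arrive as (protein, term, evidence code) tuples and Human Protein Atlas records as (protein, location, reliability) tuples. These tuple layouts and the function names are hypothetical. The text gives both "up to 100" and "100–1,000", so assigning an experiment with exactly 100 PPIs to low throughput is also an assumption.

```python
# Retention rules as stated in the text.
KEPT_GO_EVIDENCE = {"EXP", "IDA", "IPI", "IGI", "IMP", "IEP", "TAS"}
KEPT_HPA_RELIABILITY = {"High", "Medium", "Supportive"}


def throughput_label(n_ppis):
    """Classify an experiment by its number of identified PPIs (exactly 100 -> low: assumption)."""
    if n_ppis <= 100:
        return "low"
    if n_ppis <= 1000:
        return "medium"
    return "high"


def keep_go(annotations):
    """Keep hypothetical (protein, go_term, evidence_code) tuples with an accepted evidence code."""
    return [a for a in annotations if a[2] in KEPT_GO_EVIDENCE]


def keep_hpa(records):
    """Keep hypothetical (protein, location, reliability) tuples with an accepted reliability."""
    return [r for r in records if r[2] in KEPT_HPA_RELIABILITY]
```

Note that excluding electronically inferred annotations (for example, evidence code IEA) is what restricts the GO localization data to curated or experimental evidence.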
Other microarray-based expression datasets used in the meta-coexpression analysis were downloaded from Gemma on January 18, 2013 and are listed in supplemental Table S6. Gene correlation was calculated using the cor function in R (29. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013). Partial correlation was calculated using the ppcor package in R. Full datasets or subsets were used as inputs, as explained in the Results section and in supplemental Table S5. Phylogenetic profile data were constructed by downloading mappings from human proteins to other species from InParanoid (32. Östlund G. Schmitt T. Forslund K. Köstler T. Messina D.N. Roopra S. Frings O. Sonnhammer E.L. InParanoid 7: New algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010; 38: D196-D203). For the binary networks, mappings were binarized into 0 (absent) and 1 (present) before calculating the fraction of agreement (the fraction of species in which the two genes are both present or both absent), Pearson correlation (cor function in R (29. R Core Team R: A La
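The profile comparisons described above can be sketched with NumPy. The sketch assumes that each gene's InParanoid mapping has been reduced to one numeric value per species, with any positive value meaning that an ortholog is present. This encoding and the function names are assumptions. The last function is the closed-form first-order partial correlation of two variables controlling for a single third variable. ppcor in R also handles more control variables, which this sketch does not.

```python
import numpy as np


def binarize(profile):
    """1 where an ortholog is present (positive mapping value, assumed encoding), else 0."""
    return (np.asarray(profile) > 0).astype(int)


def fraction_agreement(a, b):
    """Fraction of species in which the two genes are both present or both absent."""
    a, b = binarize(a), binarize(b)
    return float(np.mean(a == b))


def pearson(a, b):
    """Pearson correlation of two vectors (undefined if either vector is constant)."""
    return float(np.corrcoef(a, b)[0, 1])


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))
```

The closed form equals the correlation between the residuals of x and y after each is linearly regressed on z. This is why partial correlation can separate direct coexpression from coexpression driven by a shared third factor.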
While multiple studies of gene expression in mouse models of Alzheimer's disease (AD) have been conducted, their findings have not reached a clear consensus and have not accounted for the potentially confounding effects of changes in cellular composition. To help address this gap, we conducted a re-analysis-based meta-analysis (mega-analysis) of ten independent studies of hippocampal gene expression in mouse models of AD. We used estimates of cellular composition as covariates in statistical models aimed at identifying genes differentially expressed (DE) at either early or late stages of disease progression. Our analysis revealed early-stage gene expression changes that were shared across studies, including dysregulation of genes involved in cholesterol biosynthesis and the complement system. Expression changes at later stages were dominated by cellular compositional effects. Thus, despite the considerable heterogeneity of the mouse models, we identified common patterns that may contribute to our understanding of AD etiology. Our work also highlights the importance of controlling for cellular composition effects in genomics studies of neurodegeneration.
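The covariate adjustment described above can be illustrated with a minimal ordinary least squares sketch for a single gene. It assumes a hypothetical per-sample cell-type proportion estimate (for example, one derived from marker genes) and a 0/1 genotype indicator. The function names and toy data are illustrative only and do not represent the study's actual pipeline. Pure Python is used so the example has no dependencies.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def group_effect(expr, group, covariates):
    """Fit expr ~ 1 + group + covariates by OLS (hypothetical helper).

    expr: expression values for one gene, one per sample.
    group: 0 (control) or 1 (AD model) per sample.
    covariates: per-sample lists of extra covariates, e.g. hypothetical
        cell-type proportion estimates; pass [] per sample for none.
    Returns (coefficient, t statistic) for the group term.
    """
    X = [[1.0, float(g)] + list(c) for g, c in zip(group, covariates)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(X, expr)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y - sum(xa * ba for xa, ba in zip(row, beta)) for row, y in zip(X, expr)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    se = math.sqrt(sigma2 * inv[1][1])  # standard error of the group coefficient
    return beta[1], beta[1] / se


# Toy data: AD-model samples (group 1) also have a higher hypothetical
# proportion of one cell type, which itself raises expression.
group = [0, 0, 0, 0, 1, 1, 1, 1]
cell = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
noise = [0.01, -0.01] * 4
expr = [2 + 1.5 * g + 3 * c + e for g, c, e in zip(group, cell, noise)]

naive, _ = group_effect(expr, group, [[] for _ in group])
adjusted, t_adj = group_effect(expr, group, [[c] for c in cell])
print(f"group only: {naive:.3f}  adjusted for composition: {adjusted:.3f} (t = {t_adj:.1f})")
```

With these toy data, the group-only model attributes the compositional shift to genotype and returns an effect of 2.7. The adjusted model, by contrast, recovers an effect close to the simulated value of 1.5. This gap is the confounding that the composition covariates are meant to remove.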
Long-term population viability of Fraser River sockeye salmon (Oncorhynchus nerka) is threatened by unusually high levels of mortality as adults migrate to their spawning areas, before they spawn. Functional genomic studies on biopsied gill tissue from tagged wild adults that were tracked through ocean and river environments revealed physiological profiles predictive of successful migration and spawning. We identified a common genomic profile that was correlated with survival in each study. In ocean-tagged fish, a mortality-related genomic signature was associated with a 13.5-fold greater chance of dying en route. In river-tagged fish, the same genomic signature was associated with a 50% increase in mortality before reaching the spawning grounds in one of the three stocks tested. At the spawning grounds, the same signature was associated with 3.7-fold greater odds of dying without spawning. Functional analysis raises the possibility that the mortality-related signature reflects a viral infection.
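The abstract reports some effects as a fold change in the chance of dying and others as a fold change in odds. These are different quantities. The short sketch below uses hypothetical counts rather than the study's data. It shows how an odds ratio and a risk ratio are computed from the same 2 × 2 table of signature status by outcome, and why the two diverge when mortality is common.

```python
def odds_ratio(dead_sig, alive_sig, dead_no_sig, alive_no_sig):
    """Odds of dying with the signature divided by odds of dying without it."""
    return (dead_sig * alive_no_sig) / (alive_sig * dead_no_sig)


def risk_ratio(dead_sig, alive_sig, dead_no_sig, alive_no_sig):
    """Probability of dying with the signature divided by probability without it."""
    risk_sig = dead_sig / (dead_sig + alive_sig)
    risk_no_sig = dead_no_sig / (dead_no_sig + alive_no_sig)
    return risk_sig / risk_no_sig


# Hypothetical counts (not from the study): 40 fish with and 40 without the signature.
counts = (30, 10, 10, 30)
print(odds_ratio(*counts))  # 9.0
print(risk_ratio(*counts))  # 3.0
```

Here a threefold higher probability of dying corresponds to a ninefold higher odds. A fold change in odds should therefore not be read directly as a fold change in the chance of dying.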
In laboratory and field studies of survival, one of two alternative analytical techniques is often used to estimate survival rates and identify covariates: parametric survival analysis or Cormack–Jolly–Seber (CJS) models. These techniques differ in their algorithms and in their assumptions about the data. They also tend to be used under different circumstances, depending on whether the intention is to demonstrate group-specific differences or to predict survival variables. Here, we apply and compare both analytical techniques in a study that couples functional genomics with biotelemetry to ascertain the role of physiological condition in the survival of adult sockeye salmon (Oncorhynchus nerka) migrating in the Fraser River, British Columbia. This work is motivated by growing concern over the decline in the number of spawning fish. We show a high level of quantitative and qualitative agreement between the two analytical methods. In one of the three populations assessed, both methods show a strong relationship between survival and the genomic signature that accounts for the largest source of variance in gene expression among individuals. This high level of agreement suggests that the data and the approaches are generating reliable results. The novel approach used in our study to identify physiological processes associated with reduced fitness in wild populations should be of broad interest to conservation biologists and resource managers, as it may help reduce the uncertainty associated with predicting population sizes.
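The sketch below makes the Cormack–Jolly–Seber side of the comparison concrete. It computes the likelihood of telemetry detection histories under the simplest CJS model, with a constant apparent survival probability phi between detection occasions and a constant detection probability p, and fits both parameters by grid search. The constant parameters, the grid-search fitting, and the toy histories are simplifying assumptions for illustration. Practical analyses typically let phi and p vary by occasion and covariate.

```python
import math


def history_probability(history, phi, p):
    """Probability of one 0/1 detection history under a constant-parameter CJS model.

    The model conditions on the first detection (release), so the history
    must contain at least one 1.
    """
    first = history.index(1)
    last = len(history) - 1 - history[::-1].index(1)
    prob = 1.0
    # Between release and the last detection the fish was known to be alive:
    # it survived each interval and was either detected (p) or missed (1 - p).
    for t in range(first + 1, last + 1):
        prob *= phi * (p if history[t] else 1 - p)
    # chi: probability of never being detected after the last detection,
    # built backwards from the final occasion, where chi = 1.
    chi = 1.0
    for _ in range(len(history) - 1 - last):
        chi = (1 - phi) + phi * (1 - p) * chi
    return prob * chi


def log_likelihood(histories, phi, p):
    return sum(math.log(history_probability(h, phi, p)) for h in histories)


def fit_grid(histories, steps=99):
    """Maximise the log-likelihood over a grid of values strictly between 0 and 1."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((phi, p) for phi in grid for p in grid),
               key=lambda params: log_likelihood(histories, *params))


# Toy histories: release plus two downstream receivers (1 = detected).
histories = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
phi_hat, p_hat = fit_grid(histories)
print(f"phi = {phi_hat:.2f}, p = {p_hat:.2f}")
```

One practical difference between the two techniques is worth noting. The CJS likelihood separates mortality from missed detections through p, whereas a standard parametric survival model typically treats the recorded fate, or the time to last detection, as directly observed.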
The incidence of neural tube defects (NTDs) declined by about 40% in Canada with the introduction of a national folic acid (FA) fortification program. Although few Canadians currently exhibit folate deficiency, NTDs are still the second most common congenital abnormality. FA fortification may have helped reduce the incidence of NTDs by overcoming abnormal one-carbon metabolism cycling, the process that provides one-carbon units for DNA methylation. We hypothesized that NTDs persisting in a folate-replete population may also occur in the context of FA-independent compromised one-carbon metabolism, and that this might manifest as abnormal DNA methylation (DNAm). Second-trimester human placental chorionic villi, kidney, spinal cord, brain, and muscle were collected from 19 control, 22 spina bifida, and 15 anencephalic fetuses in British Columbia, Canada. DNA was extracted and assessed for methylenetetrahydrofolate reductase (MTHFR) genotype and for genome-wide DNAm, using repetitive elements in addition to the Illumina Infinium HumanMethylation450 (450k) array. No difference in repetitive element DNAm was noted between NTD status groups. Using a false discovery rate <0.05 and an average group difference in DNAm ≥0.05, differentially methylated array sites were identified only in (1) the comparison of anencephaly to controls in chorionic villi (n = 4 sites) and (2) the comparison of spina bifida to controls in kidney (n = 3342 sites). We suggest that the distinctive DNAm of spina bifida kidneys may be a consequence of the neural tube defect or may reflect a common etiology for abnormal neural tube and renal development. Although there were some small shifts in DNAm in the other tested tissues, our data do not support the long-standing hypothesis of generalized altered genome-wide DNAm in NTDs.
This finding may be related to the fact that most Canadians are not folate deficient, but, importantly, it opens the field to the investigation of other epigenetic and non-epigenetic mechanisms in the etiology of NTDs.
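The two-part filter used for the 450k array can be sketched as follows: a false discovery rate below 0.05 combined with a mean group difference in DNAm of at least 0.05. The abstract does not name the FDR procedure, so the Benjamini–Hochberg adjustment used here is an assumption. Applying the 0.05 threshold to the absolute difference in mean beta values is also an assumption. The p-values and differences are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_methylated(pvalues, delta_beta, fdr=0.05, min_delta=0.05):
    """Indices of sites passing both the FDR and the effect-size threshold."""
    qvalues = benjamini_hochberg(pvalues)
    return [i for i, (q, d) in enumerate(zip(qvalues, delta_beta))
            if q < fdr and abs(d) >= min_delta]


# Hypothetical per-site p-values and mean beta differences (case minus control).
pvalues = [0.001, 0.01, 0.03, 0.5]
delta_beta = [0.10, 0.02, -0.08, 0.20]
print(differentially_methylated(pvalues, delta_beta))  # [0, 2]
```

In this toy example, site 1 passes the FDR cut-off but is excluded because its difference falls below 0.05. This shows how the effect-size filter can shrink the list of statistically significant sites.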