TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
ABSTRACT Over the last two decades, molecular biology has been changed by the introduction of high-throughput technologies. Data sharing requirements have prompted the establishment of persistent data archives. A standardized approach for recording and managing these data was first proposed in the Minimum Information About a Microarray Experiment (MIAME) guidelines. The Minimum Information about a high-throughput nucleotide Sequencing Experiment (MINSEQE) proposal was introduced in 2008 as a logical extension of the guidelines to next-generation sequencing (NGS) technologies used for transcriptome analysis. We present a historical snapshot of the data-sharing situation focusing on transcriptomics data from both microarray and RNA-sequencing experiments published between 2009 and 2013, a period during which RNA-seq studies became increasingly popular for transcriptome analysis. We assess how much data from RNA-seq-based experiments is actually available in persistent data archives, compared to data derived from microarray-based experiments, and evaluate how these types of data differ. Based on this analysis, we provide recommendations to improve RNA-seq data availability, reusability, and reproducibility.
Background: The analysis and interpretation of data generated from patient-derived clinical samples rely on access to high-quality bioinformatics resources. These are maintained and updated by expert curators who extract knowledge from unstructured biological data described in free-text journal articles and convert it into more structured, computationally accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. Objective: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer’s disease research. Methods: We have systematically identified target proteins critical to the disease process, in part by accessing informed input from the clinical research community. Results: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset. Conclusion: This represents a significant enhancement in the expert-curated data pertinent to Alzheimer’s disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.
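The functional enrichment analysis mentioned in the Background can be illustrated with a minimal sketch. The abstract does not say which statistical test is meant; the one-sided hypergeometric test used here is a common choice and is an assumption, as are the function name `hypergeom_enrichment_p` and its parameter names.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term of interest
    sample:     size of the gene list under study
    overlap:    genes in the list that carry the GO term
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 10 background genes, 5 annotated, list of 5, all 5 annotated.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice such p-values would be computed for many GO terms at once and then corrected for multiple testing; dedicated enrichment tools handle the ontology structure, which this sketch ignores.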
Rationale: Endothelial function and dysfunction are central to the focal origin and regional development of atherosclerosis; however, an in vivo endothelial phenotypic footprint of susceptibility to atherosclerosis preceding pathological change remains elusive. Objective: To conduct a comparative multi-site genomics study of arterial endothelial phenotype in atherosusceptible and atheroprotected regions. Methods and Results: Transcript profiles of freshly isolated endothelial cells from 7 discrete arterial regions in normal swine were analyzed to determine the steady state in vivo endothelial phenotypes in regions of varying susceptibilities to atherosclerosis. The most abundant common feature of the endothelium of all atherosusceptible regions was the upregulation of genes associated with endoplasmic reticulum (ER) stress. The unfolded protein response pathway, induced by ER stress, was therefore investigated in detail in endothelium of the atherosusceptible aortic arch and was found to be partially activated. ER transmembrane signal transducers IRE1α and ATF6α and their downstream effectors, but not PERK, were activated concomitant with a higher transcript expression of protein folding enzymes and chaperones, indicative of ER stress in vivo. Conclusions: The findings demonstrate the prevalence of chronic endothelial ER stress and activated unfolded protein response in vivo at atherosusceptible arterial sites. We propose that chronic localized biological stress is linked to spatial susceptibility of the endothelium to the initiation of atherosclerosis.
In the arterial circulation, regions of disturbed flow (DF), which are characterized by flow separation and transient vortices, are susceptible to atherogenesis, whereas regions of undisturbed laminar flow (UF) appear protected. Coordinated regulation of gene expression by endothelial cells (EC) may result in differing regional phenotypes that either favor or inhibit atherogenesis. Linearly amplified RNA from freshly isolated EC of DF (inner aortic arch) and UF (descending thoracic aorta) regions of normal adult pigs was used to profile differential gene expression reflecting the steady state in vivo. By using human cDNA arrays, ≈2,000 putatively differentially expressed genes were identified through false-discovery-rate statistical methods. A sampling of these genes was validated by quantitative real-time PCR and/or immunostaining en face. Biological pathway analysis revealed that in DF there was up-regulation of several broad-acting inflammatory cytokines and receptors, in addition to elements of the NF-κB system, which is consistent with a proinflammatory phenotype. However, the NF-κB complex was predominantly cytoplasmic (inactive) in both regions, and no significant differences were observed in the expression of key adhesion molecules for inflammatory cells associated with early atherogenesis. Furthermore, there was no histological evidence of inflammation. Protective profiles were observed in DF regions, notably an enhanced antioxidative gene expression. This study provides a public database of regional EC gene expression in a normal animal, implicates hemodynamics as a contributory mechanism to athero-susceptibility, and reveals the coexistence of pro- and antiatherosclerotic transcript profiles in susceptible regions. The introduction of additional risk factors may shift this balance to favor lesion development.
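The abstract refers to false-discovery-rate statistical methods without naming one. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, a widely used FDR method; its use here is an assumption and not necessarily what the authors applied. The function name `benjamini_hochberg` and the `alpha` default are likewise hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per input p-value: True if rejected at FDR level alpha.

    Benjamini-Hochberg step-up procedure (an assumed stand-in for the
    unspecified FDR method in the abstract).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (the step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy p-values, one per gene; not data from the study.
    print(benjamini_hochberg([0.04, 0.001, 0.03, 0.5]))
```

The step-up rule matters: a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold, which is why the procedure searches for the largest qualifying rank rather than stopping at the first failure.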