The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. The Alliance currently represents budding yeast, Caenorhabditis elegans, Drosophila, zebrafish, frog, laboratory mouse, and laboratory rat, together with the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and application programming interfaces (APIs). Here, we focus on developments over the last 2 years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific “landing pages” and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress toward a central persistent database to support curation, the data modeling that underpins harmonization, and progress toward a state-of-the-art literature curation system with integrated artificial intelligence and machine learning (AI/ML).
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. We notably improved the new GO-CAM annotation framework and formalized the model with a computational schema to check and validate the rapidly growing repository of 2,838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever-increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine and Gene Enrichment Analysis; and new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and faster response times. To better serve the broader research community, WormBase, together with five other model organism databases and the Gene Ontology project, has begun to collaborate formally as the Alliance of Genome Resources.
The vitamin A metabolite retinoic acid (RA) is used as a signalling molecule in a wide variety of developmental processes, which have been identified by the defects that occur after nutritional vitamin A deficiency or after exposure to excess vitamin A. We have initiated a genetic analysis of RA function by establishing lines of mice that carry germline mutations in the genes encoding retinoid receptors. Defects resulting from developmental RA deficiency or excess have been recovered in embryos deficient in various combinations of retinoid receptors. In this chapter, we describe our current understanding of the role of RA and retinoid receptors in cardiovascular and limb development, as these are the processes for which our understanding is most advanced.
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive quality assurance (QA) checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
The Drosophila melanogaster genes Passover and l(1)ogre and the Caenorhabditis elegans gene unc-7 define a gene family whose function is not known. We have isolated and characterized the C. elegans gene eat-5, which is required for synchronized pharyngeal muscle contractions, and find that it is a new member of this family. Simultaneous electrical and video recordings reveal that in eat-5 mutants, action potentials of muscles in the anterior and posterior pharynx are unsynchronized. Injection of carboxyfluorescein into muscles of the posterior pharynx demonstrates that all pharyngeal muscles are dye-coupled in wild-type animals; in eat-5 mutants, however, muscles of the anterior pharynx are no longer dye-coupled to posterior pharyngeal muscles. We show that a gene fusion of eat-5 to the green fluorescent protein is expressed in pharyngeal muscles. unc-7 and eat-5 are two of at least sixteen members of this family in C. elegans as determined by database searches and PCR-based screens. The amino acid sequences of five of these members in C. elegans have been deduced from cDNA sequences. Polypeptides of the family are predicted to have four transmembrane domains with cytoplasmic amino and carboxyl termini. We have constructed fusions of one of these polypeptides with beta-galactosidase and with green fluorescent protein. The fusion proteins appear to be localized in a punctate pattern at or near plasma membranes. We speculate that this gene family is required for the formation of gap junctions.
The Caenorhabditis elegans gene eat-4 affects multiple glutamatergic neurotransmission pathways. We find that eat-4 encodes a protein similar in sequence to a mammalian brain-specific sodium-dependent inorganic phosphate cotransporter I (BNPI). Like BNPI in the rat CNS, eat-4 is expressed predominantly in a specific subset of neurons, including several proposed to be glutamatergic. Loss-of-function mutations in eat-4 cause defective glutamatergic chemical transmission but appear to have little effect on other functions of neurons. Our data suggest that phosphate ions imported into glutamatergic neurons through transporters such as EAT-4 and BNPI are required specifically for glutamatergic neurotransmission.
We have a rich body of knowledge about Caenorhabditis elegans. Its stereotyped anatomy and development have stimulated research and resulted in the accumulation of cell-based information concerning gene expression and the roles of specific cells in developmental signalling and behavioural circuits. To make this information more accessible to sophisticated queries and automated retrieval systems, WormBase has begun to construct a C. elegans cell and anatomy ontology. Here, we present our strategies and progress.