Guidelines for the functional annotation of microRNAs using the Gene Ontology
Rachael P. Huntley, D. S. Sitnikov, M. Orlic-Milacic, Rama Balakrishnan, Peter D’Eustachio, Marc Gillespie, Douglas G. Howe, Anastasia Z. Kalea, Lars Maegdefessel, David Osumi-Sutherland, Victoria Petri, Jennifer R. Smith, Kimberly Van Auken, Valerie Wood, Anna Zampetaki, Manuel Mayr, Ruth C. Lovering
Abstract:
MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable their inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual).
Keywords:
Gene Annotation
The functional annotation of gene lists is a common analysis routine required for most genomics experiments, and bioinformatics core facilities must support these analyses. Unlike established methods such as RNA-Seq read quantitation or differential expression analysis, functional annotation is an area where our research group noted a lack of consensus on preferred approaches. To investigate this observation, we selected 4 experiments that represent a range of experimental designs encountered by our cores and analyzed those data with 6 tools used by members of the Association of Biomolecular Resource Facilities (ABRF) Genomic Bioinformatics Research Group (GBIRG). To facilitate comparisons between tools, we focused on a single biological result for each experiment. These results were represented by a gene set, and we analyzed these gene sets with each tool considered in our study to map the result to the annotation categories presented by each tool. In most cases, each tool produces data that would facilitate identification of the selected biological result for each experiment. For the exceptions, Fisher’s exact test parameters could be adjusted to detect the result. Because Fisher’s exact test is used by many functional annotation tools, we investigated its input parameters and demonstrate that, while background set size is unlikely to have a significant impact on the results, the number of differentially expressed genes in an annotation category and the total number of differentially expressed genes under consideration are both critical parameters that may need to be modified during analyses. In addition, we note that differences in the annotation categories tested by each tool, as well as the composition of those categories, can have a significant impact on results.
Gene Annotation
Functional Genomics
Identification
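As a rough illustration of the parameters examined in the study above, the one-sided Fisher's exact test used by many enrichment tools reduces to a hypergeometric upper-tail probability. The minimal Python sketch below is not taken from any of the six tools evaluated; the function name `enrichment_p_value` and its argument names are hypothetical. Here `k` is the number of differentially expressed (DE) genes in an annotation category, `n` the total number of DE genes under consideration, `K` the category size in the background, and `N` the background set size.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation (hypothetical helper).

    Returns P(X >= k) where X ~ Hypergeometric(population N, K annotated genes, n draws).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability of observing i annotated DE genes for every i >= k.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Python integers are exact, so only this final division is rounded.
    return tail / comb(N, n)


if __name__ == "__main__":
    # Invented counts: vary one parameter at a time to see which moves the p-value most.
    base = dict(k=10, n=200, K=100, N=20000)
    for label, params in [
        ("baseline", base),
        ("larger background N", {**base, "N": 25000}),
        ("fewer DE genes in category k", {**base, "k": 5}),
        ("more DE genes overall n", {**base, "n": 400}),
    ]:
        print(f"{label:30s} p = {enrichment_p_value(**params):.3g}")
```

Running the script prints one p-value per scenario; the counts are made up purely to show how each parameter enters the calculation.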
High-throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when the reference genome of the species at hand has been completely sequenced, when this genome has been completely annotated, and when the genetic variation of the sampled individuals is completely known. Unfortunately, no species has such a complete data set available. It is therefore important that probe annotation can be updated frequently for optimal interpretation of microarray experiments.
In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation, supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible, and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services, creating a highly modular and distributed system with a robust, remote programmatic interface.
OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation, about one third of these oligos are problematic with regard to target specificity. In addition, for almost half of the oligos, the accession numbers or IDs they were originally designed for no longer exist in the updated annotation.
As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both, and because it is built on BioMoby web services it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference in updated annotation and target specificity for the ARK-Genomics 20 K chicken array compared with the original data emphasises the need for regular updates.
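The abstract does not spell out OligoRAP's specificity criteria, so the Python sketch below uses a deliberately simplified, hypothetical rule: once an oligo's alignments have been re-mapped to genes, an oligo hitting exactly one gene is treated as specific, one hitting several genes as potentially cross-hybridizing, and one hitting none as lacking a target. The function names and identifiers are illustrative only; OligoRAP itself is assembled from BioMoby web services, not from this code.

```python
from collections import Counter
from typing import Dict, List


def classify_oligo(target_genes: List[str]) -> str:
    """Hypothetical rule: classify an oligo by the distinct genes it aligns to."""
    distinct = set(target_genes)  # several alignments to one gene still count once
    if not distinct:
        return "no_target"
    if len(distinct) == 1:
        return "specific"
    return "cross_hybridizing"


def summarize(oligo_hits: Dict[str, List[str]]) -> Counter:
    """Count oligos per specificity class."""
    return Counter(classify_oligo(genes) for genes in oligo_hits.values())


def problematic_fraction(oligo_hits: Dict[str, List[str]]) -> float:
    """Fraction of oligos that are not uniquely targeted."""
    counts = summarize(oligo_hits)
    total = sum(counts.values())
    return (total - counts["specific"]) / total if total else 0.0


if __name__ == "__main__":
    # Invented example data: oligo ID -> genes hit by its re-mapped alignments.
    hits = {
        "oligo_1": ["GENE_A"],
        "oligo_2": ["GENE_A", "GENE_A"],
        "oligo_3": ["GENE_B", "GENE_C"],
        "oligo_4": [],
    }
    print(summarize(hits))
    print(f"problematic: {problematic_fraction(hits):.2f}")
```

Treating "no target" and "multiple targets" together as problematic mirrors the abstract's notion of problematic target specificity only loosely; a real pipeline would also weigh alignment quality.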
Sequence annotation is essential for genomics-based research. Investigators of a specific genomic region who have accumulated many local findings, such as genes and genetic markers, or who have collected annotations from multiple resources, can be overwhelmed by the difficulty of creating local annotation and the complexity of integrating all the annotations. Presenting such integrated data in a form suitable for data mining and high-throughput experimental design is even more daunting. DNannotator, a web application, was designed to perform batch annotation on a sizeable genomic region. It takes annotation source data, such as SNPs, genes, primers, and so on, prepared by the end-user and/or a specified target of genomic DNA, and performs de novo annotation. DNannotator can also robustly migrate existing annotations in GenBank format from one sequence to another. Annotation results are provided in GenBank format and in tab-delimited text, which can be imported and managed in a database or spreadsheet and combined with existing annotation as desired. Graphic viewers, such as Genome Browser or Artemis, can display the annotation results. Optional reference data (reports on the process) help the user evaluate annotation quality. DNannotator can be accessed at http://sky.bsd.uchicago.edu/DNannotator.htm.
Gene Annotation
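A core step of de novo annotation, locating user-supplied sequences such as primers in a target genomic DNA and reporting them as tab-delimited text, can be sketched in Python as below. The sketch assumes simple exact string matching on both strands; the abstract does not describe DNannotator's actual matching method, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

Hit = Tuple[str, int, int, str]  # (feature name, 1-based start, end, strand)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(target: str, features: Dict[str, str]) -> List[Hit]:
    """Report every exact occurrence of each feature sequence on both strands."""
    target = target.upper()
    hits: List[Hit] = []
    for name, seq in features.items():
        fwd = seq.upper()
        if not fwd:
            continue  # an empty sequence would match everywhere
        rev = reverse_complement(fwd)
        # A palindromic sequence is searched once to avoid duplicate hits.
        strands = [("+", fwd)] if rev == fwd else [("+", fwd), ("-", rev)]
        for strand, probe in strands:
            start = target.find(probe)
            while start != -1:  # overlapping matches are allowed
                hits.append((name, start + 1, start + len(probe), strand))
                start = target.find(probe, start + 1)
    return sorted(hits, key=lambda h: (h[1], h[0]))


def to_tsv(hits: List[Hit]) -> str:
    """Format hits as tab-delimited text with a header row."""
    rows = ["name\tstart\tend\tstrand"]
    rows += [f"{n}\t{s}\t{e}\t{st}" for n, s, e, st in hits]
    return "\n".join(rows)


if __name__ == "__main__":
    print(to_tsv(find_sites("AAGGCTTTAGCCAA", {"primer_1": "GGCT"})))
```

Exact matching keeps the example short; tolerating mismatches, as real primer or SNP mapping often requires, would need an alignment step instead of `str.find`.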
Abstract
Background: Despite improvements in tools for automated annotation of genome sequences, manual curation at the structural and functional level can further refine genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort.
Results: We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary focus of curation by community annotators (CAs) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser.
Conclusion: We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, allowing EuCAP to be implemented for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.
R package
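The submission-and-review workflow described for EuCAP can be modelled as a small state machine. The Python sketch below is a hypothetical illustration: the state names, the `CommunityModel` class, and the explicit rejected state (the abstract only mentions approval) are assumptions, and the real system stores models in a MySQL database rather than in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    SUBMITTED = "submitted"    # sent by a community annotator (CA)
    ALIGNED = "aligned"        # aligned to the rice pseudomolecules
    APPROVED = "approved"      # accepted by a project annotator (PA)
    REJECTED = "rejected"      # assumed outcome when a PA does not approve
    INTEGRATED = "integrated"  # merged into the genome annotation


# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS: Dict[Status, Set[Status]] = {
    Status.SUBMITTED: {Status.ALIGNED},
    Status.ALIGNED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.INTEGRATED},
    Status.REJECTED: set(),
    Status.INTEGRATED: set(),
}


@dataclass
class CommunityModel:
    model_id: str
    gene_family: str
    status: Status = Status.SUBMITTED
    history: List[Status] = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        """Move to new_status, refusing transitions the workflow does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.history.append(self.status)
        self.status = new_status


if __name__ == "__main__":
    model = CommunityModel("CA_model_001", "example gene family")
    for step in (Status.ALIGNED, Status.APPROVED, Status.INTEGRATED):
        model.advance(step)
    print(model.status.value, [s.value for s in model.history])
```

Encoding the allowed transitions in one table makes it impossible for a model to reach the integrated state without passing PA review, which is the property the described workflow relies on.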
The ontological analysis of gene lists obtained from DNA microarray experiments is an important step in understanding the underlying biology of the analyzed system. In recent years, many other high-throughput techniques have emerged, now covering essentially all 'omics' fields. However, for some of these techniques the commonly used functional ontologies might not be sufficient to describe the biological system represented by the derived gene lists. For a more complete and correct interpretation of these experiments, it is important to substantially extend the number of annotations, adapting ontological analysis to these new techniques.
We developed Annotation-Modules, which improves on current tools in two critical aspects. First, the underlying annotation database implements features from many different fields, such as gene regulation and expression, sequence properties, evolution and conservation, genomic localization, and functional categories, resulting in about 60 different annotation features. Second, it examines not only single annotations but also all their combinations, which is important for gaining insight into the interplay of different mechanisms in the analyzed biological system.
http://web.bioinformatics.cicbiogune.es/AM/AnnotationModules.php
Gene Annotation
Biological database
Gene nomenclature
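Examining combinations of annotations, and not only single features, can be sketched by counting how often each feature combination occurs in a gene list versus a background. The Python code below is a hypothetical illustration that reports a simple fold enrichment; the feature names, the `max_size` limit, and the choice of statistic are assumptions, since the abstract does not state how Annotation-Modules scores combinations.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Combo = Tuple[str, ...]


def combination_counts(
    gene_features: Dict[str, Set[str]], genes: Set[str], max_size: int = 2
) -> Dict[Combo, int]:
    """Count genes carrying each combination of up to max_size features."""
    counts: Dict[Combo, int] = {}
    for gene in genes:
        feats = sorted(gene_features.get(gene, set()))  # sorting gives a canonical key
        for size in range(1, max_size + 1):
            for combo in combinations(feats, size):
                counts[combo] = counts.get(combo, 0) + 1
    return counts


def fold_enrichment(
    gene_features: Dict[str, Set[str]],
    gene_list: Set[str],
    background: Set[str],
    max_size: int = 2,
) -> Dict[Combo, float]:
    """Observed / expected count of each combination in gene_list."""
    if not gene_list <= background:
        raise ValueError("gene_list must be a subset of background")
    observed = combination_counts(gene_features, gene_list, max_size)
    in_background = combination_counts(gene_features, background, max_size)
    fraction = len(gene_list) / len(background)
    # Every observed combo also occurs in the background because gene_list is a subset.
    return {combo: k / (in_background[combo] * fraction) for combo, k in observed.items()}


if __name__ == "__main__":
    # Invented feature assignments for illustration.
    features = {"g1": {"CpG_island", "TATA_box"}, "g2": {"CpG_island"}, "g3": {"TATA_box"}, "g4": set()}
    print(fold_enrichment(features, {"g1", "g2"}, {"g1", "g2", "g3", "g4"}))
```

The number of combinations grows quickly with the number of features per gene (about 60 in Annotation-Modules), so capping the combination size, as `max_size` does here, is one practical way to keep the search tractable.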
Hayai-Annotation Plants: an ultra-fast and comprehensive functional gene annotation system in plants
Hayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate functional gene annotation system for plant species, built with R. The pipeline combines sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: i) protein name; ii) gene ontology terms covering its three main domains (Biological Process, Molecular Function and Cellular Component); iii) enzyme commission number; iv) protein existence level; and v) evidence type. It implements a new algorithm that gives priority to protein existence level when propagating GO and EC information, and it annotated the Arabidopsis thaliana representative peptide sequences (Araport11) within 5 min on a PC.
The software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license.
Supplementary data are available at Bioinformatics online.
UniProt
Gene Annotation
Interface (matter)
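One way to read "gives priority to protein existence level" is that, among the similarity hits for a query, annotations are propagated from the hit with the strongest UniProt protein existence (PE) evidence, where PE level 1 means evidence at protein level and 5 means uncertain. The Python sketch below implements that reading with percent identity as a tie-breaker. It is an assumption for illustration, not the published algorithm, and the `Hit` fields, placeholder identifiers, and `min_identity` cutoff are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Hit:
    subject_id: str          # accession of the database hit (placeholders below)
    identity: float          # percent identity reported by the similarity search
    protein_existence: int   # UniProt PE level: 1 (protein evidence) .. 5 (uncertain)
    go_terms: Set[str] = field(default_factory=set)
    ec_numbers: Set[str] = field(default_factory=set)


def choose_hit(hits: List[Hit], min_identity: float = 0.0) -> Optional[Hit]:
    """Pick the hit with the best (lowest) PE level, breaking ties by higher identity."""
    eligible = [h for h in hits if h.identity >= min_identity]
    if not eligible:
        return None
    return min(eligible, key=lambda h: (h.protein_existence, -h.identity))


if __name__ == "__main__":
    hits = [
        Hit("SUBJECT_1", 95.0, 4, {"GO_TERM_A"}),
        Hit("SUBJECT_2", 80.0, 1, {"GO_TERM_B"}, {"EC_NUMBER_X"}),
        Hit("SUBJECT_3", 85.0, 1, {"GO_TERM_C"}),
    ]
    best = choose_hit(hits)
    print(best.subject_id, sorted(best.go_terms))
```

The design trade-off is visible in the example: a 95% identity hit with weak existence evidence loses to a slightly less similar hit backed by protein-level evidence, unless an identity cutoff excludes the latter.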
Summary: Hayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate gene annotation system for plant species, built with R. The pipeline combines sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: 1) gene name; 2) gene ontology terms covering its three main domains (Biological Process, Molecular Function, and Cellular Component); 3) enzyme commission number; 4) protein existence level; and 5) evidence type. In terms of speed and accuracy, Hayai-Annotation Plants annotated Arabidopsis thaliana (Araport11, representative peptide sequences) within five minutes with an accuracy of 96.4%.
Availability and Implementation: The software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license.
UniProt
Gene Annotation
The sequence of any genome becomes most useful for biological experimentation when a complete and accurate gene set is available. Gene prediction programs offer an efficient way to generate an automated gene set. Manual annotation, when performed by experienced annotators, is more accurate and complete than automated annotation. However, it is a laborious and expensive process, and by its nature, introduces a degree of variability not found with automated annotation. EAnnot (Electronic Annotation) is a program originally developed for manually annotating the human genome. It combines the latest bioinformatics tools to extract and analyze a wide range of publicly available data in order to achieve fast and reliable automatic gene prediction and annotation. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals. Here, we compare manual annotation of human chromosome 6 with annotation performed by EAnnot in order to assess the latter's accuracy. EAnnot can readily be applied to manual annotation of other eukaryotic genomes and can be used to rapidly obtain an automated gene set.
Gene Annotation
Pseudogene
Gene prediction
Gene nomenclature
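Comparing an automated gene set with manual annotation is commonly summarized with exon-level sensitivity (the fraction of reference exons recovered exactly) and precision (the fraction of predicted exons that are correct, often called specificity in the gene-prediction literature). The Python sketch below computes both under the assumption that an exon counts as correct only when chromosome, start, end, and strand all match. The abstract does not say which metrics the EAnnot comparison used, and the coordinates are invented.

```python
from typing import Set, Tuple

Exon = Tuple[str, int, int, str]  # (chromosome, start, end, strand)


def exon_accuracy(predicted: Set[Exon], reference: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, precision) using exact exon-boundary matches."""
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    manual = {("6", 100, 200, "+"), ("6", 300, 400, "+"), ("6", 500, 600, "+")}
    automated = {("6", 100, 200, "+"), ("6", 300, 410, "+")}
    sn, pr = exon_accuracy(automated, manual)
    print(f"sensitivity={sn:.2f} precision={pr:.2f}")
```

Exact matching is strict: the second predicted exon overlaps a manual exon but differs at one boundary, so it counts as wrong. Overlap-based or nucleotide-level variants of these metrics are common alternatives.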
Abstract
The rapid growth of next-generation sequencing (NGS) technology has led to a surge in the number of plant whole-genome sequences being determined. This has created a need for functional annotation of newly predicted gene sequences in the assembled genomes. To address this, “Hayai-Annotation Plants” was developed as a gene functional annotation tool for plant species. In this report, we compared Hayai-Annotation Plants with Blast2GO and TRAPID, focusing on the three primary gene-ontology (GO) domains: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). Using the Arabidopsis thaliana GO annotation as a benchmark, we evaluated each tool using two approaches: the area under the precision-recall curve (AUC-PR) and the metrics used at the Critical Assessment of Functional Annotation (CAFA). In the latter case, a CAFA evaluator was used to determine the F-score, weighted F-score, and S-score for each domain. Hayai-Annotation Plants performed better in all three GO domains. Our results thus reaffirm the effectiveness of Hayai-Annotation Plants for functional gene annotation in plant species. In this era of extensive whole-genome sequencing, Hayai-Annotation Plants will serve as a valuable tool that facilitates simplified and accurate gene function annotation for numerous users, thereby making a significant contribution to plant research.
Benchmarking
Gene Annotation
Benchmark (surveying)
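The CAFA F-score is usually reported as F-max: for each score threshold, precision is averaged over benchmark proteins that have at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and the maximum harmonic mean over thresholds is kept. The Python sketch below follows that definition but, as a simplification, uses the distinct prediction scores as thresholds instead of a fixed grid and ignores propagation of terms to their GO ancestors. It is not the CAFA evaluator itself; the function name and the data are hypothetical, and the weighted F-score and S-score, which need information-content weights, are omitted.

```python
from typing import Dict, Set, Tuple


def f_max(
    predictions: Dict[str, Dict[str, float]], truth: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (F-max, threshold) for protein-centric GO term predictions."""
    benchmark = {p: terms for p, terms in truth.items() if terms}  # need >= 1 true term
    thresholds = sorted({s for scores in predictions.values() for s in scores.values()})
    best_f, best_t = 0.0, 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in benchmark.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            overlap = len(predicted & true_terms)
            recalls.append(overlap / len(true_terms))  # every benchmark protein counts
            if predicted:  # precision only over proteins with predictions at t
                precisions.append(overlap / len(predicted))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            f = 2 * p * r / (p + r)
            if f > best_f:
                best_f, best_t = f, t
    return best_f, best_t


if __name__ == "__main__":
    truth = {"p1": {"GO:A", "GO:B"}, "p2": {"GO:C"}}
    preds = {"p1": {"GO:A": 0.9, "GO:D": 0.4}, "p2": {"GO:C": 0.8}}
    print(f_max(preds, truth))
```

In this toy example the best threshold is 0.8: raising it further drops the only prediction for `p2`, which lowers average recall more than it helps precision.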