The bioinformatics platform includes a Laboratory Integrated Management System (LIMS), an implementation of wEMBOSS, in-house Perl tools for data analysis, InterProScan for annotating sequence domains, and implementations of wBLAST and wNetBLAST, among other tools. The backbone of the system is an adaptation of the SOL Genomics Network (SGN) databases developed at Cornell University for the storage and analysis of ESTs, molecular markers and BAC sequences (http://sgn.cornell.edu). The system is built on the PostgreSQL relational database, Perl scripts for data manipulation, and the Apache web server with the integrated mod_perl interpreter; the servers run the Debian distribution of the GNU/Linux operating system.
The coffee berry borer (CBB) is the most prevalent pest of coffee plantations. Within the Coffea genus, C. arabica is susceptible to CBB and C. liberica shows a lower susceptibility. Two EST libraries were constructed from the total RNA of C. arabica and C. liberica fruits artificially infested with CBBs for 24 h. Using 6000 clones sequenced per library, a unigene database was generated, obtaining 3634 singletons and 1454 contigs. For each contig, the proportion of sequences present in both species was determined, and differential gene expression between the species was detected. C. arabica displayed a higher relative expression of proteins involved in general stress responses, whereas C. liberica showed the induction of a higher number of insect defense proteins. To validate the results, expression was quantified by real-time PCR. A hevein-like protein, an isoprene synthase, a salicylic acid carboxyl methyltransferase and a patatin-like protein gene were highly upregulated in C. liberica at 24 and/or 48 h after insect infestation compared to C. arabica. The identification of metabolic pathways induced by this pest insect provides tools to take advantage of the genetic resources available for the control of CBB.
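The per-contig comparison described above can be illustrated with a minimal sketch. It assumes each contig is represented as a list of library labels, one per member EST; the function name, the labels and the toy data are hypothetical and are not taken from the study, which does not state how the proportions or any significance test were computed.

```python
from collections import Counter

# Hypothetical library labels; the study's actual identifiers are not given.
LIBRARIES = ("arabica", "liberica")


def library_proportions(contig_members):
    """Return, for each contig, the fraction of member ESTs from each library.

    contig_members maps a contig id to a list of library labels,
    one label per EST assembled into that contig.
    """
    proportions = {}
    for contig_id, labels in contig_members.items():
        counts = Counter(labels)
        total = sum(counts[lib] for lib in LIBRARIES)
        proportions[contig_id] = {lib: counts[lib] / total for lib in LIBRARIES}
    return proportions


# Toy data for illustration only.
toy_contigs = {
    "contig_1": ["arabica", "arabica", "arabica", "liberica"],
    "contig_2": ["liberica", "liberica"],
}
toy_result = library_proportions(toy_contigs)
print(toy_result)
```

A skewed proportion in a contig only suggests differential expression; in the study the candidates were confirmed by real-time PCR.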
Understanding and exploiting genetic diversity is a key factor for the productive and stable production of rice. Here, we utilize 73 high-quality genomes that encompass the subpopulation structure of Asian rice (Oryza sativa), plus the genomes of two wild relatives (O. rufipogon and O. punctata), to build a pan-genome inversion index of 1769 non-redundant inversions that span an average of ~29% of the O. sativa cv. Nipponbare reference genome sequence. Using this index, we estimate an inversion rate of ~700 inversions per million years in Asian rice, which is 16 to 50 times higher than previously estimated for plants. Detailed analyses of these inversions show evidence of their effects on gene expression, recombination rate, and linkage disequilibrium. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice and hints at their largely unexplored role in functional biology and crop performance.
In analyzing gene families in the whole-genome sequences available for O. sativa (AA), O. glaberrima (AA), and O. brachyantha (FF), we observed large size expansions in the AA genomes compared to FF genomes for the super-families F-box and NB-ARC, and five additional families: the Aspartic proteases, BTB/POZ proteins (BTB), Glutaredoxins, Trypsin α-amylase inhibitor proteins, and Zf-Dof proteins. We investigated their evolutionary dynamics to understand how and why such important size variations are observed between these closely related species. We show that expansions resulted from both amplification, largely by tandem duplications, and contraction by gene losses. For the F-box and NB-ARC gene families, the genes conserved in all species were under strong purifying selection while expanded orthologous genes were under more relaxed purifying selection. In F-box, NB-ARC, and BTB, the expanded groups were enriched in genes with little evidence of expression, in comparison with conserved groups. We also detected 87 loci under positive selection in the expanded groups. These results show that most of the duplicated copies in the expanded groups evolve neutrally after duplication because of functional redundancy, but a fraction of these genes were preserved following neofunctionalization. Hence, the lineage-specific expansions observed between Oryza species were partly driven by directional selection.
Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish races/isolates.
This file contains 1.3 M gene-related novel SNPs, a subset of the ~2.3 M genome-wide novel SNPs identified relative to the IRGSP RefSeq through analysis of a 16-genome rice reference panel. More details can be found in the manuscript titled "A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset" (https://www.biorxiv.org/content/biorxiv/early/2023/06/26/2023.06.25.546420.full.pdf).
Bread wheat (Triticum aestivum) is a globally dominant crop and major source of calories and proteins for the human diet. Compared with its wild ancestors, modern bread wheat shows lower genetic diversity, caused by polyploidisation, domestication and breeding bottlenecks.
High-quality genome assemblies are characterized by high sequence contiguity, completeness, and a low error rate, thus providing the basis for a wide array of studies focusing on natural species ecology, conservation, evolution, and population genomics. To provide this valuable resource for conservation projects and comparative genomics studies on gyrfalcon (Falco rusticolus), we sequenced and assembled the genome of this species using third-generation sequencing strategies and optical maps. Here, we describe a highly contiguous and complete genome assembly comprising 20 scaffolds and 13 contigs with a total size of 1.193 Gbp, including 8,064 complete Benchmarking Universal Single-Copy Orthologs (BUSCOs) of the total 8,338 BUSCO groups present in the library aves_odb10. Of these BUSCO genes, 96.7% were complete, 96.1% were present as a single copy, and 0.6% were duplicated. Furthermore, 0.8% of BUSCO genes were fragmented and 2.5% (210) were missing. A de novo search for transposable elements (TEs) identified 5,716 TEs that masked 7.61% of the F. rusticolus genome assembly when combined with publicly available TE collections. Long interspersed nuclear elements, in particular, the element Chicken-repeat 1 (CR1), were the most abundant TEs in the F. rusticolus genome. A de novo first-pass gene annotation was performed using 293,349 PacBio Iso-Seq transcripts and 496,195 transcripts derived from the assembly of 42,429,525 Illumina PE RNA-seq reads. In all, 19,602 putative genes, of which 59.31% were functionally characterized and associated with Gene Ontology terms, were annotated. A comparison of the gyrfalcon genome assembly with the publicly available assemblies of the domestic chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and hummingbird (Calypte anna) revealed several genome rearrangements. In particular, nine putative chromosome fusions were identified in the gyrfalcon genome assembly compared with those in the G. gallus genome assembly.
This genome assembly, its annotation for TEs and genes, and the comparative analyses presented complement and strengthen the base of high-quality genome assemblies and associated resources available for comparative studies focusing on the evolution, ecology, and conservation of Aves.
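The BUSCO percentages reported for the gyrfalcon assembly can be checked with a short arithmetic sketch. It uses only the counts quoted above (8,338 groups, 8,064 complete, 210 missing). The fragmented count is not stated in the text and is derived here on the assumption that complete, fragmented and missing genes partition the full set.

```python
# Counts quoted in the abstract for the aves_odb10 library.
TOTAL_GROUPS = 8338
COMPLETE = 8064
MISSING = 210

# Assumption: complete + fragmented + missing == total, so the
# fragmented count is derived rather than quoted from the text.
FRAGMENTED = TOTAL_GROUPS - COMPLETE - MISSING


def percent(count, total=TOTAL_GROUPS):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


busco_summary = {
    "complete": percent(COMPLETE),
    "fragmented": percent(FRAGMENTED),
    "missing": percent(MISSING),
}
print(FRAGMENTED, busco_summary)
```

Under that assumption the derived values agree with the 96.7%, 0.8% and 2.5% reported in the abstract.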