Non-coding RNA genes in eukaryotic genomes: computational identification and evolution

2007 
It has become clear that non-coding RNAs (ncRNAs) participate in the control of gene expression at different levels of regulation. However, ncRNA genes are usually not annotated within genomes. A better understanding of genome function requires refined computational tools for ncRNA prediction, some of which are now emerging in the genomic era. I developed a computational system, called snoRMP, to identify box C/D snoRNAs, which play a fundamental role in ribosome biogenesis. I applied it to the rice genome and identified 346 snoRNAs that grouped into 120 paralogous sets, whose sequence differences provided clues about the mechanisms of snoRNA duplication and evolution. I also used snoRMP to screen the genomes of Schizosaccharomyces pombe, Drosophila melanogaster and Chlamydomonas reinhardtii. In addition, I performed an extensive analysis of 415 complementary sequences between rRNAs and box C/D snoRNAs involved in the methylation of 124 rRNA sites from fungi, plants and animals. This allowed me to define snoRNA-rRNA duplex cores of 9 base pairs, within which single mutations have been strongly counter-selected, whereas double compensatory mutations have been retained. The Paramecium tetraurelia genome arose through at least three whole-genome duplications (WGDs). Most genomes that evolved through WGDs have lost a large fraction of their gene duplicates; the P. tetraurelia genome has not. I used motif-based methods to recover an extensive set of P. tetraurelia RNA genes and analyzed their evolution in this specific WGD context. Finally, I used a combination of comparative sequence analysis and structure prediction to analyze the entire ncDNA and identify 137 ncRNA candidates.