TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes

2019 
Genomes can be considered combinations of 16 dinucleotides. Analysing the relative abundance of different dinucleotides may reveal important features of genome evolution. In the present study, we conducted extensive surveys of the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaeal, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in the open reading frames of all organisms and in the intergenic regions and introns of most organisms. The usage of specific dinucleotides varies greatly among codon positions. The significantly low representation of TA, GT and AC is considered an evolutionary consequence of preventing the formation of premature stop codons and of reducing intron-splicing options in candidate primary mRNA sequences (TA forms the first two bases of the stop codons TAA and TAG, and GT marks the 5' end of canonical spliceosomal introns). These data suggest that a reduction of TA and GT occurred on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in current prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. The greatly varied usage of specific dinucleotides at different codon positions is considered an evolutionary accommodation that compensates for the unavailability of specific codons and avoids the formation of premature stop codons. This is the first report to use dinucleotide relative abundance data to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to propose a hypothesis for the early steps of de novo gene birth.
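The abstract does not state how relative abundance was computed. A widely used measure is the odds ratio rho_XY = f_XY / (f_X * f_Y), where f_XY is the frequency of dinucleotide XY and f_X, f_Y are the mononucleotide frequencies; values well below 1 indicate under-representation. The sketch below assumes that definition, counts a single strand only, and also tallies dinucleotides by codon position (1-2, 2-3, and 3-1 across adjacent codons) for an in-frame ORF. The function names are hypothetical and this is not the study's own pipeline.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_relative_abundance(seq):
    """Return rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides of one strand.

    Assumption: the common odds-ratio definition; the abstract does not give
    the study's exact formula. Non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    # Overlapping dinucleotides, ignoring any pair that contains a non-ACGT symbol.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        # rho < 1: dinucleotide is rarer than base composition alone predicts.
        rho[x + y] = observed / expected if expected else float("nan")
    return rho


def codon_position_dinucleotides(orf):
    """Count dinucleotides at codon positions 1-2, 2-3 and 3-1 of an in-frame ORF.

    The 3-1 class spans the boundary between two adjacent codons.
    """
    orf = orf.upper()
    classes = ("1-2", "2-3", "3-1")
    counts = {c: Counter() for c in classes}
    for i in range(len(orf) - 1):
        counts[classes[i % 3]][orf[i:i + 2]] += 1
    return counts


if __name__ == "__main__":
    # Toy ORF for illustration only; not data from the study.
    orf = "ATGGCTAAAGGTTAA"
    rho = dinucleotide_relative_abundance(orf)
    for dn in ("TA", "GT", "AC"):
        print(dn, round(rho[dn], 3))
    print(codon_position_dinucleotides(orf)["3-1"])
```

Counting both strands (for example by adding the reverse complement before tallying) would make GT and AC symmetric, which is one way to probe the "both strands" interpretation; whether the study did this is not stated in the abstract.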