Unexpected features of GC-AG introns in long non-coding and protein-coding genes suggest a new role as regulatory elements

2019 
Long non-coding (lnc) RNAs are now recognized as a new class of regulatory molecules, although very little is known about their functions in the cell. Because of their overall low expression levels and tissue specificity, their identification and annotation in many genomes remain challenging. In this study, we exploited recent annotations provided by the GENCODE project to characterize the genomic and splicing features of lnc-genes in comparison with protein-coding (pc) genes, in both human and mouse. Our analysis highlighted slight differences between the two classes of genes in terms of genome organization and gene architecture. Significant differences in splice site usage were observed between lnc- and pc-genes. While non-canonical GC-AG splice junctions represent about 0.8% of total splice sites in pc-genes, we identified a remarkable enrichment of GC-AG splice sites in lnc-genes, in both human (3.0%) and mouse (1.9%). In addition, we found a positional bias, with GC-AG splice sites enriched in the first intron in both classes of genes. Moreover, GC-AG introns were significantly shorter and had weaker splice sites than canonical GT-AG introns. The computational analysis of GC-AG splice site strength revealed a strong reduction in both donor and acceptor splice site scores, especially in the first intron of lnc-genes, in both species. Genes containing at least one GC-AG intron were found to be conserved in many species, and a functional enrichment analysis pointed toward their enrichment in specific biological processes. Furthermore, as previously suggested, GC-AG-containing genes were shown to be more prone to alternative splicing. Taken together, our study suggests that GC-AG introns could represent new regulatory elements mainly associated with lnc-genes.
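The splice junction frequencies reported above come down to classifying each intron by its terminal dinucleotides and counting the classes. The following is a minimal sketch of that counting step, not the pipeline used in the study. The function names and the toy intron sequences are hypothetical, and it assumes intron sequences have already been extracted from the annotation on the transcribed strand.

```python
from collections import Counter


def junction_class(intron_seq: str) -> str:
    """Label an intron by its donor (first two) and acceptor (last two) bases."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to contain both splice sites")
    return f"{seq[:2]}-{seq[-2:]}"


def junction_frequencies(introns):
    """Return the fraction of introns falling in each junction class."""
    counts = Counter(junction_class(s) for s in introns)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy sequences for illustration only (not real introns).
toy_introns = [
    "GTAAGTCCCTTTCAG",
    "GCAAGTTTTTCCTAG",
    "GTGAGTACCCAG",
    "GTAAGAATTTTCAG",
]
freqs = junction_frequencies(toy_introns)
print(freqs)
```

Running the same count separately on lnc-gene and pc-gene introns, and again on first introns only, would give the kind of per-class comparison described in the abstract.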