Long-read cDNA Sequencing Enables a 'Gene-Like' Transcript Annotation of Transposable Elements.

2020 
Transcript-based annotations of genes facilitate both genome-wide analyses and detailed single-locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting of information only on TE location and type. The repetitiveness and limited annotation of TEs prevent distinguishing between potentially functional, expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read sequencing of cDNAs from Arabidopsis thaliana lines deficient in multiple layers of TE repression. These uniquely mapping transcripts were used to identify the set of TEs able to generate polyadenylated RNAs and to create a new transcript-based annotation of TEs, which we have layered upon the existing high-quality community standard annotation. We used this annotation to reduce the bioinformatic complexity associated with multi-mapping reads from short-read RNA-seq experiments, and we show that this improvement is even greater in a TE-rich genome such as maize. Our TE annotation also enables the testing of specific standing hypotheses in the TE field. We demonstrate that inaccurate TE splicing does not trigger small RNA production, and that the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a new transcript-based TE annotation for Arabidopsis and maize, which serves as a blueprint to reduce the bioinformatic complexity associated with repetitive TEs in any organism.