ReorientExpress: reference-free orientation of nanopore cDNA reads with deep learning

2019 
Long-read sequencing technologies allow the systematic interrogation of transcriptomes from any species. However, functional characterization requires determining the correct orientation of reads. Oxford Nanopore Technologies (ONT) allows the direct measurement of RNA molecules in their native orientation, but sequencing of complementary-DNA (cDNA) libraries generally yields a larger number of reads. Although strand-specific adapters can be used, error rates hinder their detection. Current methods rely on comparison to a genome reference or on the use of additional technologies, which limits the applicability of rapid and cost-effective long-read sequencing for transcriptomics beyond model species. To facilitate the de novo interrogation of transcriptomes in species or samples for which a genome reference is not available, we have developed ReorientExpress, a new tool based on deep learning that performs reference-free orientation of ONT reads from a cDNA library. Trained on transcriptome annotations, ReorientExpress correctly predicted the orientation of 84% of ONT cDNA reads in human and 93% in S. cerevisiae. Furthermore, a model trained on mouse annotations and tested on human reads, or trained on C. glabrata annotations and tested on S. cerevisiae reads, produced similar accuracy. Finally, in combination with long-read clustering, ReorientExpress established the right orientation for the majority of reads (92% in human, 97% in S. cerevisiae). ReorientExpress facilitates the interpretation of transcriptomes from long-read cDNA sequencing data without the need for a genome reference or the use of additional technologies. ReorientExpress is available at https://github.com/comprna/reorientexpress.
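To make the training setup concrete, the following is a minimal Python sketch of how labeled examples for an orientation classifier might be derived from annotated transcripts, which are known to be in the forward orientation. The abstract does not specify the input features or the network architecture, so the k-mer frequency encoding and the function names used here (`reverse_complement`, `kmer_frequencies`, `labeled_examples`) are illustrative assumptions, not the ReorientExpress implementation.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=2):
    """Return normalized counts of all 4**k k-mers, in lexicographic order.

    Assumption: k-mer frequencies serve as the feature vector; k-mers
    containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def labeled_examples(transcripts, k=2):
    """Build (features, label) pairs from forward-oriented transcripts.

    Label 0 marks the annotated (forward) sequence and label 1 marks its
    reverse complement, so the two classes are balanced by construction.
    """
    examples = []
    for t in transcripts:
        examples.append((kmer_frequencies(t, k), 0))
        examples.append((kmer_frequencies(reverse_complement(t), k), 1))
    return examples
```

Under these assumptions, the resulting pairs could be passed to any binary classifier. A read predicted as class 1 would then be reverse-complemented to restore its presumed native orientation.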