Targeted Transcriptome Analysis using Synthetic Long Read Sequencing Uncovers Isoform Reprograming in the Progression of Colon Cancer

2020 
Diversity in human gene expression stems, to a large extent, from splicing exons into multiple mRNA isoforms. Characterization of isoforms requires accurate long-read sequencing. However, limited read lengths, high error rates, low throughput and large input requirements remain challenges for current sequencing technologies. In this study, we used a barcoding-based synthetic long-read (SLR) isoform sequencing approach, LoopSeq, to generate sequencing reads sufficiently long and accurate to identify isoforms using standard short-read Illumina sequencers. The method identifies isoforms from control RNA samples with 99.4% accuracy and a 0.01% per-base error rate, exceeding the accuracy reported for other long-read sequencing technologies. Applied to targeted transcriptome sequencing of over 10,000 genes from colon cancers and their metastatic counterparts, LoopSeq revealed large-scale isoform redistributions from benign colon mucosa to primary colon cancer and metastatic cancer and identified several novel gene fusion isoforms in the colon cancer samples. Strikingly, our data showed that most single nucleotide variants (SNVs) occurred predominantly in specific isoforms and that some SNVs underwent isoform switching in cancer progression. The ability to use short-read sequencers to generate accurate long-read isoform information as the raw unit of transcriptional information holds promise as a new and widely accessible approach in RNA isoform analyses.