Integrate Heterogeneous NGS and TGS Data to Boost Genome-free Transcriptome Research

2020 
Conducting reliable transcriptomic research under varying levels of genome availability is a long-standing challenge. Here, we developed a genome-free computational method to aid accurate transcriptome assembly, using the amphioxus as an example. By integrating ten next-generation sequencing (NGS) transcriptome datasets and one third-generation sequencing (TGS) dataset, we built a sequence library of non-redundant expressed transcripts for the amphioxus. The library comprised 91,915 distinct transcripts in total, including 51,549 protein-coding transcripts and 16,923 novel extragenic transcripts. This substantially improved the current amphioxus genome annotation by expanding the number of distinct genes from 21,954 to 38,777. We confirmed that the library significantly outperformed both the genome-based and de novo methods in transcriptome assembly in multiple respects. For convenience, we curated the Integrative Transcript Library database of the amphioxus (http://www.bio-add.org/InTrans/). In summary, this work provides a practical solution for most organisms to alleviate the heavy dependence of transcriptome research on a good-quality genome. It also ensures that amphioxus transcriptome research is grounded in reliable data.