De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets
2017
This work addresses the problem of grouping long reads by their gene of origin in a whole-transcriptome sequencing data set. Long-read sequencing produces sequences several thousand base pairs long, albeit with a high error rate compared to short reads. Long reads can cover full-length RNA transcripts and are therefore of high interest for completing references. However, the literature lacks tools to cluster such data de novo, in particular for Oxford Nanopore Technologies reads. We therefore propose a novel algorithm based on community detection, together with its implementation. Since the solution is meant to be reference-free (de novo), it is especially well suited to non-model species. We demonstrate that it performs well on a real mouse data set. When a reference is available, we show that it stands as an alternative to mapping. In addition, we show that quick assessment of gene expression is a straightforward use case of our solution.