De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets

Camille Marchet, Lolita Lecompte, Corinne Da Silva, Corinne Cruaud, Jean-Marc Aury, Jacques Nicolas, Pierre Peterlongo

2017

This work addresses the problem of grouping, by gene, the long reads expressed in a whole-transcriptome sequencing data set. Long-read sequencing produces sequences several thousand base pairs long, although with a high error rate compared to short reads. Long reads can cover full-length RNA transcripts and are thus of high interest for completing references. However, the literature lacks tools to cluster such data de novo, in particular for Oxford Nanopore Technologies reads. We therefore propose a novel algorithm based on community detection, together with its implementation. Since the solution is meant to be reference-free (de novo), it is especially well suited to non-model species. We demonstrate that it performs well on a real mouse data set. When a reference is available, we show that it stands as an alternative to mapping. In addition, we show that quick assessment of gene expression is a straightforward use case of our solution.
