De novo Clustering Nanopore Long Reads of Transcriptomics Data by Gene

Camille Marchet, Lolita Lecompte, Corinne Da Silva, Corinne Cruaud, Jean-Marc Aury, Jacques Nicolas, Pierre Peterlongo

2017

This work addresses the problem of clustering long reads from a de novo transcriptomics study by the gene they originate from. The different transcripts of a gene yield long reads that share similar sequences, and we exploit this fact to retrieve the correct cluster of reads for each gene from the graph of similarity between reads. We propose a method based on the clustering coefficient (CC) and on the search for a minimal cut in the graph, using a greedy procedure that favors nodes with a high degree and a high CC. Our approach compares favorably to state-of-the-art methods. Results on the mouse brain transcriptome show that the approach achieves high precision and good recall despite not using any reference genome.
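To make the role of the clustering coefficient concrete, the following minimal sketch computes the local CC of each node in a toy read-similarity graph and ranks nodes as candidate cluster seeds. It is an illustration only and not the authors' implementation: the graph, the read names `r1` to `r5`, the helper names `clustering_coefficient` and `rank_seeds`, and the exact ranking key (CC first, then degree) are all assumptions, and the minimal-cut step is not shown.

```python
from itertools import combinations

# Hypothetical toy similarity graph: reads r1-r3 are mutually similar
# (one gene), r4-r5 are linked, and r3-r4 is a spurious similarity edge.
graph = {
    "r1": {"r2", "r3"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r3", "r5"},
    "r5": {"r4"},
}

def clustering_coefficient(graph, node):
    """Local CC: fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

def rank_seeds(graph):
    """Order nodes by (CC, degree), highest first.

    Assumption: the abstract only says that high-degree, high-CC nodes are
    favored; this particular tie-breaking order is illustrative.
    """
    return sorted(
        graph,
        key=lambda n: (clustering_coefficient(graph, n), len(graph[n])),
        reverse=True,
    )

if __name__ == "__main__":
    for read in rank_seeds(graph):
        print(read, len(graph[read]), round(clustering_coefficient(graph, read), 3))
```

In this toy graph, `r1` and `r2` sit inside a fully connected triangle and get the highest CC, while `r3` has its CC lowered by the spurious edge to `r4`, which is the kind of edge a cut between clusters would be expected to remove.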

Keywords:

Genetics
Clustering coefficient
Cluster analysis
Reference genome
Transcriptome
Gene
Biology
Bioinformatics
Graph
