De novo Clustering Nanopore Long Reads of Transcriptomics Data by Gene

2017 
This work addresses the problem of assigning a set of long reads obtained from a de novo transcriptomics study to clusters according to the gene they originate from. The different transcripts of a gene yield long reads that share similar sequences, and our work uses this fact to retrieve the correct cluster of reads for each gene from the graph of similarity between reads. We propose a method based on the clustering coefficient (CC) and on the search for a minimal cut in this graph, using a greedy procedure that favors nodes with a high degree and a high CC. Our approach compares favorably to state-of-the-art methods. We provide results on the mouse brain transcriptome showing that the approach achieves high precision and good recall despite not using any reference genome.
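The abstract names two ingredients: the local clustering coefficient of each read in the similarity graph, and a greedy cut search that favors high-degree, high-CC nodes. The sketch below illustrates both on a toy graph where nodes are reads and edges link reads with similar sequences. It is not the authors' implementation. The seed score (degree × CC), the initial cluster (seed plus its unclaimed neighbors), and the local rule "drop a member that has more edges leaving the cluster than inside it" are assumptions, and every function name here is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (read, read) similarity pairs; no self-loops assumed."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering_coefficient(adj, v):
    """Local CC: fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def cut_size(adj, cluster):
    """Number of edges with exactly one endpoint in `cluster`."""
    return sum(1 for v in cluster for u in adj[v] if u not in cluster)


def refine_by_cut(adj, cluster, seed):
    """Hypothetical local rule: drop a member (never the seed) while it has more
    edges leaving the cluster than inside it, since removing it lowers the cut."""
    changed = True
    while changed:
        changed = False
        for v in sorted(cluster - {seed}):
            inside = sum(1 for u in adj[v] if u in cluster)
            outside = len(adj[v]) - inside
            if outside > inside:
                cluster.discard(v)
                changed = True
    return cluster


def greedy_clusters(adj):
    """Seed clusters in order of an assumed score, degree * CC (highest first)."""
    cc = {v: clustering_coefficient(adj, v) for v in adj}
    order = sorted(adj, key=lambda v: len(adj[v]) * cc[v], reverse=True)
    assigned, clusters = set(), []
    for seed in order:
        if seed in assigned:
            continue
        # Initial cluster: the seed plus its neighbors not yet claimed by a cluster.
        cluster = {seed} | {u for u in adj[seed] if u not in assigned}
        cluster = refine_by_cut(adj, cluster, seed)
        assigned |= cluster
        clusters.append(cluster)
    return clusters


# Toy graph: reads a-d come from one gene, e-g from another, with one spurious d-e link.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
toy_graph = build_graph(toy_edges)
print([sorted(c) for c in greedy_clusters(toy_graph)])
# expected: [['a', 'b', 'c', 'd'], ['e', 'f', 'g']]
```

Each removal in `refine_by_cut` strictly lowers the cut size, so the loop terminates, but this is only a local improvement. A guaranteed global minimum cut would need a dedicated algorithm such as Stoer-Wagner or a max-flow formulation. Scoring by degree × CC keeps the bridging read `d` (degree 4 but CC 0.5) from seeding before the reads inside the dense group.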