Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis

2018 
Motivation: De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, complicating downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering.

Results: On de novo assemblies from four different species, we show that Grouper is able to accurately cluster a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline is able to map greater than 10% more reads against the contigs, leading to accurate downstream differential expression analyses. The labeling module, in the presence of a closely related annotated genome, can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for processing de novo transcriptomic assemblies.

Availability and implementation: The Grouper software is freely available at https://github.com/COMBINE-lab/grouper under the 2-clause BSD license.

Supplementary information: Supplementary data are available at Bioinformatics online.