    A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog
    Citations: 10 | References: 77 | Related papers: 10
    Abstract:
    Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, as is the case for most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, so the transcriptome must be assembled de novo. We used RNA-seq to obtain the transcriptomic profile of Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggest that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides inexperienced users intuitively through all the steps needed to build de novo transcriptome assemblies using readily available software, and it is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
    Keywords:
    Sequence assembly
    RNA-Seq
    KEGG
    Model Organism
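The abstract reports 550,871 assembled transcripts collapsing into 422,999 putative genes. The sketch below shows one common way such a transcript-to-gene count is derived, assuming Trinity-style FASTA headers (for example `TRINITY_DN10_c0_g1_i1`, where the trailing `_i<N>` marks the isoform). The abstract does not state which assembler or header scheme was used, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: Trinity-style transcript IDs, where removing the trailing
# "_i<number>" isoform suffix leaves the putative gene ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id: str) -> str:
    """Strip the isoform suffix to obtain the putative gene ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def count_transcripts_and_genes(fasta_lines):
    """Return (n_transcripts, n_genes, isoforms_per_gene) from FASTA lines."""
    transcript_ids = [
        line[1:].split()[0] for line in fasta_lines if line.startswith(">")
    ]
    isoforms_per_gene = Counter(gene_id(t) for t in transcript_ids)
    return len(transcript_ids), len(isoforms_per_gene), isoforms_per_gene


# Toy input (hypothetical IDs): two isoforms of one gene plus a second gene.
demo = [
    ">TRINITY_DN10_c0_g1_i1 len=512",
    "ACGTACGT",
    ">TRINITY_DN10_c0_g1_i2 len=498",
    "ACGTTGCA",
    ">TRINITY_DN11_c0_g1_i1 len=301",
    "GGGCCCAA",
]
n_tx, n_genes, per_gene = count_transcripts_and_genes(demo)
print(n_tx, n_genes)  # 3 2
```

On a real assembly the same function would simply receive the lines of the FASTA file; the Counter also shows how many isoforms each putative gene carries.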
    De novo transcriptome assembly is an important stage of the computational analysis of RNA-seq data. It allows researchers to obtain the sequences of the transcripts present in the biological sample of interest. An accurate and complete transcriptome sequence of the organism of interest is, in turn, an indispensable condition for further analysis of RNA-seq data. Over years of transcriptomic research, the bioinformatics community has developed a number of assembler programs for reconstructing transcriptomes from the short reads of RNA-seq libraries. Different assemblers make it possible to perform either de novo or genome-guided transcriptome reconstruction. Most assemblers that work with RNA-seq data are based on the De Bruijn graph method of sequence reconstruction. However, the specifics of their procedures can vary drastically, as can their results. A number of authors recommend a hybrid approach to transcriptome reconstruction that combines the results of several assemblers in order to achieve a better transcriptome assembly. The advantage of this approach has been demonstrated in a number of studies with RNA-seq experiments conducted on the Illumina platform. In this paper, we propose a hybrid approach for creating a transcriptome assembly of the barley Hordeum vulgare isogenic line Bowman and two near-isogenic lines derived from Bowman and contrasting in spike pigmentation, based on mRNA sequencing on the IonTorrent platform. This approach implements several de novo assemblers: Trinity, Trans-ABySS and rnaSPAdes. Several assembly metrics were examined: the percentage of reference transcripts observed in the assemblies, the percentage of RNA-seq reads involved, and BUSCO scores. Based on the summation of these metrics, the transcriptome meta-assembly was shown to surpass the individual transcriptome assemblies it consists of.
    Sequence assembly
    RNA-Seq
    De Bruijn graph
    Citations (1)
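The barley study ranks assemblies "based on the summation" of three metrics. A minimal sketch of such a composite score, assuming an unweighted sum of the three percentages (the exact scoring used by the authors is not given in the abstract); the class, function names and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class AssemblyMetrics:
    name: str
    ref_transcripts_pct: float  # % of known reference transcripts found
    reads_used_pct: float       # % of RNA-seq reads involved in the assembly
    busco_complete_pct: float   # BUSCO completeness, %

    def summed_score(self) -> float:
        """Unweighted sum of the three percentages."""
        return self.ref_transcripts_pct + self.reads_used_pct + self.busco_complete_pct


def rank_assemblies(assemblies):
    """Order assemblies from highest to lowest summed score."""
    return sorted(assemblies, key=lambda a: a.summed_score(), reverse=True)


# Placeholder values for illustration only, NOT results from the barley study.
candidates = [
    AssemblyMetrics("Trinity", 80.0, 90.0, 85.0),
    AssemblyMetrics("Trans-ABySS", 78.0, 88.0, 82.0),
    AssemblyMetrics("rnaSPAdes", 79.0, 91.0, 84.0),
    AssemblyMetrics("meta-assembly", 84.0, 93.0, 88.0),
]
for a in rank_assemblies(candidates):
    print(f"{a.name}: {a.summed_score():.1f}")
```

An unweighted sum treats a one-point gain in any metric as equally valuable, which is the simplest reading of "summation"; weighting or normalising the metrics would be an alternative design.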
    Cobia (Rachycentron canadum) is a marine teleost species with great productive potential worldwide. However, the genomic information currently available for this species in public databases is limited. This lack of information hinders gene expression assessments, which could bring forward novel insights into the physiology, ecology, evolution, and genetics of this aquaculture species. In this study, we report the first de novo transcriptome assembly of cobia liver to improve the availability of gene sequences for this important commercial fish. Illumina sequencing of liver transcripts generated 1,761,965,794 raw reads, which were filtered into 1,652,319,304 high-quality reads. De novo assembly resulted in 101,789 unigenes and 163,096 isoforms, with average lengths of 950.61 and 1,617.34 nt, respectively. Comparisons against six different databases yielded functional annotation for 125,993 of these elements (77.3%), providing relevant information on the genomic content of R. canadum. We expect that these functionally annotated cobia elements will likely assist future nutrigenomics and breeding programs involving this important fish farming species.
    Sequence assembly
    Nutrigenomics
    Citations (0)
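The 77.3% figure appears to be relative to the 163,096 isoforms (125,993 / 163,096 ≈ 0.7725), since 125,993 exceeds the 101,789 unigenes. A minimal sketch of how an "annotated in at least one of several databases" count and percentage can be computed; the database names and IDs in the toy example are hypothetical.

```python
def annotation_summary(total_elements: int, hits_by_db: dict) -> tuple:
    """Return (elements annotated in >= 1 database, percentage of the total)."""
    annotated = set().union(*hits_by_db.values())  # union across databases
    return len(annotated), round(100.0 * len(annotated) / total_elements, 1)


# Toy example with hypothetical database names and element IDs.
toy_hits = {
    "db_A": {"iso1", "iso2"},
    "db_B": {"iso2", "iso3"},  # iso2 is counted only once
    "db_C": set(),
}
print(annotation_summary(5, toy_hits))  # (3, 60.0)

# Check of the reported figures: 125,993 annotated out of 163,096 isoforms.
print(round(100 * 125_993 / 163_096, 1))  # 77.3
```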
    The use of RNA sequencing (RNA-Seq) data and the generation of de novo transcriptome assemblies have been pivotal for studies in ecology and evolution. This is especially true for nonmodel organisms, for which no genome information is available. In such organisms, studies of differential gene expression, DNA enrichment bait design and phylogenetics can all be accomplished with de novo transcriptome assemblies. Multiple tools are available for transcriptome assembly, but no single tool provides the best assembly for all data sets. Therefore, a multi-assembler approach followed by a reduction step is often used to generate an improved representation of the assembly. To reduce errors in these complex analyses while attaining reproducibility and scalability, automated workflows have been essential in the analysis of RNA-Seq data. However, most of these tools are designed for species whose genome data serve as a reference for the assembly process, limiting their use in nonmodel organisms. We present TransPi, a comprehensive pipeline for de novo transcriptome assembly that requires minimal user input without sacrificing the thoroughness of the analysis. The tool was assessed with a combination of different model organisms, k-mer sets, read lengths and read quantities. In addition, a total of 49 nonmodel organisms spanning different phyla were analysed. Compared with approaches using a single assembler, TransPi produces higher BUSCO completeness percentages together with a significant reduction in duplication rates. TransPi is easy to configure and can be deployed seamlessly using Conda, Docker and Singularity.
    Sequence assembly
    RNA-Seq
    Citations (29)
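TransPi's gains are expressed as BUSCO completeness and duplication. Because completeness is the sum of single-copy and duplicated hits (C = S + D), a reduction step that collapses redundant transcripts can lower D, moving those BUSCOs into S, without lowering C. The sketch below assumes per-gene statuses have already been obtained (BUSCO itself writes this summary); it only reproduces the arithmetic in a BUSCO-like short-summary string, and the toy data are hypothetical.

```python
from collections import Counter


def busco_summary(statuses: dict) -> str:
    """Summarise per-BUSCO statuses in a BUSCO-like short-summary string.

    statuses maps each BUSCO ID to 'S' (complete, single-copy),
    'D' (complete, duplicated), 'F' (fragmented) or 'M' (missing).
    """
    n = len(statuses)
    counts = Counter(statuses.values())
    pct = {cat: 100.0 * counts.get(cat, 0) / n for cat in "SDFM"}
    complete = pct["S"] + pct["D"]  # C = S + D
    return (f"C:{complete:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
            f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}")


# Hypothetical 10-gene example before and after a redundancy-reduction step.
before = {f"busco{i}": s for i, s in enumerate("SSSSSSDDFM")}
after = {f"busco{i}": s for i, s in enumerate("SSSSSSSSFM")}
print(busco_summary(before))  # C:80.0%[S:60.0%,D:20.0%],F:10.0%,M:10.0%,n:10
print(busco_summary(after))   # C:80.0%[S:80.0%,D:0.0%],F:10.0%,M:10.0%,n:10
```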
    The advent of next-generation sequencing has made de novo transcriptome assembly possible for organisms without a complete genome sequence. Among the various sequencing platforms available, Illumina is the most widely used, owing to its data quality, quantity and cost. Various de novo transcriptome assemblers are also available today. In this study, we aimed to obtain an ameliorated de novo transcriptome assembly from sequence reads generated on the Illumina platform and assembled with the Trinity assembler. We found that the primary transcriptome assembly produced by Trinity can be improved on the basis of transcript length, coverage, depth and protein homology. Our improvement approach is reproducible and could enhance the sensitivity and specificity of the assembled transcriptome, which could be critical for validating the assembled transcripts and for planning various downstream biological assays.
    Sequence assembly
    Illumina dye sequencing
    Hybrid genome assembly
    Citations (48)
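The abstract lists transcript length, coverage, depth and protein homology as criteria for refining the primary Trinity assembly, but gives neither thresholds nor how the criteria are combined. A minimal sketch, assuming per-transcript values have already been computed, with coverage and depth folded into one mean-depth value for brevity, and assuming homology can rescue otherwise weakly supported transcripts; the cut-offs, the combination rule and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    tid: str
    length: int            # transcript length in nucleotides
    mean_depth: float      # mean per-base read depth
    has_protein_hit: bool  # e.g. a significant hit to a protein database


# Hypothetical cut-offs chosen for illustration; not taken from the study.
MIN_LENGTH = 300
MIN_DEPTH = 5.0


def keep(t: Transcript) -> bool:
    """Keep long, well-covered transcripts; rescue others with protein homology."""
    well_supported = t.length >= MIN_LENGTH and t.mean_depth >= MIN_DEPTH
    return well_supported or t.has_protein_hit


def refine(transcripts):
    """Return the transcripts that pass the filter, preserving input order."""
    return [t for t in transcripts if keep(t)]


assembly = [
    Transcript("t1", 850, 22.0, True),
    Transcript("t2", 210, 1.5, False),   # short and shallow: dropped
    Transcript("t3", 240, 3.0, True),    # rescued by protein homology
    Transcript("t4", 1200, 2.0, False),  # long but poorly covered: dropped
]
print([t.tid for t in refine(assembly)])  # ['t1', 't3']
```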
    De novo transcriptome assembly is an important approach in RNA-Seq data analysis: it can help us reconstruct the transcriptome and investigate gene expression profiles without reference genome sequences. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and from a cell line, respectively. We then determined an efficient way to yield an optimal overall assembly using three different strategies. First, we assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of k-mer lengths and coverage cutoffs. Lastly, we combined the contigs assembled with a range of k values to generate a final assembly. Comparing these results, we found that a single k-mer value is not enough to generate a good assembly, whereas combining the contigs from different k-mer values could yield longer contigs and greatly improve the overall assembly.
    Sequence assembly
    RNA-Seq
    Citations (17)
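Combining contigs from a range of k values can be sketched as pooling the assemblies and discarding redundant contigs. This is a simplified illustration that assumes exact containment (on either strand) as the redundancy criterion; real pipelines usually rely on clustering tools that tolerate mismatches (CD-HIT-EST is one example). The function names and toy contigs are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_multi_k(contigs_by_k: dict) -> list:
    """Pool contigs from assemblies run at several k values and drop any contig
    that is identical to, or contained in, a longer kept contig on either strand."""
    pooled = sorted(
        {c for contigs in contigs_by_k.values() for c in contigs},
        key=len,
        reverse=True,  # longest first, so containers are kept before their parts
    )
    kept = []
    for contig in pooled:
        rc = revcomp(contig)
        if not any(contig in longer or rc in longer for longer in kept):
            kept.append(contig)
    return kept


# Hypothetical toy contigs from three k-mer settings.
contigs_by_k = {
    21: ["TTGCAACG"],               # its reverse complement lies in the k=31 contig
    25: ["ACGTTGCAAT", "GATTACA"],  # first one is contained in the k=31 contig
    31: ["CCACGTTGCAATGG"],
}
print(merge_multi_k(contigs_by_k))  # ['CCACGTTGCAATGG', 'GATTACA']
```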
    Background: The sequencing, de novo assembly and annotation of transcriptome datasets generated with next-generation sequencing (NGS) have enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation, however, is a critically important step for transcriptomes assembled from short-read sequences. Typical benchmarks for assembly and annotation reliability have been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNA-seq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assemblies produced by four different de novo transcriptome assemblers (Velvet, Oases, Geneious and Trinity) for a number of quality metrics and redundancy.
    Results: Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). The Trinity and Oases de novo assemblers both produced the best assemblies on all quality metrics, including fewer contigs, increased N50 and average contig length, and longer contigs. Overall, the BLAST and annotation success of our assemblies was low, with only 15-19% of contigs assigned a putative function.
    Conclusions: We believe that any improvement in annotation success for gastropod species will require more gastropod genome sequences and, in particular, more mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short-read sequencers with the right assembly algorithms.
    Keywords: Nerita melanotragus; De novo assembly; Transcriptome; Heat shock protein; Ion torrent
    Sequence assembly
    Ion semiconductor sequencing
    Citations (1)
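The Nerita study compares assemblers on number of contigs, N50 and average contig length. A minimal sketch computing those three metrics from a list of contig lengths, where N50 is the length of the contig at which the running total of contigs sorted longest-first first reaches half the assembly length; the function name and toy lengths are hypothetical.

```python
def assembly_stats(lengths):
    """Number of contigs, mean contig length and N50 for a list of lengths."""
    if not lengths:
        raise ValueError("empty assembly")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached at least half of the total length
            n50 = length
            break
    return {"contigs": len(ordered), "mean": total / len(ordered), "N50": n50}


# Hypothetical contig lengths (bp).
print(assembly_stats([150, 300, 450, 600, 900]))
# {'contigs': 5, 'mean': 480.0, 'N50': 600}
```

N50 favours assemblies with fewer, longer contigs, which is why it is usually read together with contig counts and annotation-based metrics rather than on its own.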
    ABSTRACT: Cobia (Rachycentron canadum) is a marine teleost species with great productive potential worldwide. However, the genomic information currently available for this species in public databases is limited. This lack of information hinders gene expression assessments that might bring forward novel insights into the physiology, ecology, evolution, and genetics of this potential aquaculture species. In this study, we report the first de novo transcriptome assembly of R. canadum liver, improving the availability of novel gene sequences for this species. Illumina sequencing of liver transcripts generated 1,761,965,794 raw reads, which were filtered into 1,652,319,304 high-quality reads. De novo assembly resulted in 101,789 unigenes and 163,096 isoforms, with average lengths of 950.61 and 1,617.34 nt, respectively. Moreover, we found that 126,013 of these transcripts contain potential coding sequences, and 125,993 of these elements (77.3%) correspond to functionally annotated genes found in six different databases. We also identified 701 putative ncRNAs and 35,414 putative lncRNAs. Interestingly, homologues of 410 of these putative lncRNAs have already been observed in previous analyses of Danio rerio, Lates calcarifer, Seriola lalandi dorsalis, Seriola dumerili or Echeneis naucrates. Finally, we identified 7,894 microsatellites related to cobia's putative lncRNAs. Thus, the information derived from the transcriptome assembly described herein will likely assist future nutrigenomics and breeding programs involving this important fish farming species.
    Sequence assembly
    Nutrigenomics
    Citations (0)
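The cobia study reports 7,894 microsatellites related to putative lncRNAs, but the abstract does not name the detection tool. A minimal regex-based sketch of perfect simple sequence repeat (SSR) detection; the minimum repeat numbers and function names are hypothetical, and dedicated tools (MISA is a widely used example) also handle compound and imperfect repeats that this sketch ignores.

```python
import re

# Hypothetical minimum repeat numbers per motif length (1 = mono ... 6 = hexa).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Group 2 captures one motif copy; \2 requires further adjacent copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):  # skip e.g. 'AA' runs already found as 'A'
                hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


# Hypothetical sequence: an (AC)7 dinucleotide repeat and an A12 run.
demo_seq = "GG" + "AC" * 7 + "TTT" + "A" * 12 + "GATC"
print(find_ssrs(demo_seq))  # [(2, 'AC', 7), (19, 'A', 12)]
```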
    Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task that requires algorithmic improvements. We present two methods for substantially improving de novo transcriptome assembly. The first relies on the observation that the single k-mer length used by current de novo assemblers is suboptimal for assembling transcriptomes in which transcript sequence coverage is highly heterogeneous. We present the Multiple-k method, in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on a reference proteome to improve de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method, which uses mapping against the closest available reference proteome to scaffold contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied both methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. Using the Multiple-k and STM methods, the assembly increases in contiguity and in gene identification, showing that our methods clearly improve quality and can be widely used. The new methods successfully assembled the transcripts of the core set of genes regulating tooth development in vertebrates, whereas classic de novo assembly failed.
    Sequence assembly
    Proteome
    Hybrid genome assembly
    Citations (342)
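The abstract describes Scaffolding using Translation Mapping (STM) only at a high level: contigs that map onto the same protein of the closest reference proteome are scaffolded together. A simplified sketch of that idea, assuming each contig has one best hit (protein, start coordinate on the protein, strand) already computed by a translated search; the gap length, data layout and names are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProteinHit:
    contig: str
    protein: str
    prot_start: int  # first aligned residue on the reference protein
    strand: str      # '+' if the contig is in the protein's coding direction


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def scaffold_by_protein(hits, seqs, gap="N" * 10):
    """Join contigs that hit the same protein, ordered along that protein."""
    by_protein = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein].append(hit)
    scaffolds = {}
    for protein, group in by_protein.items():
        group.sort(key=lambda h: h.prot_start)  # order along the protein
        parts = [
            seqs[h.contig] if h.strand == "+" else revcomp(seqs[h.contig])
            for h in group
        ]
        scaffolds[protein] = gap.join(parts)
    return scaffolds


# Hypothetical contigs and hits.
seqs = {"c1": "ATGGCC", "c2": "GGCTTC", "c3": "ATGAAA"}
hits = [
    ProteinHit("c2", "protX", 120, "-"),
    ProteinHit("c1", "protX", 1, "+"),
    ProteinHit("c3", "protY", 1, "+"),
]
print(scaffold_by_protein(hits, seqs))
# {'protX': 'ATGGCCNNNNNNNNNNGAAGCC', 'protY': 'ATGAAA'}
```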