De novo transcriptome assembly

De novo transcriptome assembly is the de novo sequence assembly method of creating a transcriptome without the aid of a reference genome. As a result of the development of novel sequencing technologies, the years between 2008 and 2012 saw a large drop in the cost of sequencing: the cost per megabase fell to 1/100,000th of its former price, and the cost per genome to 1/10,000th. Prior to this, only transcriptomes of organisms of broad interest and utility to scientific research were sequenced. However, these high-throughput sequencing (also called next-generation sequencing) technologies, which became widespread in the 2010s, are both cost- and labor-effective, and the range of organisms studied with them is expanding. Transcriptomes have since been created for chickpea, planarians, and Parhyale hawaiensis, as well as for the brains of the Nile crocodile, the corn snake, the bearded dragon, and the red-eared slider, to name just a few.
Examining non-model organisms can provide novel insights into the mechanisms underlying the 'diversity of fascinating morphological innovations' that have enabled the abundance of life on Earth. In animals and plants, the 'innovations' that cannot be examined in common model organisms include mimicry, mutualism, parasitism, and asexual reproduction. De novo transcriptome assembly is often the preferred method for studying non-model organisms, since it is cheaper and easier than building a genome, and reference-based methods are not possible without an existing genome. The transcriptomes of these organisms can thus reveal novel proteins, and isoforms of those proteins, that are implicated in such unique biological phenomena. A set of assembled transcripts also allows for initial gene expression studies.
Prior to the development of transcriptome assembly programs, transcriptome data were analyzed primarily by mapping reads onto a reference genome. Though genome alignment is a robust way of characterizing transcript sequences, it cannot readily account for structural alterations of mRNA transcripts, such as alternative splicing. Since a genome contains all the introns and exons that may be present in a transcript, spliced variants that do not align continuously along the genome may be discounted as actual protein isoforms. Even if a reference genome is available, de novo assembly should still be performed, as it can recover transcripts that are transcribed from segments of the genome that are missing from the genome assembly.
Unlike genome sequence coverage levels, which can vary randomly as a result of repeat content in non-coding intron regions of DNA, transcriptome sequence coverage levels can directly indicate gene expression levels. These repeated sequences also create ambiguities in the formation of contigs in genome assembly, whereas ambiguities in transcriptome assembly contigs usually correspond to spliced isoforms or to minor variation among members of a gene family.
Genome assemblers cannot be used directly for transcriptome assembly, for several reasons. First, sequencing depth is usually uniform across a genome, but the depth of individual transcripts varies with their expression levels. Second, both strands are always sequenced in genome sequencing, but RNA-seq can be strand-specific. Third, transcriptome assembly is more challenging because transcript variants from the same gene can share exons, which makes them difficult to resolve unambiguously.
Once RNA is extracted and purified from cells, it is sent to a high-throughput sequencing facility, where it is first reverse transcribed to create a cDNA library. This cDNA can then be fragmented into various lengths, depending on the platform used for sequencing.
Each of the following platforms utilizes a different type of technology to sequence millions of short reads: 454 Sequencing, Illumina, and SOLiD. See also List of RNA-Seq bioinformatics tools.
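The difficulty of resolving transcript variants that share exons can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the method of any particular assembler: it assumes a simplified de Bruijn-style k-mer graph (a representation used by many short-read assemblers, though not described in this article), hypothetical exon sequences, and k = 5, and it ignores sequencing errors, reverse complements, and coverage. Two isoforms that differ only by skipping exon B produce one node where the graph forks and one where it re-joins, so an assembler walking the graph has two possible extensions at the end of exon A and must use evidence beyond the graph itself to decide which paths correspond to real transcripts.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_graph(seqs, k):
    """Simplified de Bruijn-style graph (an assumption for illustration):
    nodes are (k-1)-mers, and each k-mer adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for seq in seqs:
        for km in kmers(seq, k):
            graph[km[:-1]].add(km[1:])
    return graph

def branch_nodes(graph):
    """Nodes with more than one distinct successor (ambiguous forward extension)."""
    return {node for node, succs in graph.items() if len(succs) > 1}

def merge_nodes(graph):
    """Nodes with more than one distinct predecessor (paths re-join here)."""
    preds = defaultdict(set)
    for node, succs in graph.items():
        for succ in succs:
            preds[succ].add(node)
    return {node for node, ps in preds.items() if len(ps) > 1}

# Hypothetical toy exons; real exons are far longer.
exon_a = "ATGGCTACG"
exon_b = "TTCAGGA"
exon_c = "CCGTAAT"

isoform_1 = exon_a + exon_b + exon_c  # includes exon B
isoform_2 = exon_a + exon_c           # exon B skipped

graph = build_graph([isoform_1, isoform_2], k=5)

print(branch_nodes(graph))    # {'TACG'}: end of exon A, two possible next exons
print(sorted(graph["TACG"]))  # ['ACGC', 'ACGT']
print(merge_nodes(graph))     # {'CCGT'}: the two paths re-join at the start of exon C
```

Built from `isoform_1` alone, the same graph has no branch nodes and reads as a single linear path. With more alternative exons, the number of paths through such forks grows combinatorially, which is one reason reconstructing isoforms is harder than assembling a single genomic sequence.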
