Exons, Introns, and DNA Thermodynamics

2005 
One of the most striking aspects of the human genome is the presence of long stretches of DNA with no apparent (or known) significance [1]. This is what biologists refer to as junk DNA, and it comprises the majority of our DNA. In the human genome (and that of other higher eukaryotes) not only are the genes very sparse, but most of them are interrupted by noncoding sequences, the introns, which do not carry information for protein synthesis [1]. During transcription, introns are therefore removed from the messenger RNA (mRNA), which is assembled only from the expressed parts of the gene, the exons. In the human genome, introns are on average 10 times longer than exons and thus constitute the majority of the gene. Prokaryotes (such as bacteria) instead have a very compact genome without introns [1]. The discovery of introns in 1977 triggered a debate about their significance and origin, which led to the formulation of the “introns-early” [2–4] and the “introns-late” theories [5–7]. According to the introns-early viewpoint, the introns appeared at the origin of life and the exons were small ancient genes. The bacteria then lost the introns due to selective pressure to keep their genome short. The introns-late theory instead claims that introns must have appeared much later, i.e., during early eukaryotic evolution. A consensus between these opposing views has been reached in recent years. The analysis of an increasing number of genes showed that most of the introns have a “recent” origin, although a few are still believed to be very old [8]. The mechanism by which introns were incorporated into the genome is, however, still poorly understood (for a recent discussion, see, e.g., Ref. [9]). In this Letter, we present the results of a study of the physical properties of human DNA sequences which points to a possible pathway leading to intron insertion in genes.
By means of a statistical mechanics approach, we analyze the thermodynamic stability (“melting”) of DNA sequences obtained by assembling the exons together. Such a sequence is known as complementary DNA (cDNA) and can be obtained in the laboratory by reverse transcription of mRNA. As illustrated in Fig. 1, cDNA is characterized by exon-exon boundaries and the boundaries between the coding sequence (CDS) and the untranslated region (UTR). If introns were inserted “recently” into the genome, then the cDNA roughly resembles an ancient gene, apart from the mutations that have occurred since the insertion of the first introns (see below). We find that the exon-exon boundaries in cDNA sequences are strongly correlated with their melting domains. DNA melting is the process by which the double-stranded molecule in solution dissociates into two separate strands as the temperature increases [10]. Fragments longer than 1000 bp (base pairs) dissociate through a multistep process in which different parts of the chain melt at different temperatures. These “melting domains” are typically a few hundred nucleotides long. The thermodynamics of the DNA melting process has been investigated both experimentally [10] and by means of numerical calculations based on the statistical mechanics of the dissociation process [11,12]. The latter approach allows one to calculate ! i, the probability that the ith