    Human-specific insertions and deletions inferred from mammalian genome sequences
    Abstract:
    It has been suggested that insertions and deletions (indels) have contributed to the sequence divergence between the human and chimpanzee genomes more than do nucleotide changes (3% vs. 1.2%). However, although there have been studies of large indels between the two genomes, no systematic analysis of small indels (i.e., indels ≤ 100 bp) has been published. In this study, we first estimated that the false-positive rate of small indels inferred from human–chimpanzee pairwise sequence alignments is quite high, suggesting that the chimpanzee genome draft is not sufficiently accurate for our purpose. We have therefore inferred only human-specific indels using multiple sequence alignments of mammalian genomes. We identified >840,000 “small” indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to only ∼0.21% sequence change in the human lineage for the regions compared, whereas in pseudogenes indels contribute to a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to human unique traits by causing changes at the RNA and protein level.
    Keywords:
    Indel
    Pseudogene
    INDEL Mutation
    In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs in human genomes is second only to the number of single nucleotide polymorphisms (SNPs), and, in terms of base pairs of variation, INDELs cause similar levels of variation as SNPs. Many of these INDELs map to functionally important sites within human genes, and thus, are likely to influence human traits and diseases. Therefore, small INDEL variation will play a prominent role in personalized medicine.
    Indel
    INDEL Mutation
    1000 Genomes Project
    Human genetic variation
    Human genetics
    Citations (343)
    Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To be sure to differentiate the sequence changes occurring in the functional genes during evolution from those occurring in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates amongst the 12 possible nucleotide pairs are not homogeneous as they are affected by the type of immediately neighboring nucleotides and the overall local G+C content. Finally, our dataset is large enough that it has many indels, thus allowing for the first time statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both these events follow characteristic power-law behavior associated with the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed. PMID: 12954770 Funding information This work was supported by: NHGRI NIH HHS, United States Grant ID: P50HG02357-01
    Pseudogene
    Indel
    Citations (0)
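The substitution and indel tallies reported in the ribosomal protein pseudogene abstract above can be illustrated with a small sketch. It classifies single-nucleotide changes as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions, and computes the deletion-to-insertion ratio from the two counts quoted in the abstract (3740 and 1291). The function names are hypothetical and chosen for illustration only; this is not the authors' pipeline.

```python
# Minimal sketch: classify substitutions and compute a deletion/insertion ratio.
# Function names are illustrative assumptions, not taken from the cited study.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # A transition stays within one chemical class; a transversion crosses it.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(substitutions):
    """Count transitions and transversions in an iterable of (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in substitutions:
        counts[classify_substitution(ref, alt)] += 1
    return counts


def deletion_insertion_ratio(deletions: int, insertions: int) -> float:
    """Ratio of deletion events to insertion events."""
    if insertions == 0:
        raise ValueError("insertion count must be non-zero")
    return deletions / insertions


if __name__ == "__main__":
    # Counts quoted in the abstract: 3740 deletions versus 1291 insertions.
    print(round(deletion_insertion_ratio(3740, 1291), 2))
    print(count_ts_tv([("A", "G"), ("C", "T"), ("A", "C")]))
```

With the quoted counts the ratio comes out just under three, which matches the abstract's "about three times more common".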
    SARS-CoV-2, which causes the current pandemic of respiratory illness, is evolving continuously and generating new variants. Nevertheless, most sequence analyses thus far have focused on nucleotide substitutions, despite the fact that insertions and deletions (indels) are equally important in the evolution of SARS-CoV-2. In this study, we analyzed 1,099,664 high-quality sequences of SARS-CoV-2 genomes to reconstruct the evolutionary and epidemiological histories of indels. Our analysis revealed 289 circulating indel types (237 deletion and 52 insertion types, each represented by more than ten genomic sequences), among which eighteen were recurrent indel types, each represented by more than 500 genome sequences. Although indels were identified across the entire genome, most of them were identified in the nsp6, S, ORF8, and N genes, among which ORF8 indel types had the highest frequencies of frameshift. Geographical and temporal analyses of these variants revealed a few alterations of dominant indel types, each accompanied by geographic expansion to different countries and continents, which resulted in the fixation of several types of indels in the field, including the current variants of concern. Evolutionary and structural analyses revealed that indels involving S N-terminal domain regions were linked to three of the four variants of concern, resulting in a significantly altered S protein that might contribute to the selective advantage of the corresponding variant. In sum, our study highlights the important role of insertions and deletions in the evolution and spread of SARS-CoV-2.
    Indel
    INDEL Mutation
    Citations (10)
    Genomic structural alterations that vary within species, known as large copy number variants, represent an unanticipated and abundant source of genetic diversity that associates with variation in gene expression and susceptibility to disease. Even short insertions and deletions (indels) can exert important effects on genomes by locally increasing the mutation rate, with multiple mechanisms proposed to account for this pattern. To better understand how indels promote genome evolution, we demonstrate that the single nucleotide mutation rate is elevated in the vicinity of indels, with a resolution of tens of base pairs, for the two closely related nematode species Caenorhabditis remanei and C. sp. 23. In addition to indels being clustered with single nucleotide polymorphisms and fixed differences, we also show that transversion mutations are enriched in sequences that flank indels and that many indels associate with sequence repeats. These observations are compatible with a model that reconciles previously proposed mechanisms of indel-associated mutagenesis, implicating repeat sequences as a common driver of indel errors, which then recruit error-prone polymerases during DNA repair, resulting in a locally elevated single nucleotide mutation rate. The striking influence of indel variants on the molecular evolution of flanking sequences strengthens the emerging general view that mutations can induce further mutations.
    Indel
    INDEL Mutation
    Transversion
    Citations (36)
    A relatively rare type of mutation causing human genetic disease is the indel, a complex lesion that appears to represent a combination of micro-deletion and micro-insertion. In the absence of meta-analytical studies of indels, the mutational mechanisms underlying indel formation remain unclear. Data from the Human Gene Mutation Database (HGMD) were therefore used to compare and contrast 211 different indels underlying genetic disease in an attempt to deduce the processes responsible for their genesis. Each indel was treated as if it were the result of a two-step insertion/deletion process and was assessed in the context of 10 base-pairs DNA sequence flanking the lesion on either side. Several indel hotspots were noted and a GTAAGT motif was found to be significantly over-represented in the vicinity of the indels studied. Previously postulated mechanisms underlying micro-deletions and micro-insertions were initially explored in terms of local DNA sequence regularity as measured by its complexity. The change in complexity consequent to a mutation was found to be indicative of the type of repeat sequence involved in mediating the event, thereby providing clues as to the underlying mutational mechanism. Complexity analysis was then employed to examine the possible intermediates through which each indel could have occurred and to propose likely mechanisms and pathways for indel generation on an individual basis. Manual analysis served to confirm that the majority of indels (>90%) are explicable in terms of a two-step process involving established mutational mechanisms. Indels equivalent to double base-pair substitutions (22% of the total) were found to be mechanistically indistinguishable from the remainder and may therefore be regarded as a special type of indel. The observed correspondence between changes in local DNA sequence complexity and the involvement of specific mutational mechanisms in the insertion/deletion process, and the ability of generated models to account for both the number and identity of the bases deleted and/or inserted, makes this approach invaluable not only for the analysis of indel formation, but also for the study of other types of complex lesion. Hum Mutat 21:28–44, 2002. © 2002 Wiley-Liss, Inc.
    Indel
    INDEL Mutation
    Sequence (biology)
    Citations (123)
    Background: Chloroplast (cp) genome information would facilitate the development and utilization of Taxodium resources. However, the cp genome characteristics of Taxodium were poorly understood. Results: We determined the complete cp genome sequences of T. distichum, T. mucronatum, and T. ascendens. The cp genomes are 131,947 bp to 132,613 bp in length, encode 120 genes in the same order, and lack typical inverted repeat (IR) regions. The longest small IR, a 282 bp trnQ-containing IR, was involved in the formation of isomers. Comparative analysis of the three cp genomes showed that 91.57% of the indels resulted in the periodic variation of tandem repeat (TR) motifs and that 72.46% of single nucleotide polymorphisms (SNPs) were located close to TRs, suggesting a relationship between TRs and mutational dynamics. Eleven hypervariable regions were identified as candidates for DNA barcode development. Ycf1 was the only gene with an indel in its CDS, and the indel is composed of a long TR. When the analysis was extended to cupressophytes, ycf1 genes were found to have undergone a universal insertion of TRs accompanied by extreme length expansion. Meanwhile, ycf1 is also located at rearrangement endpoints of cupressophyte cp genomes. All these characteristics highlight the important role of repeats in the evolution of cp genomes. Conclusions: This study would advance our understanding of the complexity, dynamics, and evolution of cp genomes in cupressophytes.
    Indel
    Inverted repeat
    Taxodium
    Comparative Genomics
    INDEL Mutation
    Citations (2)
    Human-specific small insertions and deletions (HS indels, with lengths <100 bp) are reported to be ubiquitous in the human genome. However, whether these indels contribute to human-specific traits remains unclear. Here we employ a modified McDonald-Kreitman (MK) test and a combinatorial population genetics approach to infer, respectively, the occurrence of positive selection and recent selective sweep events associated with HS indels. We first extract 625,890 HS indels from the human-chimpanzee-macaque-mouse multiple alignments and classify them into nonpolymorphic (41%) and polymorphic (59%) indels with reference to the human indel polymorphism data. The modified MK test is then applied to 100-kb partially overlapped sliding windows across the human genome to scan for the signs of positive selection. After excluding the possibility of biased gene conversion and controlling for false discovery rate, we show that HS indels are potentially positively selected in about 10 Mb of the human genome. Furthermore, the indel-associated positively selected regions overlap with genes more often than expected. However, our result suggests that the potential targets of positive selection are located in noncoding regions. Meanwhile, we also demonstrate that the genomic regions surrounding HS indels are more frequently involved in recent selective sweep than the other regions. In addition, HS indels are associated with distinct recent selective sweep events in different human subpopulations. Our results suggest that HS indels may have been associated with human adaptive changes at both the species level and the subpopulation level.
    Indel
    1000 Genomes Project
    INDEL Mutation
    Negative selection
    Selective sweep
    Positive selection
    Background selection
    Citations (22)
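The McDonald-Kreitman (MK) test mentioned in the abstract above compares fixed differences with polymorphisms for a test class of variants against a putatively neutral reference class. The sketch below shows only the classic 2x2 form of the test (neutrality index, the derived alpha estimate, and a two-sided Fisher's exact test); it is not the modified version the authors applied to HS indels, whose details are not given in the abstract. All function names and the example counts are hypothetical.

```python
# Minimal sketch of a classic McDonald-Kreitman style 2x2 comparison.
# Assumption: this is the textbook form, NOT the modified test from the abstract.
from math import comb


def neutrality_index(d_test, d_ref, p_test, p_ref):
    """NI = (P_test / P_ref) / (D_test / D_ref); NI < 1 suggests excess fixation."""
    if 0 in (d_test, d_ref, p_ref):
        raise ValueError("d_test, d_ref and p_ref must be non-zero")
    return (p_test / p_ref) / (d_test / d_ref)


def alpha_estimate(d_test, d_ref, p_test, p_ref):
    """Estimated fraction of test-class fixations driven by selection: 1 - NI."""
    return 1.0 - neutrality_index(d_test, d_ref, p_test, p_ref)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts for one window: fixed and polymorphic variants
    # in the test class versus the neutral reference class.
    d_test, d_ref, p_test, p_ref = 7, 17, 2, 42
    print("NI    =", neutrality_index(d_test, d_ref, p_test, p_ref))
    print("alpha =", alpha_estimate(d_test, d_ref, p_test, p_ref))
    print("p     =", fisher_exact_two_sided(d_test, d_ref, p_test, p_ref))
```

In a sliding-window scan such as the one described, a comparison of this kind would be repeated per window, with the resulting p-values then corrected for multiple testing.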