    AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication
    Citations: 63 | References: 55 | Related papers: 10
    Abstract:
    Significance: One fundamental analysis needed to interpret genome assemblies is genome alignment. Yet accurately aligning regulatory and transposon regions outside of genes remains challenging. We introduce Anchored Wavefront alignment (AnchorWave), which implements a genome duplication-informed longest-path algorithm to identify collinear regions and performs base pair–resolved, end-to-end alignment of collinear blocks using an efficient two-piece affine gap cost strategy. AnchorWave improves alignment in several scenarios: genomes with high similarity, large genomes with high transposable element activity, genomes with many inversions, and alignments between species with deeper evolutionary divergence and different whole-genome duplication histories. Potential use cases include genome comparison for evolutionary analysis of nongenic sequences and population genetics of taxa with large, repeat-rich genomes.
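The two-piece affine gap cost mentioned above can be illustrated with a small function. This is a minimal sketch that assumes the common formulation in which a gap of length L costs min(o1 + e1·L, o2 + e2·L). The function name is hypothetical. The parameter values in the usage lines are for illustration only and are not AnchorWave's defaults.

```python
def two_piece_affine_gap_cost(length: int, open1: int, extend1: int,
                              open2: int, extend2: int) -> int:
    """Cost of a gap of `length` bases under a two-piece affine model (hypothetical helper).

    Each piece is an ordinary affine cost, open + extend * length, and the gap is
    charged whichever piece is cheaper. Pairing a small-open/large-extend piece
    with a large-open/small-extend piece makes long gaps relatively cheap.
    """
    if length <= 0:
        return 0
    return min(open1 + extend1 * length, open2 + extend2 * length)


# Illustrative parameters (an assumption, not AnchorWave defaults).
for gap in (1, 10, 20, 100):
    print(gap, two_piece_affine_gap_cost(gap, open1=4, extend1=2, open2=24, extend2=1))
# 1 6
# 10 24
# 20 44
# 100 124
```

With these values, the first piece is cheaper for short gaps and the second is cheaper for long ones. The two pieces cross over at L = 20. As a result, long indels such as transposon presence/absence polymorphisms are penalized less harshly than they would be under a single affine cost.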
    Keywords:
    Indel
    Structural Variation
    Bacterial genome size
    Comparative Genomics
    INDEL Mutation
    Nucleotide insertions and deletions (indels) are responsible for gaps in sequence alignments. Indels are a major source of evolutionary change at the molecular level. We examined the patterns of insertions and deletions in 19 mammalian genomes and found that deletion events are more common than insertions. The numbers of both insertions and deletions decrease rapidly as gap length increases, and single-nucleotide indels are the most frequent of all indel events. The frequencies of both insertions and deletions are well described by a power law.
    Keywords: Insertion, deletion, gap, indel, mammalian genome
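One simple way to check a power-law description of indel length frequencies is to fit a least-squares line on log-log axes. The sketch below assumes that approach and uses a hypothetical function name. The abstract does not say how the fit was performed, and maximum-likelihood estimators are generally preferred for heavy-tailed data.

```python
import math


def fit_power_law(lengths, counts):
    """Fit counts ~ C * length**(-alpha) by least squares on log-log axes (hypothetical helper).

    Points with a zero length or zero count are skipped because log(0) is undefined.
    Returns (C, alpha).
    """
    xs, ys = [], []
    for length, count in zip(lengths, counts):
        if length > 0 and count > 0:
            xs.append(math.log(length))
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("need at least two points with positive length and count")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all lengths are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic counts that follow a power law exactly (not data from the study).
gap_lengths = list(range(1, 21))
gap_counts = [1000 * L ** -2.0 for L in gap_lengths]
C, alpha = fit_power_law(gap_lengths, gap_counts)
print(round(C, 6), round(alpha, 6))  # 1000.0 2.0
```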
    Indel
    INDEL Mutation
    Sequence (biology)
    Citations (46)
    Abstract: Insertion and deletion (INDEL) mutations, the most common type of structural variation in the human genome, have been implicated in numerous human traits and diseases, including rare genetic disorders and cancer. Next-generation sequencing (NGS) technologies have drastically reduced the cost of sequencing whole genomes, greatly contributing to genome-wide detection of structural variants. However, because INDEL sizes vary widely and low-complexity and repeat regions are present, their detection remains a challenge. Here we present a hybrid approach, HyINDEL, which integrates clustering, split-mapping and assembly-based approaches to detect INDELs of all sizes (from small to large) and to identify insertion sequences. The method first identifies clusters of discordant and soft-clipped reads. These clusters are validated by depth of coverage and by alignment of the soft-clipped reads to identify candidate INDELs, while the assembly-based approach is used to identify the insertion sequence. HyINDEL is evaluated on both simulated and real datasets and compared with state-of-the-art tools. Using soft-clip alignments yields a significant improvement in recall and F-score, as well as in breakpoint support. It is freely available at https://github.com/alok123t/HyINDEL .
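The clustering step can be pictured with a one-dimensional sketch. Breakpoint positions implied by discordant or soft-clipped reads are sorted and merged whenever neighboring positions lie within a distance threshold. Clusters with too little read support are then dropped. This is a hypothetical single-linkage illustration: `cluster_breakpoints`, `max_gap`, and `min_support` are assumptions and do not reflect HyINDEL's actual procedure or parameters.

```python
def cluster_breakpoints(positions, max_gap=50, min_support=3):
    """Single-linkage clustering of candidate breakpoint positions (hypothetical helper).

    positions: genomic coordinates suggested by individual discordant or soft-clipped reads.
    Consecutive sorted positions no more than `max_gap` apart join the same cluster.
    Returns (first_pos, last_pos, n_reads) for clusters with at least `min_support` reads.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # Start a new cluster when the jump from the previous position is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [(c[0], c[-1], len(c)) for c in clusters if len(c) >= min_support]


reads = [120, 100, 110, 505, 500, 2000]
print(cluster_breakpoints(reads, max_gap=50, min_support=2))
# [(100, 120, 3), (500, 505, 2)]
```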
    Indel
    INDEL Mutation
    Structural Variation
    Breakpoint
    Hybrid genome assembly
    Sequence (biology)
    Citations (0)
    Major unresolved questions in evolutionary genetics include determining the contributions of different mutational sources to the total pool of genetic variation in a species, and understanding how these different forms of genetic variation interact with natural selection. Recent work has shown that structural variants (insertions, deletions, inversions and transpositions) are a major source of genetic variation, often outnumbering single nucleotide variants in terms of total bases affected. Despite the near ubiquity of structural variants, major questions about their interaction with natural selection remain. For example, how does the allele frequency spectrum of structural variants differ from that of single nucleotide variants? How often do structural variants affect genes, and what are the consequences? To begin to address these questions, we have systematically identified and characterized a large set of submicroscopic insertion and deletion (indel) variants (1 kb to 200 kb in length) among ten individuals from a single natural population of the plant species Mimulus guttatus. After extensive computational filtering, we focused on a set of 4,142 high-confidence indels that showed an experimental validation rate of 73%. All but one of these indels were < 200 kb. Although the largest indels were generally at lower frequencies in the population, a surprising number of large indels were at intermediate frequencies. While indels overlapping genes were much rarer than expected by chance, nearly 600 genes were affected by an indel. NBS-LRR defense response genes were the most enriched among the affected gene families. Most gene-associated indels were rare and appeared to be under purifying selection, though we do find four high-frequency derived insertion alleles that show signatures of recent positive selection.
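The abstract compares allele frequency spectra between structural variants and single nucleotide variants. Such a spectrum can be tabulated from per-variant derived-allele counts. This is a minimal sketch that assumes polarized (ancestral/derived) alleles and a sample of n chromosomes. The function name is hypothetical, and the input counts are toy values.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded site frequency spectrum (hypothetical helper).

    derived_counts: number of derived-allele copies per variant in a sample of n chromosomes.
    Returns a list whose entry k-1 counts variants carrying k derived copies, k = 1..n-1.
    Variants with 0 or n derived copies are not polymorphic in the sample and are skipped.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally[k] for k in range(1, n)]


# Toy counts for a sample of 10 chromosomes (not data from the study).
print(unfolded_sfs([1, 1, 2, 5, 9, 0, 10], n=10))
# [2, 1, 0, 0, 1, 0, 0, 0, 1]
```

Computing one spectrum for indels and one for SNVs from the same sample lets the two be compared bin by bin. For example, the fraction of singletons (the first entry divided by the total) can be set side by side.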
    Indel
    Structural Variation
    Balancing selection
    INDEL Mutation
    Citations (0)
    Abstract: SARS-CoV-2, which causes the current pandemic of respiratory illness, is evolving continuously and generating new variants. Nevertheless, most sequence analyses thus far have focused on nucleotide substitutions, even though insertions and deletions (indels) are equally important in the evolution of SARS-CoV-2. In this study, we analyzed 1,099,664 high-quality SARS-CoV-2 genome sequences to reconstruct the evolutionary and epidemiological histories of indels. Our analysis revealed 289 circulating indel types (237 deletion and 52 insertion types, each represented by more than ten genomic sequences). Of these, eighteen were recurrent indel types, each represented by more than 500 genome sequences. Although indels were identified across the entire genome, most were found in the nsp6, S, ORF8, and N genes, and among these, ORF8 indel types had the highest frequency of frameshifts. Geographical and temporal analyses of these variants revealed a few shifts in the dominant indel types. Each shift was accompanied by geographic expansion to different countries and continents, which resulted in the fixation of several indel types in the field, including the current variants of concern. Evolutionary and structural analyses revealed that indels involving the S N-terminal domain were linked to three of the four variants of concern. These indels resulted in significantly altered S proteins that might contribute to the selective advantage of the corresponding variants. In sum, our study highlights the important role of insertions and deletions in the evolution and spread of SARS-CoV-2.
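The frameshift frequencies reported for ORF8 rest on a simple rule: an indel inside a coding sequence shifts the reading frame when its net length change is not a multiple of three. The sketch below assumes VCF-style REF/ALT strings and uses hypothetical function names. It does not check whether the indel actually lies within a coding region.

```python
def indel_length_change(ref: str, alt: str) -> int:
    """Net length change of a VCF-style REF/ALT pair; positive for insertions (hypothetical helper)."""
    return len(alt) - len(ref)


def is_frameshift(ref: str, alt: str) -> bool:
    """True when the net length change is not a multiple of three (so also non-zero).

    Only meaningful for indels lying entirely within a coding sequence;
    this function does not verify that.
    """
    return indel_length_change(ref, alt) % 3 != 0


examples = [("ATGCA", "A"), ("ATGC", "A"), ("A", "AT"), ("A", "AGGTCAG")]
for ref, alt in examples:
    print(ref, alt, indel_length_change(ref, alt), is_frameshift(ref, alt))
# ATGCA A -4 True
# ATGC A -3 False
# A AT 1 True
# A AGGTCAG 6 False
```

Python's `%` always returns a non-negative result for a positive divisor. This means deletions (negative length changes) are classified with the same test as insertions.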
    Indel
    INDEL Mutation
    Citations (10)
    Mutation and selection are both thought to significantly affect the nucleotide composition of bacterial genomes. Earlier studies compared closely related strains to obtain mutation patterns, on the hypothesis that these strains had diverged so recently that selection would not have had enough time to act. In this study, we used a SOLiD autosequencer, which is based on a dual-base encoding scheme, to sequence the genome of Staphylococcus aureus at a mapping coverage of over 5,000×. By directly counting the variation in these ultradeep sequencing reads, we found that A → G was the predominant single-base substitution and that 1-bp deletions were the predominant small indel. These patterns are completely different from those obtained by comparing closely related S. aureus strains, where C → T accounted for a larger proportion of mutations and deletions occurred at almost the same frequency as insertions. These findings suggest that the genomic differences between closely related bacterial strains have already undergone selection and are therefore not representative of spontaneous mutation.
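Counting substitution types directly from variant calls, as the abstract describes for the ultradeep S. aureus reads, amounts to tallying directed ref>alt changes. This is a minimal sketch with a hypothetical function name. It keeps all 12 directed types separate and does not collapse strand-complementary pairs (for example, A>G with T>C), which real analyses often do.

```python
from collections import Counter

BASES = ("A", "C", "G", "T")


def substitution_spectrum(pairs):
    """Count directed single-base substitutions such as 'A>G' (hypothetical helper).

    pairs: iterable of (ref_base, alt_base). Pairs that are identical or that
    involve anything other than a single A/C/G/T base are skipped.
    Returns a dict covering all 12 directed types, including zero counts.
    """
    counts = Counter()
    for ref, alt in pairs:
        ref, alt = ref.upper(), alt.upper()
        if ref in BASES and alt in BASES and ref != alt:
            counts[f"{ref}>{alt}"] += 1
    return {f"{r}>{a}": counts[f"{r}>{a}"] for r in BASES for a in BASES if r != a}


calls = [("A", "G"), ("a", "g"), ("C", "T"), ("A", "A"), ("N", "A")]
spectrum = substitution_spectrum(calls)
print(max(spectrum, key=spectrum.get), spectrum["A>G"], sum(spectrum.values()))
# A>G 2 3
```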
    Indel
    Bacterial genome size
    Mutation Accumulation
    INDEL Mutation
    Citations (3)
    Genomic structural alterations that vary within species, known as large copy number variants, represent an unanticipated and abundant source of genetic diversity that associates with variation in gene expression and susceptibility to disease. Even short insertions and deletions (indels) can exert important effects on genomes by locally increasing the mutation rate, with multiple mechanisms proposed to account for this pattern. To better understand how indels promote genome evolution, we demonstrate that the single nucleotide mutation rate is elevated in the vicinity of indels, with a resolution of tens of base pairs, for the two closely related nematode species Caenorhabditis remanei and C. sp. 23. In addition to indels being clustered with single nucleotide polymorphisms and fixed differences, we also show that transversion mutations are enriched in sequences that flank indels and that many indels associate with sequence repeats. These observations are compatible with a model that reconciles previously proposed mechanisms of indel-associated mutagenesis, implicating repeat sequences as a common driver of indel errors, which then recruit error-prone polymerases during DNA repair, resulting in a locally elevated single nucleotide mutation rate. The striking influence of indel variants on the molecular evolution of flanking sequences strengthens the emerging general view that mutations can induce further mutations.
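Transversion enrichment is defined relative to the split between transitions and transversions. A transition changes a purine to a purine or a pyrimidine to a pyrimidine; a transversion changes a purine to a pyrimidine or the reverse. This is a minimal sketch with hypothetical function names. One way to look for enrichment would be to compare the transition/transversion ratio in indel-flanking windows against a genome-wide background, but the abstract does not specify the authors' method.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A transition keeps the base class (purine<->purine or pyrimidine<->pyrimidine).
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(pairs) -> float:
    """Transition/transversion ratio; infinity if no transversions are present."""
    ts = tv = 0
    for ref, alt in pairs:
        if classify_substitution(ref, alt) == "transition":
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


print(classify_substitution("A", "G"), classify_substitution("G", "T"))  # transition transversion
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```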
    Indel
    INDEL Mutation
    Transversion
    Citations (36)
    A relatively rare type of mutation causing human genetic disease is the indel, a complex lesion that appears to represent a combination of micro-deletion and micro-insertion. In the absence of meta-analytical studies of indels, the mutational mechanisms underlying indel formation remain unclear. Data from the Human Gene Mutation Database (HGMD) were therefore used to compare and contrast 211 different indels underlying genetic disease in an attempt to deduce the processes responsible for their genesis. Each indel was treated as if it were the result of a two-step insertion/deletion process and was assessed in the context of 10 bp of DNA sequence flanking the lesion on either side. Several indel hotspots were noted, and a GTAAGT motif was found to be significantly over-represented in the vicinity of the indels studied. Previously postulated mechanisms underlying micro-deletions and micro-insertions were initially explored in terms of local DNA sequence regularity, as measured by its complexity. The change in complexity resulting from a mutation was found to indicate the type of repeat sequence involved in mediating the event, thereby providing clues to the underlying mutational mechanism. Complexity analysis was then used to examine the possible intermediates through which each indel could have arisen and to propose likely mechanisms and pathways for indel generation on an individual basis. Manual analysis confirmed that the majority of indels (>90%) are explicable in terms of a two-step process involving established mutational mechanisms. Indels equivalent to double base-pair substitutions (22% of the total) were found to be mechanistically indistinguishable from the remainder and may therefore be regarded as a special type of indel.
    The observed correspondence between changes in local DNA sequence complexity and the involvement of specific mutational mechanisms in the insertion/deletion process, together with the ability of the generated models to account for both the number and identity of the bases deleted and/or inserted, makes this approach invaluable not only for the analysis of indel formation but also for the study of other types of complex lesion. Hum Mutat 21:28–44, 2002. © 2002 Wiley-Liss, Inc.
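The abstract does not name its complexity measure. The sketch below therefore uses Lempel-Ziv (1976) complexity as a stand-in, counting the phrases in an LZ76 parsing of a sequence. Low values flag repetitive sequence such as homopolymers, and high values flag sequence with little internal repetition. The function name is hypothetical, and this is not presented as the measure the authors used.

```python
def lz76_complexity(seq: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of `seq` (hypothetical helper).

    Each phrase is the shortest substring starting at the current position that
    does not already occur in the text before its final character; copies may
    overlap the phrase being built.
    """
    n = len(seq)
    i = phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


for s in ("AAAAAAAA", "ACGT"):
    print(s, lz76_complexity(s))
# AAAAAAAA 2
# ACGT 4
```

Comparing the value for a window before and after a simulated insertion or deletion would give the kind of "change in complexity" the abstract describes. Any such comparison here is only an illustration.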
    Indel
    INDEL Mutation
    Sequence (biology)
    Citations (123)
    With the development of high-throughput sequencing (HTS), thousands of human genomes have now been sequenced. When different studies analyze the same genome, they usually agree on the number of single-nucleotide polymorphisms but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulty in indel detection: insufficient coverage, read length, or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels lie in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them.
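Homopolymer runs, which the abstract identifies as a major obstacle to indel detection, can be located with a back-referencing regular expression. This is a minimal sketch. The function names and the default minimum run length of 6 are assumptions chosen for illustration.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 6):
    """Locate runs of a single base at least `min_len` long (hypothetical helper).

    Returns (start, end, base) with 0-based, end-exclusive coordinates.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def in_homopolymer(pos: int, runs) -> bool:
    """True if 0-based position `pos` falls inside any reported run."""
    return any(start <= pos < end for start, end, _ in runs)


read = "AC" + "A" * 6 + "G" + "T" * 7 + "C"
runs = homopolymer_runs(read)
print(runs)                      # [(2, 8, 'A'), (9, 16, 'T')]
print(in_homopolymer(12, runs))  # True
```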
    Indel
    INDEL Mutation
    1000 Genomes Project
    Citations (58)