    Fast variation-aware read alignment with deBGA-VARA
    Abstract:
    Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping incorporating existing variants can improve the performance of subsequent analysis. However, storing and indexing various types of variation require costly RAM space. Aligning reads to a graph model-based index including the whole set of variants is ultimately an NP-hard problem in theory. This method is supposed to map sequencing reads efficiently to a graphical index with a reference genome and known variation to increase alignment quality and variant calling accuracy. Herein, we propose a variation-aware read alignment algorithm (VARA), which generates the alignment between read and multiple genomic sequences simultaneously utilizing the schema of the Landau-Vishkin algorithm. VARA dynamically extracts regional variants to construct a pseudo tree-based structure on-the-fly for seed extension without loading the whole genome variation into memory space. We developed the novel high-throughput sequencing read aligner deBGA-VARA by integrating VARA into deBGA. The deBGA-VARA is benchmarked both on simulated reads and the NA12878 sequencing dataset. The experimental results demonstrate that read alignment incorporating genetic variation knowledge can achieve high sensitivity and accuracy. Moreover, due to its efficiency, VARA provides a promising solution for further improvement of variant calling while maintaining small memory footprints. The deBGA-VARA is available at: https://github.com/hitbc/deBGA-VARA.
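    The abstract says VARA builds on the schema of the Landau-Vishkin algorithm. As a rough illustration only, the following sketch shows the classical pairwise idea (furthest-reaching points per diagonal, with at most k differences). It does not reproduce VARA's multi-sequence, pseudo tree-based extension, and the function name `lv_edit_distance` is a hypothetical choice for this example, not part of deBGA-VARA.

```python
def lv_edit_distance(a, b, k):
    """Return the edit distance between a and b if it is <= k, else None.

    Classical diagonal-based sketch: for each error count e, keep the
    furthest row reached on every diagonal d = j - i.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None

    def extend(i, d):
        # Slide along diagonal d while characters match (free moves).
        j = i + d
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    prev = {0: extend(0, 0)}
    if prev.get(m - n) == n:
        return 0
    for e in range(1, k + 1):
        cur = {}
        for d in range(-e, e + 1):
            cands = []
            if d in prev:          # substitution: stay on the diagonal
                cands.append(prev[d] + 1)
            if d - 1 in prev:      # extra character in b
                cands.append(prev[d - 1])
            if d + 1 in prev:      # extra character in a
                cands.append(prev[d + 1] + 1)
            if not cands:
                continue
            i = min(max(cands), n, m - d)  # clamp to the matrix
            if i < 0 or i + d < 0:
                continue
            cur[d] = extend(i, d)
        if cur.get(m - n) == n:    # reached the end of both strings
            return e
        prev = cur
    return None
```

    Only O(k) diagonals are kept per error level, which is why this family of methods suits seed extension when few differences are expected.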
    Keywords:
    Schema (genetic algorithms)
    Variation (astronomy)
    Structural Variation
    First, according to variation content, the paper divides variation into operating-range variation, construction-condition variation, design variation, construction variation, and technical-standards variation, and puts forward a different control method for each type. According to the nature of the project variation, the paper divides variation into fatal, important, and general variation, and points out that variation can be controlled effectively by establishing different levels of authority for examining and approving project variations. Finally, according to urgency, the paper divides variation into urgent and non-urgent variation, and proposes controlling each through a different procedure.
    Variation (astronomy)
    Process Variation
    Citations (0)
    The names of 1024 American automobiles are analyzed according to register variation, diachronic variation and manufacturer variation. Variation through such linguistic means as structural complexity, metonymy and iconicity is shown to not only reflect social realities of automobile producers and consumers but to actively contribute to constructing these realities as well.
    Variation (astronomy)
    Iconicity
    Citations (7)
    Pan-genomes help to describe the genomic variation within a species, and can be split into the core genome containing genes common to all individuals, and a dispensable (or variable) genome consisting of partially shared DNA sequence elements. This chapter discusses the importance of pan-genomes, indicating that pan-genomics is being used to study genomic structural variations, including presence–absence variations and copy number variations in plant genomes. A list is provided of pan-genome studies in plants, including wheat, rice, maize and Brassica. It is suggested that pan-genomes can provide a complete genomic content of a species and add value to all aspects of genomic studies and molecular breeding strategies, with direct applications for crop improvement.
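    The core/dispensable split described above can be sketched with plain set operations. This is a toy illustration: the function name `split_pan_genome`, the accession names, and the gene identifiers are invented for the example and do not come from any of the cited studies.

```python
def split_pan_genome(presence):
    """Split genes into core (present in every individual) and dispensable.

    presence maps an individual's name to the set of genes found in it.
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)            # every gene seen anywhere
    core = set.intersection(*gene_sets)      # genes shared by all individuals
    dispensable = pan - core                 # genes missing from at least one
    return core, dispensable


# Hypothetical presence/absence data for three accessions.
example = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g2"},
}
core, dispensable = split_pan_genome(example)
```

    Real studies typically relax "all individuals" to a frequency threshold; that refinement is left out of this sketch.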
    Structural Variation
    The analysis of variation in plants has revealed that their genomes are characterised by high levels of structural variation, consisting of both smaller insertions/deletions, mostly due to recent insertions of transposable elements, and larger insertions/deletions similar to those termed Copy Number Variants (CNVs) in humans. These observations indicate that a single genome sequence might not reflect the entire genomic complement of a species, and prompted us to introduce the concept of the plant pan-genome.
    Structural Variation
    Variation (astronomy)
    Insertion sequence
    Complement
    Sequence (biology)
    Citations (1)
    Due to the development of sequencing technology and the great reduction in sequencing costs, an increasing number of plant genomes have been assembled, and numerous genomes have revealed large amounts of variations. However, a single reference genome does not allow the exploration of species diversity, and therefore the concept of pan-genome was developed. A pan-genome is a collection of all sequences available for a species, including a large number of consensus sequences, large structural variations, and small variations including single nucleotide polymorphisms and insertions/deletions. A simple linear pan-genome does not allow these structural variations to be intuitively characterized, so graph-based pan-genomes have been developed. These pan-genomes store sequence and structural variation information in the form of nodes and paths to store and display species variation information in a more intuitive manner. The key role of graph-based pan-genomes is to expand the coordinate system of the linear reference genome to accommodate more regions of genetic diversity. Here, we review the origin and development of graph-based pan-genomes, explore their application in plant research, and further highlight the application of graph-based pan-genomes for future plant breeding.
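    The "nodes and paths" idea can be pictured with a very small data structure. The sketch below is an assumption-laden toy: the node sequences, path names, and the helper `spell_path` are made up for illustration and do not follow any particular graph format or tool.

```python
# Nodes hold sequence fragments; a path is an ordered walk over node ids.
nodes = {
    1: "ACG",
    2: "T",      # short allele found in the reference
    3: "TTTT",   # alternative allele (an insertion relative to node 2)
    4: "GA",
}

paths = {
    "reference": [1, 2, 4],
    "accession_B": [1, 3, 4],
}


def spell_path(name):
    """Reconstruct the linear sequence an individual traces through the graph."""
    return "".join(nodes[node_id] for node_id in paths[name])


reference_seq = spell_path("reference")
accession_seq = spell_path("accession_B")
```

    Both paths share nodes 1 and 4, so the variation between the two individuals is confined to the single branch where they diverge.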
    Citations (23)
    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.
    Structural Variation
    Gene density
    Citations (135)
    Cross-talks and sketches are two important forms of language art in our country. They are popular with a wide range of audiences around the country because of their humorous language. Language variation is a very important means of creating a humorous effect. In cross-talks and sketches, various forms of language variation, including phonetic, lexical, grammatical, semantic, stylistic and pragmatic variation, are applied to achieve a humorous effect.
    Variation (astronomy)
    Citations (0)
    Studies on structural variation in plants have revealed the inadequacy of a single reference genome for an entire species and suggest that it is necessary to build a species-representative genome called a pan-genome to better capture the extent of both structural and nucleotide variation. Here, we present a pan-genome of cultivated soybean (Glycine max), termed PanSoy, constructed using the de novo genome assembly of 204 phylogenetically and geographically representative improved accessions selected from the larger GmHapMap collection. PanSoy uncovers 108 Mb (~11%) of novel nonreference sequences encompassing 3621 protein-coding genes (including 1659 novel genes) absent from the soybean 'Williams 82' reference genome. Nonetheless, the core genome represents an exceptionally large proportion of the genome, with >90.6% of genes being shared by >99% of the accessions. A majority of PAVs encompassing genes could be confirmed with long-read sequencing on a subset of accessions. The PanSoy is a major step towards capturing the extent of genetic variation in cultivated soybean and provides a resource for soybean genomics research and breeding.
    Structural Variation
    Genome size
    Glycine soja
    Comparative Genomics
    Citations (65)
    Significance: One fundamental analysis needed to interpret genome assemblies is genome alignment. Yet, accurately aligning regulatory and transposon regions outside of genes remains challenging. We introduce Anchored Wavefront alignment (AnchorWave), which implements a genome duplication informed longest path algorithm to identify collinear regions and performs base pair–resolved, end-to-end alignment for collinear blocks using an efficient two-piece affine gap cost strategy. AnchorWave improves the alignment under a number of scenarios: genomes with high similarity, large genomes with high transposable element activity, genomes with many inversions, and alignments between species with deeper evolutionary divergence and different whole-genome duplication histories. Potential use cases include genome comparison for evolutionary analysis of nongenic sequences and population genetics of taxa with large, repeat-rich genomes.
    Indel
    Structural Variation
    Bacterial genome size
    Comparative Genomics
    INDEL Mutation
    Citations (63)
    Millions of species are currently being sequenced and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication or polyploidy levels. Here we introduce AnchorWave, which performs whole-genome duplication informed collinear anchor identification between genomes and performs base-pair resolution global alignments for collinear blocks using the wavefront algorithm and a 2-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multi-kilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed almost zero power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor binding sites (TFBSs) at a rate 1.05- to 74.85-fold higher than other tools, with significantly fewer false-positive alignments. AnchorWave shows obvious improvement when applied to genomes with dispersed repeats, active transposable elements, high sequence diversity and whole-genome duplication variation.
    Significance statement: One fundamental analysis needed to interpret genome assemblies is genome alignment. Yet, accurately aligning regulatory and transposon regions outside of genes remains challenging. We introduce AnchorWave, which implements a genome duplication informed longest path algorithm to identify collinear regions and performs base-pair resolved, end-to-end alignment for collinear blocks using an efficient 2-piece affine gap cost strategy. AnchorWave improves alignment of partially synthetic and real genomes under a number of scenarios: genomes with high similarity, large genomes with high TE activity, genomes with many inversions, and alignments between species with deeper evolutionary divergence and different whole-genome duplication histories. Potential use cases for the method include genome comparison for evolutionary analysis of non-genic sequences and population genetics of taxa with complex genomes.
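    To make the "2-piece affine gap cost" concrete: a gap of a given length is scored under two affine penalties, and the cheaper one is used, so that long indels (such as TE insertions) are not penalized as steeply as under a single affine model. The sketch below shows only this scoring rule. The function name and the parameter values are illustrative assumptions, not AnchorWave's actual defaults or implementation.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap under a two-piece affine model.

    Piece 1 (cheap to open, expensive to extend) wins for short gaps;
    piece 2 (expensive to open, cheap to extend) wins for long gaps.
    All four parameter values here are hypothetical.
    """
    if length <= 0:
        return 0
    return min(open1 + ext1 * length, open2 + ext2 * length)


short_gap = two_piece_gap_cost(1)     # piece 1 applies
crossover = two_piece_gap_cost(20)    # both pieces give the same cost
long_gap = two_piece_gap_cost(100)    # piece 2 applies
```

    With these example values the two pieces cross at a gap length of 20; beyond that, each extra gap base costs 1 instead of 2.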
    Structural Variation
    Indel
    Bacterial genome size
    Citations (4)