The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
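
The accuracy figures quoted above are fractions of correctly aligned residues relative to a structure-based reference alignment. The following is a minimal sketch of one way such a fraction could be computed; it is not the benchmark code used in the study. The function names and the two-row string representation with `-` as the gap character are assumptions made for illustration.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column.

    row_a and row_b are the two gapped rows of a pairwise alignment
    (hypothetical representation: equal-length strings, '-' for gaps).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(*test)
    return len(test_pairs & ref_pairs) / len(ref_pairs)


# Toy data: the reference pairs all five residues; the test alignment
# misplaces one of them.
reference = ("ACDEF", "ACDEF")
test = ("ACD-EF", "ACDE-F")
accuracy = fraction_correct(test, reference)
```

Normalizing by the number of reference pairs means that a test alignment is not penalized for aligning extra residues, only for missing the reference ones; other normalizations are possible.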

Alignment-free sequence analysis

Multiple sequence alignment

Structural alignment

Smith–Waterman algorithm

Sequence (biology)

Benchmark (surveying)

MODELLER

10.1110/ps.03379804

Citations (191)

Parallel genetic algorithm for performance-driven sequence alignment

Genetic and Evolutionary Computation Conference (2001)

L. A. Anbarasu, Vijayaraghava Seshadri Sundararajan, P. Narayanasamy

The simultaneous alignment of three or more nucleotide or amino acid sequences is among the most important tools for analyzing biological sequences. Multiple alignments are used to find characteristic motifs and conserved regions in protein families; to help demonstrate homology between new sequences and existing families; and to improve the prediction of secondary and tertiary structure of new sequences. They are also an essential prerequisite for phylogenetic reconstruction. The high complexity of the multiple sequence alignment problem has led to the development of different algorithms. These algorithms fall into two categories: greedy ones that rely on pairwise alignment, and those that attempt to align all the sequences simultaneously.

Multiple sequence alignment

Alignment-free sequence analysis

Structural alignment

Sequence (biology)

Smith–Waterman algorithm

Homology

Citations (2)

Efficient Algorithms for Triple-wise Alignment and Its Applications

洪哲倫

Sequence alignment is a scientific method that contributes to DNA homology studies, phylogeny determination, and the identification of conserved motifs. Over the past few decades, pair-wise alignment has become a methodological standard used in many MSA methods. However, an increasing number of studies have indicated that three-way alignment, the alignment of three sequences, can provide additional information or a more accurate result than pair-wise alignment. This dissertation focuses on the investigation and application of three-way alignment algorithms. For the investigation, we propose two efficient methods: a dynamic programming-based algorithm with a variable gap penalty strategy for aligning protein sequences, and a linear algorithm adopting a probabilistic filtration model for aligning DNA sequences. For the application, we apply three-way alignment to methods and applications that originally adopted pair-wise alignment approaches. We present a new progressive multiple sequence alignment strategy that combines pair-wise and three-way alignments to compare multiple sequences accurately. Similarly, we extend the three-way alignment algorithm to align three profiles, offering a different perspective on profile-profile alignment. In addition, we develop a parallel algorithm for three-profile alignment to reduce the computational cost. Further, we combine the three-profile alignment approach with a voting algorithm to select the functional sites of a target protein by comparing protein superfamilies. Theoretical analysis and extensive experimental tests of the proposed methods are presented, and the experimental results are encouraging.

Multiple sequence alignment

Alignment-free sequence analysis

Structural alignment

Smith–Waterman algorithm

Sequence (biology)

10.6843/nthu.2010.00685

Citations (0)

Homology-extended sequence alignment

Nucleic Acids Research (2005)

V. A. Simossis

We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
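
The pre-alignment profile described above can be illustrated with a short sketch. It assumes the database hits have already been aligned to the query so that every row has the same length as the query; the function name `build_profile`, the `(row, score)` hit format, and the use of plain residue frequencies are illustrative assumptions, not details of the published method.

```python
from collections import Counter


def build_profile(query, hits, threshold, gap="-"):
    """Build per-position residue frequencies from a query and its homologues.

    query: ungapped query sequence.
    hits: list of (aligned_row, score) tuples; each aligned_row is assumed
          to be query-anchored, i.e. the same length as the query.
    threshold: only hits scoring above this value are incorporated.
    """
    rows = [query]
    for row, score in hits:
        if len(row) != len(query):
            raise ValueError("hit rows must match the query length")
        if score > threshold:
            rows.append(row)

    profile = []
    for col in range(len(query)):
        # Count residues in this column, ignoring gaps in the homologues.
        counts = Counter(row[col] for row in rows if row[col] != gap)
        total = sum(counts.values())  # at least 1: the query has no gaps
        profile.append({res: n / total for res, n in counts.items()})
    return profile


# Toy data: the second hit scores below the threshold and is left out.
hits = [("ACE", 50), ("GCD", 10), ("A-D", 40)]
profile = build_profile("ACD", hits, threshold=30)
```

A profile built this way would then stand in for the single sequence during progressive alignment, so that each column carries information from the collected homologues as well as the query.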

Multiple sequence alignment

Alignment-free sequence analysis

Structural alignment

Threading (protein sequence)

Smith–Waterman algorithm

Sequence (biology)

Homology

Position (finance)

Sequence homology

10.1093/nar/gki233

Citations (115)

Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues

Nucleic Acids Research (2008)

Yue Lu, Sing-Hoi Sze

While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which includes the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account the alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy consistently improves on existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy, and the average improvements in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.
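
One simple way to picture "horizontal information" is a residue-pair score that also looks at the flanking positions. The sketch below averages a substitution score over a small window around the pair being scored. This windowed average, the function name `neighbor_score`, and the toy match/mismatch scoring are assumptions for illustration only; the abstract does not specify how the authors combine neighboring residues.

```python
def neighbor_score(a, b, i, j, subst, window=1):
    """Score aligning a[i] with b[j], averaged over neighboring pairs.

    subst is any function returning a substitution score for two residues.
    Positions that fall off either sequence end are skipped.
    """
    total = 0.0
    n = 0
    for k in range(-window, window + 1):
        if 0 <= i + k < len(a) and 0 <= j + k < len(b):
            total += subst(a[i + k], b[j + k])
            n += 1
    return total / n  # n >= 1 whenever i and j are valid indices


def match(x, y):
    """Toy substitution score: +1 for identity, -1 otherwise."""
    return 1 if x == y else -1


# The central pair C/C matches, but one of its two neighbors does not.
center = neighbor_score("ACD", "ACE", 1, 1, match)
# At the sequence start only the pair itself and its right neighbor count.
start = neighbor_score("ACD", "ACE", 0, 0, match)
```

A score of this kind could replace the plain substitution score inside a dynamic programming recurrence, so that a pair is favored when its neighbors also align well.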

Alignment-free sequence analysis

Multiple sequence alignment

Structural alignment

Benchmark (surveying)

Sequence (biology)

Smith–Waterman algorithm

10.1093/nar/gkn945

Citations (19)

A comprehensive comparison of multiple sequence alignment programs

Nucleic Acids Research (1999)

Julie Thompson, Frédéric Plewniak, Olivier Poch

In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10–20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.

Alignment-free sequence analysis

Multiple sequence alignment

Structural alignment

Benchmark (surveying)

Sequence (biology)

Smith–Waterman algorithm

10.1093/nar/27.13.2682

Citations (763)

ALIGN_MTX—An optimal pairwise textual sequence alignment program, adapted for using in sequence-structure alignment

Computational Biology and Chemistry (2009)

Boris Vishnepolsky, Malak Pirtskhalava

Structural alignment

Multiple sequence alignment

Alignment-free sequence analysis

Threading (protein sequence)

Sequence (biology)

Smith–Waterman algorithm

Sequence logo

Distance matrix

10.1016/j.compbiolchem.2009.04.003

Citations (10)

Using Structure to Explore the Sequence Alignment Space of Remote Homologs

PLoS Computational Biology (2011)

Andrew Kuziemko, Barry Honig, Donald Petrey

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

Structural alignment

Alignment-free sequence analysis

Multiple sequence alignment

Sequence (biology)

Loop modeling

Sequence logo

Smith–Waterman algorithm

Limiting

Homology

10.1371/journal.pcbi.1002175

Citations (11)

INTERALIGN: interactive alignment editor for distantly related protein sequences

Bioinformatics (2005)

Olivier Pible, G. Imbert, Jean‐Luc Pellequer

Summary: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the ‘twilight zone’, i.e. sharing <30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence. Availability: Windows and Linux packages, as well as source files, are available under the CeCILL free software licensing agreement at the following address: http://www-dsv.cea.fr/content/cea/d_dep/d_diep/d_sbtn/download.htm Contact: olivier.pible@cea.fr

Alignment-free sequence analysis

Multiple sequence alignment

Structural alignment

Sequence (biology)

Tree (set theory)

10.1093/bioinformatics/bti474

Citations (10)

Multiple Sequence Alignment

Kluwer international series in engineering and computer science (2003)

Karl-Heinz Zimmermann

Multiple sequence alignment