    Keywords:
    Alignment-free sequence analysis
    Multiple sequence alignment
    Sequence (biology)
    Structural alignment
    The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
    Alignment-free sequence analysis
    Multiple sequence alignment
    Structural alignment
    Smith–Waterman algorithm
    Sequence (biology)
    Benchmark (surveying)
    MODELLER
    Citations (191)
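The accuracy figures quoted in the abstract above are fractions of correctly aligned residues relative to a structure-based reference alignment. A minimal sketch of such a measure follows. It assumes both alignments are given as two gapped rows using `-` for gaps; the function names are hypothetical, and the benchmark's exact definition may differ.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue-index pairs that share a column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


# Toy data, not taken from the benchmark set.
reference = ("ACDEFG", "AC-EFG")
test = ("ACDEFG", "ACE-FG")
accuracy = fraction_correct(test, reference)
```

In the toy example, four of the five residue pairs in the reference are reproduced by the test alignment.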
    The simultaneous alignment of three or more nucleotide or amino acid sequences is among the most important tools for analyzing biological sequences. Multiple alignments are used to find characteristic motifs and conserved regions in protein families; to help demonstrate homology between new sequences and existing families; to improve the prediction of secondary and tertiary structure of new sequences; and as an essential prerequisite to phylogenetic reconstruction. The high complexity of the multiple sequence alignment problem has led to the development of different algorithms. These algorithms fall into two categories: greedy ones that rely on pairwise alignment, and those that attempt to align all the sequences simultaneously.
    Multiple sequence alignment
    Alignment-free sequence analysis
    Structural alignment
    Sequence (biology)
    Smith–Waterman algorithm
    Homology
    Citations (2)
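The greedy category mentioned in the abstract above builds on pairwise alignment. As a rough illustration, the sketch below computes a Needleman-Wunsch global alignment score and uses it to pick the most similar pair of sequences, a typical first step of a progressive scheme. The scoring values (match +1, mismatch -1, linear gap -2) and the function names are assumptions for illustration, not parameters from any cited method.

```python
from itertools import combinations


def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align s[i-1] with t[j-1]
                            prev[j] + gap,       # gap in t
                            curr[j - 1] + gap))  # gap in s
        prev = curr
    return prev[-1]


def closest_pair(seqs):
    """Indices of the highest-scoring pair, to be aligned first."""
    return max(combinations(range(len(seqs)), 2),
               key=lambda p: global_alignment_score(seqs[p[0]], seqs[p[1]]))


first = closest_pair(["ACGT", "ACGA", "TTTT"])
```

A full progressive aligner would go on to merge the remaining sequences in order of similarity; this sketch stops at choosing the first pair.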
    Sequence alignment is a scientific method that contributes to DNA homology studies, phylogeny determination, and the identification of conserved motifs. In the past few decades, pair-wise alignment has become a methodological standard used in many multiple sequence alignment (MSA) methods. However, an increasing number of studies indicate that three-way alignment, the alignment of three sequences, can provide additional information or a more accurate result than pair-wise alignment. This dissertation focuses on the investigation and application of three-way alignment algorithms. For the investigation, we propose two efficient methods: a dynamic programming-based algorithm with a variable gap penalty strategy for aligning protein sequences, and a linear algorithm adopting a probabilistic filtration model for aligning DNA sequences. For the application, we apply three-way alignment to methods and applications that originally adopted pair-wise alignment. We present a new progressive multiple sequence alignment strategy that combines pair-wise and three-way alignments to compare multiple sequences accurately. Similarly, we extend the three-way alignment algorithm to align three profiles, providing a different perspective on profile-profile alignment. In addition, we develop a parallel algorithm for three-profile alignment to reduce the computational cost. Further, we combine the three-profile alignment approach with a voting algorithm to select the functional sites of a target protein by comparing protein superfamilies. Theoretical analysis and extensive experimental tests of the proposed methods are presented in this dissertation. The experimental results are encouraging for the proposed methods for sequence analysis.
    Multiple sequence alignment
    Alignment-free sequence analysis
    Structural alignment
    Smith–Waterman algorithm
    Sequence (biology)
    Citations (0)
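To make the idea of three-way alignment in the abstract above concrete, here is a minimal sketch of a three-dimensional dynamic program that maximizes a sum-of-pairs score over three sequences. It is a textbook-style illustration and not the dissertation's algorithm: it uses a fixed gap score rather than a variable gap penalty strategy, and the scoring values and function name are assumptions.

```python
from itertools import product


def three_way_score(a, b, c, match=1, mismatch=-1, gap=-1):
    """Optimal sum-of-pairs score for aligning three sequences (3-D DP)."""
    def pair(x, y):
        if x == "-" and y == "-":
            return 0
        if x == "-" or y == "-":
            return gap
        return match if x == y else mismatch

    def column(x, y, z):
        return pair(x, y) + pair(x, z) + pair(y, z)

    # Seven possible moves: each sequence either consumes a residue or a gap,
    # excluding the all-gap column.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    la, lb, lc = len(a), len(b), len(c)
    neg_inf = float("-inf")
    dp = [[[neg_inf] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    dp[0][0][0] = 0
    # Lexicographic order guarantees every predecessor cell is filled first.
    for i, j, k in product(range(la + 1), range(lb + 1), range(lc + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            x = a[pi] if di else "-"
            y = b[pj] if dj else "-"
            z = c[pk] if dk else "-"
            cand = dp[pi][pj][pk] + column(x, y, z)
            if cand > dp[i][j][k]:
                dp[i][j][k] = cand
    return dp[la][lb][lc]
```

The table has one cell per triple of prefix lengths, so time and memory grow with the product of the three sequence lengths. That cubic cost is why exact three-way alignment is much more expensive than pair-wise alignment.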
    We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
    Multiple sequence alignment
    Alignment-free sequence analysis
    Structural alignment
    Threading (protein sequence)
    Smith–Waterman algorithm
    Sequence (biology)
    Homology
    Position (finance)
    Sequence homology
    Citations (115)
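The pre-alignment profile described in the abstract above can be pictured as column-wise residue frequencies built from a sequence and its database hits that score above a threshold. The sketch below illustrates that idea only. It assumes each hit is already aligned to the query and padded with `-` to the query's length; the function name, the input layout, and the use of raw frequencies without weighting or pseudocounts are simplifying assumptions.

```python
from collections import Counter


def build_profile(query, hits, threshold):
    """Per-column residue frequencies from the query and its accepted hits.

    hits is a list of (score, aligned_sequence) tuples; only hits scoring
    above the threshold contribute to the profile.
    """
    rows = [query] + [seq for score, seq in hits if score > threshold]
    if any(len(row) != len(query) for row in rows):
        raise ValueError("aligned hits must match the query length")
    profile = []
    for col in zip(*rows):
        counts = Counter(r for r in col if r != "-")  # gaps are ignored
        total = sum(counts.values())
        profile.append({r: n / total for r, n in counts.items()})
    return profile


# Toy data: the hit scoring 10 falls below the threshold and is excluded.
profile = build_profile("ACD", [(50, "ACE"), (10, "GGG"), (40, "A-D")], 30)
```

In the toy example the third column becomes two thirds D and one third E, so a later alignment step can score that position against more than the single query residue.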
    While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which includes the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that accuracy can be improved further by taking into account the alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy consistently improves on existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy. The average improvement in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.
    Alignment-free sequence analysis
    Multiple sequence alignment
    Structural alignment
    Benchmark (surveying)
    Sequence (biology)
    Smith–Waterman algorithm
    Citations (19)
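One simple way to picture the horizontal information mentioned in the abstract above is to score a residue pair by averaging substitution scores over a small window of neighboring positions. The sketch below is an illustration of that general idea only. The abstract does not specify how the authors incorporate neighbors, so the windowing scheme, the match and mismatch values, and the function name are all assumptions.

```python
def windowed_score(s, t, i, j, half_width=1, match=1.0, mismatch=-1.0):
    """Average substitution score over positions i+d, j+d with |d| <= half_width.

    Offsets that fall outside either sequence are skipped, so the average is
    taken over the positions that actually exist.
    """
    if not (0 <= i < len(s) and 0 <= j < len(t)):
        raise IndexError("i and j must index residues of s and t")
    total = 0.0
    count = 0
    for d in range(-half_width, half_width + 1):
        a, b = i + d, j + d
        if 0 <= a < len(s) and 0 <= b < len(t):
            total += match if s[a] == t[b] else mismatch
            count += 1
    return total / count
```

With `half_width=0` the function reduces to an ordinary per-residue score. A larger window lets matching neighbors raise the score of a mismatched pair and mismatched neighbors lower the score of a matched one.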
    In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10–20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.
    Alignment-free sequence analysis
    Multiple sequence alignment
    Structural alignment
    Benchmark (surveying)
    Sequence (biology)
    Smith–Waterman algorithm
    Citations (763)
    Structural alignment
    Multiple sequence alignment
    Alignment-free sequence analysis
    Threading (protein sequence)
    Sequence (biology)
    Smith–Waterman algorithm
    Sequence logo
    Distance matrix
    Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
    Structural alignment
    Alignment-free sequence analysis
    Multiple sequence alignment
    Sequence (biology)
    Loop modeling
    Sequence logo
    Smith–Waterman algorithm
    Limiting
    Homology
    Summary: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the ‘twilight zone’, i.e. sharing <30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence. Availability: Windows and Linux packages, as well as source files, are available under the CeCILL free software licensing agreement at the following address: http://www-dsv.cea.fr/content/cea/d_dep/d_diep/d_sbtn/download.htm Contact: olivier.pible@cea.fr
    Alignment-free sequence analysis
    Multiple sequence alignment
    Structural alignment
    Sequence (biology)
    Tree (set theory)
    Multiple sequence alignment
    Alignment-free sequence analysis
    Structural alignment
    Sequence (biology)
    Smith–Waterman algorithm