Searching for remotely homologous sequences in protein databases with hybrid PSI-BLAST

2006 
Sequence alignment is one of the fundamental techniques used in molecular biology. It is widely used in biological applications such as protein classification, gene finding, homology modeling, structure and function prediction, phylogenetic analysis, and database annotation. In high-sensitivity sequence homology database searches, progressive refinement of the sequence model through iterative searches is an effective method and is currently employed in many popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. We have demonstrated that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. We also evaluated the accuracy of two proposed approaches to edge-effect correction in short sequence alignment statistics, which turns out to be one of the crucial issues in developing a hybrid-alignment-based version of PSI-BLAST. In addition, we have exploited other benefits of the hybrid alignment. We show that incorporating information about suboptimal alignments, otherwise ignored in PSI-BLAST, already improves the sensitivity of PSI-BLAST. In one experiment, we found a set of sequences on which our tool disagrees with the classification given by SCOP. Careful examination points to a possible misclassification in SCOP. Cross-referencing with two other methods of