IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices

1999 
Motivation: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST).

Results: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search two PSSM databases: one for protein folds, and one for protein domains involved in signal transduction. IMPALA’s sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees that the optimal local alignment is reported by using the rigorous Smith-Waterman algorithm. IMPALA searching a large database of PSSMs is also considerably faster than BLAST or PSI-BLAST searching the complete non-redundant protein database.

Availability: The IMPALA source code, the wolf1187 database, and the aravind105 database are freely available from the NCBI ftp site ncbi.nlm.nih.gov. The databases are in ftp://ncbi.nlm.nih.gov/pub/impala, and the source code is in ftp://ncbi.nlm.nih.gov/toolbox/ncbitools. IMPALA executables for several UNIX variants are in ftp://ncbi.nlm.nih.gov/blast/executables.
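IMPALA compares one query sequence with a PSSM using Smith-Waterman local alignment, so the score for aligning query residue i to profile column j comes from column j of the matrix rather than from a fixed substitution matrix. The following is a minimal sketch of that idea, not IMPALA's implementation: the PSSM representation (a list of per-column dicts), the `unknown_score` fallback, and the illustrative gap costs (`gap_open=11`, `gap_extend=1`, with a gap of length k costing `gap_open + k * gap_extend`) are all assumptions. It uses Gotoh's affine-gap recurrences and keeps only two rows of the dynamic-programming matrix.

```python
from typing import Dict, List, Tuple

# Hypothetical PSSM representation (an assumption, not IMPALA's format):
# one dict per profile column, mapping a residue letter to its integer score.
PSSM = List[Dict[str, int]]

NEG_INF = float("-inf")


def sw_sequence_vs_pssm(query: str, pssm: PSSM,
                        gap_open: int = 11, gap_extend: int = 1,
                        unknown_score: int = -4) -> Tuple[int, int, int]:
    """Best local alignment score of `query` against `pssm`.

    Smith-Waterman with affine gaps (Gotoh): a gap of length k costs
    gap_open + k * gap_extend. Returns (score, query_end, pssm_end) with
    1-based end coordinates of the best cell, or (0, 0, 0) if no positive
    score exists. Residues missing from a column score `unknown_score`.
    """
    m, n = len(query), len(pssm)
    # H: best local alignment ending at (i, j)
    # E: best alignment ending with a gap in the query (consumes PSSM columns)
    # F: best alignment ending with a gap in the PSSM (consumes query residues)
    h_prev = [0] * (n + 1)
    f_prev = [NEG_INF] * (n + 1)
    best = (0, 0, 0)
    for i in range(1, m + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF
        residue = query[i - 1]
        for j in range(1, n + 1):
            # Open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] - gap_open - gap_extend, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open - gap_extend,
                           f_prev[j] - gap_extend)
            # Position-specific score: column j of the PSSM scores residue i.
            match = h_prev[j - 1] + pssm[j - 1].get(residue, unknown_score)
            # The 0 term lets a local alignment start anywhere.
            h_cur[j] = max(0, match, e, f_cur[j])
            if h_cur[j] > best[0]:
                best = (h_cur[j], i, j)
        h_prev, f_prev = h_cur, f_cur
    return best
```

For example, with the toy profile `[{"A": 5}, {"C": 5}, {"D": 5}]`, the query `"XACDX"` aligns `ACD` to the three columns for a score of 15, ending at query position 4 and profile column 3. A real search would repeat this for every PSSM in the database and then judge each score's statistical significance, which this sketch does not attempt.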