Application of PVM to protein homology search

2006 
Although many computer programmes are currently available for searching large databases for homologous proteins, none is considered satisfactory in both speed and sensitivity. It has long been known that a very sensitive programme could be written using the algorithm of Needleman and Wunsch [1]. This algorithm first calculates the maximum match score of two protein sequences on a two-dimensional array, MAT(m,n), where m and n are the lengths of the two sequences (the average length is 364 amino acids in the Swiss-Prot database [2]). The similarity, or homology, between the two sequences is then assessed statistically by comparing the score obtained from the real sequences with the mean score obtained from a large number (>200) of pairs of random sequences produced by scrambling each of the original sequences. A homology search using this algorithm requires this statistical analysis to be carried out sequentially between the query sequence and every sequence in the database. Consequently, as the size of the database increases (the well-known TrEMBL database now contains over 2,500,000 protein sequences, about 962 Mbytes), homology search by this method becomes very time-consuming.
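The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' programme and does not use PVM. The scoring values (match = 1, mismatch = 0, gap = -1), the function names, the default of 250 random pairs and the use of a z-score to summarise the comparison are all assumptions made for illustration only.

```python
import random
import statistics


def max_match_score(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment score computed by dynamic programming.

    Conceptually fills an (m+1) x (n+1) array, but keeps only two rows.
    The scoring values are illustrative assumptions, not the paper's.
    """
    n = len(b)
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[n]


def scramble(seq, rng):
    """Return a random permutation of seq (same composition, new order)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def homology_statistics(query, target, n_random=250, seed=0):
    """Compare the real score with scores from scrambled sequence pairs.

    Returns (real_score, mean_random_score, z_score). The z-score is an
    assumed way of expressing the comparison.
    """
    rng = random.Random(seed)
    real = max_match_score(query, target)
    random_scores = [
        max_match_score(scramble(query, rng), scramble(target, rng))
        for _ in range(n_random)
    ]
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        z = 0.0 if real == mean else float("inf")
    else:
        z = (real - mean) / sd
    return real, mean, z
```

A database search would call `homology_statistics(query, entry)` once per database entry. Each call performs `n_random + 1` alignments of cost proportional to m × n, which is why the sequential search becomes slow as the database grows.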