Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties

2007 
Abstract

Background: Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or motif match to a known allergen. These methods achieve good prediction accuracies, but are less effective for novel proteins with no similarity to any known allergen.

Methods: This work tests the feasibility of using a statistical learning method, support vector machines (SVM), to predict allergen proteins from sequence-derived properties rather than from sequence similarity. The prediction system is trained and tested using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families.

Results: Testing on an independent set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families shows that 93.0% of the allergens and 99.9% of the non-allergens are correctly predicted, which is comparable to the best results of other methods. Of the 18 novel allergen proteins non-homologous to any other proteins in the Swissprot database, 88.9% are correctly predicted. A further screening of 168,128 proteins in the Swissprot database finds that 2.9% are predicted as allergen proteins, which is consistent with the estimated numbers from motif-based methods.

Conclusions: Our study suggests that SVM is a potentially useful method for predicting allergen proteins and that it has some capability for predicting novel allergen proteins. Our software can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/APPEL .
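As a rough illustration of the two ideas in the abstract, the sketch below turns a protein sequence into a fixed-length numeric vector and computes the per-class accuracies of the kind reported in the Results. It is a minimal sketch, not the authors' pipeline: amino acid composition is assumed here as a simple stand-in for the sequence-derived structural and physicochemical descriptors, the function names `composition_features` and `sensitivity_specificity` are hypothetical, and the SVM training step itself is omitted.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return a 20-element vector: the fraction of each standard residue.

    Assumption: composition is a simplified stand-in descriptor; it is not
    claimed to be the feature set used in the paper.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for labels 1 = allergen, 0 = non-allergen.

    Sensitivity is the fraction of allergens predicted as allergens;
    specificity is the fraction of non-allergens predicted as non-allergens.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one example of each class")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy, made-up sequence and labels, used only to exercise the functions.
    vector = composition_features("ACCA")
    print(len(vector), vector[:2])
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full workflow, vectors like these would be passed to an SVM implementation as training input, with allergens and non-allergens as the two classes. For scale, the reported 88.9% on 18 novel allergens corresponds to 16 correct predictions (16/18 ≈ 0.889).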