Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties

2007 
Abstract Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70–90% for binders and 40–80% for non-binders. These methods have been developed for peptides of fixed lengths and for a limited number of alleles, have been trained on a small number of non-binders, and in some cases are based directly on sequence. These restrictions limit prediction coverage and accuracy, particularly for non-binders. It is therefore desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and that are trained on diverse sets of non-binders. This work explores support vector machines (SVM) as such a method, developing prediction systems for 18 MHC class I and 12 class II alleles using 4208–3252 binders and 234,333–168,793 non-binders, evaluated on an independent set of 545–476 binders and 110,564–84,430 non-binders. Binder accuracies are 86–99% for 25 alleles and 70–80% for 5 alleles; non-binder accuracies are 96–99% for all 30 alleles. Compared with other published results, binder accuracies are comparable and non-binder accuracies are substantially improved. Our method correctly predicts 73.3% of the 15 epitopes newly published in the last 4 months of 2005. Of the 251 recently published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of the HIV-1 genome shows that, compared to other methods, a comparable percentage (75–100%) of its known epitopes is correctly predicted, while a lower percentage (0.01–5% for 24 alleles and 5–8% for 6 alleles) of its constituent peptides is predicted as binders. Our software can be accessed at http://bidd.cz3.nus.edu.sg/mhc/ .
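The abstract's central idea, predicting binders of flexible lengths from sequence-derived properties, depends on mapping peptides of any length to a feature vector of fixed size that an SVM can consume. The Python sketch below illustrates one way this could work and is not the authors' actual descriptor set: it assumes a simple amino acid composition plus a three-class hydrophobicity composition, and the names `encode_peptide`, `binder_and_nonbinder_accuracy`, and `HYDROPHOBICITY_GROUPS` are hypothetical. It also shows how binder and non-binder accuracies of the kind reported above could be computed separately, assuming they correspond to per-class accuracy (sensitivity and specificity).

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-class hydrophobicity grouping (polar, neutral, hydrophobic),
# used here for illustration only; the paper's descriptors may differ.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")
CLASS_OF: Dict[str, int] = {
    aa: index
    for index, group in enumerate(HYDROPHOBICITY_GROUPS)
    for aa in group
}


def encode_peptide(peptide: str) -> List[float]:
    """Map a peptide of any length to a fixed-length vector of 23 fractions."""
    peptide = peptide.upper()
    if not peptide or any(aa not in CLASS_OF for aa in peptide):
        raise ValueError("peptide must be a non-empty string of standard residues")
    n = len(peptide)
    # 20 values: fraction of each standard amino acid in the peptide.
    composition = [peptide.count(aa) / n for aa in AMINO_ACIDS]
    # 3 values: fraction of residues in each hydrophobicity class.
    class_fraction = [0.0, 0.0, 0.0]
    for aa in peptide:
        class_fraction[CLASS_OF[aa]] += 1.0 / n
    return composition + class_fraction


def binder_and_nonbinder_accuracy(
    labels: Sequence[int], predictions: Sequence[int]
) -> Tuple[float, float]:
    """Return (binder accuracy, non-binder accuracy); 1 = binder, 0 = non-binder."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both binders and non-binders are required")
    return tp / (tp + fn), tn / (tn + fp)


# A 9-mer and a 13-mer yield vectors of the same length.
short_vec = encode_peptide("GILGFVFTL")
long_vec = encode_peptide("PKYVKQNTLKLAT")
```

Because both vectors have the same dimensionality, peptides of different lengths can be placed in a single training set for one classifier, which is what allows one model per allele rather than one model per allele and length. Reporting the two accuracies separately matters here because non-binders vastly outnumber binders, so a single overall accuracy would be dominated by the non-binder class.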