Prediction of human disease-specific phosphorylation sites with combined feature selection approach and support vector machine

2014 
Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. It has long been recognized that protein phosphorylation is closely related to disease, and many studies have therefore sought to predict phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, no existing method focuses on predicting disease-associated phosphorylation sites. Here, for the first time, we propose an approach based on a support vector machine (SVM) that is specifically designed to identify disease-specific phosphorylation sites. Human disease-associated phosphorylation data are extracted from the PhosphoSitePlus database, and local sequences are derived for training. To take full advantage of sequence information, we develop a combined feature selection method-based SVM (CFS-SVM) that incorporates an mRMR (minimum redundancy maximum relevance) filtering step and a forward feature selection step. With CFS-SVM, we successfully predict disease-specific phosphorylation sites. Performance evaluation shows that CFS-SVM performs significantly better than widely used classifiers, including Bayesian decision theory and k-nearest neighbour. Even at an extremely high specificity of 99%, CFS-SVM still achieves high sensitivity. In addition, analysis of the corresponding kinases and selected features sheds light on the potential mechanisms underlying disease-phosphorylation relationships and can guide further experimental validation.
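The two-stage feature selection described above (an mRMR filter followed by forward selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of discrete mutual information with a "relevance minus mean redundancy" criterion, and the simple one-pass forward step are all assumptions made for illustration. The `evaluate` callback is a hypothetical placeholder for whatever performance measure is used, for example the cross-validated performance of an SVM trained on the given feature subset.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels, k):
    """Greedily pick up to k feature indices by relevance minus mean redundancy.

    `columns` is a list of feature columns (one discrete sequence per feature).
    Ties are resolved in favour of the lowest feature index.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def forward_select(candidates, evaluate):
    """Walk through ranked candidates, keeping a feature only if it improves the score.

    `evaluate(subset)` is a hypothetical callback returning a number where
    higher is better (e.g. cross-validated classifier performance).
    """
    chosen, best_score = [], float("-inf")
    for j in candidates:
        trial_score = evaluate(chosen + [j])
        if trial_score > best_score:
            chosen.append(j)
            best_score = trial_score
    return chosen
```

In use, one would first call `mrmr_rank` on the encoded sequence features to obtain a short ranked candidate list, then pass that list to `forward_select` together with a classifier-based `evaluate` function. The filter step keeps the number of expensive classifier evaluations small, which is the usual motivation for combining a filter with a wrapper.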