Prediction of Protein-protein Interactions Based on Feature Selection and Data Balancing

2013 
Computational approaches are able to analyze protein-protein interactions (PPIs) from a different angle of view by complementing the experimental ones. And they are very efficient in determining whether two proteins can interact with each other. In this paper, KNNs (K-nearest neighbors) is applied to predict the PPIs by coding each protein with the physical and chemical properties of its residues, predicted secondary structures and amino acid compositions. mRMR (minimum-redundancy maximum-relevance) feature selection is adopted to select a compact feature set, features of which are considered to be important for the determination of PPI-nesses. Because the size of the negative dataset (containing non-interactive protein pairs) is much larger than that of the positive dataset (containing interactive protein pairs), the negative dataset is divided into 5 portions and each portion is combined with the positive dataset for one prediction. Thus 5 predictions are performed and the final results are obtained through voting. As a result, the prediction achieves an overall accuracy of 0.8369 with sensitivity of 0.7356. The predictor, developed by this research for the prediction of the fruit fly PPI-nesses, is available for public use at http://chemdata.shu.edu.cn/ppip.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    2
    Citations
    NaN
    KQI
    []