Predicting DNA-Binding Residues of Proteins Using Random Forest and Evolutionary Information Combined with Conservation Information

2011 
Protein-DNA interactions play a pivotal role in a variety of biological processes in cells. In this research, a novel model is proposed for predicting DNA-binding residues from amino acid sequences, using a variety of sequence-derived features with the random forest (RF) algorithm. A novel feature, named position specific scoring matrix combined with physicochemical properties (PSSM-PP), is proposed to represent the conservation information of the physicochemical properties of residues. This novel feature, orthogonal binary vectors, and secondary structure information are then used to build the RF model for predicting DNA-binding residues in proteins. The resulting classifier achieves a Matthews correlation coefficient (MCC) of 0.6814 and an overall accuracy (ACC) of 90.23%, with 77.21% sensitivity (SE) and 91.49% specificity. Further analysis shows that the PSSM-PP feature contributes most to the improvement in prediction. Comparisons with previous works show that the RF model performs well in predicting DNA-binding residues in novel proteins.
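The four reported measures (MCC, ACC, sensitivity and specificity) all follow from the counts of a binary confusion matrix, where a binding residue is the positive class. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute SE, SP, ACC and MCC from confusion-matrix counts.

    tp: binding residues predicted as binding
    tn: non-binding residues predicted as non-binding
    fp: non-binding residues predicted as binding
    fn: binding residues predicted as non-binding
    """
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    return {"SE": sensitivity, "SP": specificity, "ACC": accuracy, "MCC": mcc}


# Hypothetical counts, chosen only to exercise the function.
example = binary_metrics(tp=8, tn=85, fp=5, fn=2)
```

Because non-binding residues typically outnumber binding residues, accuracy alone can look high for a classifier that rarely predicts binding; MCC uses all four counts and is therefore a more balanced summary.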