A Computational Domain-Based Feature Grouping Approach for Prediction of Stability of SCF Ligases
2015
The stability of SCF ubiquitin ligases is worth investigating because these complexes are involved in many cellular processes, including cell cycle regulation, DNA repair mechanisms, and gene expression. Interactions between two or more proteins are in turn controlled by their domains, the compact functional units of proteins. In this study, we therefore analyze the role of Pfam domain interactions in predicting the stability of protein-protein interactions (PPIs) that are known or predicted to occur among subunit components of the SCF ligase complex. Employing the most relevant and discriminating features is essential for achieving successful prediction at low computational cost. Although different feature selection methods have recently been developed for this purpose, feature grouping is a better option, especially when dealing with high-dimensional sparse feature vectors, because it yields a better interpretation of the data. In this paper, a correlation-based feature grouping (CFG) method is proposed to group and combine the features. To demonstrate the strength of CFG, two filter methods, χ² and correlation, are also employed for feature selection, and prediction is performed using different classifiers, including a support vector machine (SVM) and k-nearest neighbor (k-NN). The experimental results on a dataset of SCF ligases indicate that feature grouping achieves significant increases of 10% for SVM and 13% for k-NN, and is more efficient than feature selection in identifying a set of relevant features.
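To make the idea of grouping and combining correlated features concrete, the following is a minimal sketch in plain Python. It is not the CFG algorithm of the paper, whose details are not given in this abstract. It assumes a greedy rule (a feature joins the first group whose first member it correlates with at or above a threshold), a hypothetical threshold of 0.8, and averaging as the way to combine a group. The function names and the toy binary domain-presence matrix are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def group_features(rows, threshold=0.8):
    """Greedily assign each feature column to the first group whose
    first member it correlates with at or above `threshold` (assumed rule)."""
    columns = list(zip(*rows))
    groups = []
    for j, col in enumerate(columns):
        for g in groups:
            if pearson(columns[g[0]], col) >= threshold:
                g.append(j)
                break
        else:
            # No sufficiently correlated group found: start a new one.
            groups.append([j])
    return groups


def combine(rows, groups):
    """Replace each group by the mean of its member features (assumed combiner)."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in rows]


# Hypothetical data: 4 samples x 4 binary domain-presence features.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
groups = group_features(X)
X_grouped = combine(X, groups)
print(groups)     # indices of the original features in each group
print(X_grouped)  # reduced feature matrix, one column per group
```

In this toy case the two identical columns collapse into one combined feature, so the reduced matrix has three columns instead of four. The reduced matrix could then be passed to any classifier, such as an SVM or k-NN, in place of the original sparse vectors.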