Prediction of human phosphorylated proteins by extracting multi-perspective discriminative features from the evolutionary profile and physicochemical properties through LFDA

2020 
Abstract: Protein phosphorylation is an emerging post-translational modification that is critically involved in intracellular processes of the human body, controlling diverse functions ranging from cell growth to metabolism. Existing experimental methods for identifying phosphorylated proteins are costly and resource-intensive; thus, a fast and accurate computational method is needed to address the problem. Here we report HPhosPPred, a novel phosphorylated protein predictor that incorporates highly discriminative evolutionary and physicochemical information conserved in protein primary motifs, namely the pseudo-position-specific scoring matrix, the auto-covariance transformation of the position-specific scoring matrix, and the normalized Moreau-Broto autocorrelation. Further, to improve the generalization capability of HPhosPPred, we used local Fisher discriminant analysis (LFDA) as the main feature selection strategy for eliminating redundant and noisy patterns from the extracted features. Finally, the optimized features are fed to a support vector machine with a radial basis function kernel to predict phosphorylated proteins. The proposed method achieved promising performance, with an accuracy of 80.68%, a sensitivity of 84.63%, a specificity of 73.67%, and a Matthews correlation coefficient of 0.581 under rigorous leave-one-out cross-validation and 10-fold cross-validation tests. The empirical outcomes demonstrate that the developed model outperformed the existing state-of-the-art methods. Furthermore, our analysis suggests that the proposed tool can help detect previously unseen phosphorylated proteins in particular and support proteomics research in general. The source code and dataset are publicly available at https://github.com/saeed344/HPhosPPred.
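The four performance measures quoted in the abstract are standard functions of a binary confusion matrix. The sketch below shows how they are commonly computed; it is not taken from the HPhosPPred repository. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the counts behind the reported results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts.

    tp/fn: phosphorylated proteins predicted correctly/incorrectly.
    tn/fp: non-phosphorylated proteins predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("confusion counts must not all be zero")

    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Matthews correlation coefficient; by convention 0.0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fn=10, tn=30, fp=20).items():
        print(f"{name}: {value:.3f}")
```

Because the Matthews correlation coefficient uses all four cells of the confusion matrix, it is less sensitive to class imbalance than accuracy alone, which is presumably why it is reported alongside sensitivity and specificity.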