Myristoylation is the post-translational modification (PTM) in which a myristoyl group is attached to the NH2-terminus of a nascent protein. The myristate moiety plays an important role in the biological functions of proteins, such as regulation of membrane binding (HIV-1 Gag) and enzyme activity (AMPK). Several predictors based on protein sequence alone have been proposed. However, they produce many false positive and false negative predictions, are not general-purpose (i.e., they are taxon-specific), or require the threshold values of their decision rules to be selected with caution. Here, we present novel, taxon-free predictors based on protein primary structure. To identify myristoylated proteins accurately, we employ a widely used machine learning algorithm, the support vector machine (SVM). We develop a series of SVM predictors in which various scales representing physicochemical and biological properties of amino acids (from the AAindex database) are used to transform protein sequences numerically. The top ten predictors achieve accuracies of >98% (average 98.34%) and area under the ROC curve (AUC) values of >0.98. Compared with previous studies, prediction accuracy is improved by about 3 to 4%.
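The numerical transformation of sequences with an amino acid scale can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101) as the scale and a fixed N-terminal window of 10 residues. The scale, the window length, the zero padding, and the function name `encode_n_terminus` are all illustrative assumptions; the abstract does not state which scales or window the predictors use.

```python
# Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101).
# Chosen only as an example scale; the study's actual scales are not stated here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_n_terminus(sequence, scale=KYTE_DOOLITTLE, window=10):
    """Map the first `window` residues to scale values.

    Hypothetical helper: sequences shorter than `window` are padded
    with 0.0 so that every feature vector has the same length.
    Raises KeyError for residues missing from the scale.
    """
    residues = sequence.upper()[:window]
    values = [scale[aa] for aa in residues]
    values += [0.0] * (window - len(values))
    return values


if __name__ == "__main__":
    # Example N-terminal fragment, used purely for illustration.
    print(encode_n_terminus("MGARASVLSG"))
```

Each vector produced this way has a fixed length, so it could serve as one input row for an SVM trained on labeled myristoylated and non-myristoylated sequences.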
Although over 300 million protein sequences are registered in a reference sequence database, only 0.2% have experimentally determined functions. This suggests that many valuable proteins, potentially catalyzing novel enzymatic reactions, remain undiscovered among the vast number of proteins of unknown function. In this study, we developed a method to predict whether two proteins catalyze the same enzymatic reaction by analyzing their sequence and structural similarities, using structural models predicted by AlphaFold2. We performed pocket detection and domain decomposition for each structural model. The similarity between the two proteins in a pair was assessed using features such as full-length sequence similarity, domain structural similarity, and pocket similarity. We developed several models using conventional machine learning algorithms and found that the LightGBM-based model outperformed the others. Our method also surpassed existing approaches, including those based solely on full-length sequence similarity and state-of-the-art deep learning models. Feature importance analysis revealed that domain sequence identity, calculated through structural alignment, had the greatest influence on the prediction. These findings demonstrate that integrating sequence and structural information improves the accuracy of protein function prediction.
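As a rough sketch of how pairwise features of this kind might be assembled, the example below computes sequence identity from an existing alignment and packs it, together with other scores, into a feature vector for one protein pair. The identity definition (identical residues over columns where both sequences have a residue), the `PairFeatures` class, and its field names are assumptions made for illustration. The structural and pocket scores are assumed to be precomputed by external tools and are not derived here.

```python
from dataclasses import dataclass


def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over columns where both sequences
    have a residue (gap character '-').

    This is one common convention; other denominators are also in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


@dataclass
class PairFeatures:
    """Hypothetical feature record for one protein pair."""
    full_length_identity: float    # from a full-length sequence alignment
    domain_identity: float         # from a structure-based domain alignment
    domain_structure_score: float  # assumed precomputed structural similarity
    pocket_score: float            # assumed precomputed pocket similarity

    def as_vector(self) -> list:
        return [self.full_length_identity, self.domain_identity,
                self.domain_structure_score, self.pocket_score]


if __name__ == "__main__":
    features = PairFeatures(
        full_length_identity=sequence_identity("ACD-F", "ACE-F"),
        domain_identity=sequence_identity("AC-D", "ACGD"),
        domain_structure_score=0.8,  # placeholder value
        pocket_score=0.6,            # placeholder value
    )
    print(features.as_vector())
```

Vectors of this form, labeled by whether the two proteins catalyze the same reaction, are the kind of tabular input a gradient-boosted classifier such as LightGBM accepts.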