Vibrio parahaemolyticus causes serious seafood-borne gastroenteritis and death in humans. Raw seafood is often subjected to post-harvest processing and low-temperature storage. To date, very little information is available regarding the biological functions of cold shock proteins (CSPs) in the low-temperature survival of the bacterium. In this study, we determined the complete genome sequence of V. parahaemolyticus CHN25 (serotype: O5:KUT). The two main CSP-encoding genes (VpacspA and VpacspD) were deleted from the bacterial genome, and comparative transcriptomic analysis between the mutant and wild-type strains was performed to dissect the possible molecular mechanisms that underlie low-temperature adaptation by V. parahaemolyticus. The 5,443,401-bp V. parahaemolyticus CHN25 genome (45.2% G + C) consisted of two circular chromosomes and three plasmids with 4,724 predicted protein-encoding genes. One dual-gene and two single-gene deletion mutants were generated for VpacspA and VpacspD by homologous recombination. The growth of the ΔVpacspA mutant was strongly inhibited at 10 °C, whereas the VpacspD gene deletion strongly stimulated bacterial growth at this low temperature compared with the wild-type strain. The complementary phenotypes were observed in the reverse mutants (ΔVpacspA-com and ΔVpacspD-com). The transcriptome data revealed that the expression of 12.4% of the genes expressed in V. parahaemolyticus CHN25 was significantly altered in the ΔVpacspA mutant when it was grown at 10 °C. These included genes involved in amino acid degradation, secretion systems, sulphur metabolism and glycerophospholipid metabolism, along with ATP-binding cassette transporters. In the ΔVpacspD mutant, by contrast, growth at the low temperature elicited significant expression changes in 10.0% of the genes, including those involved in the phosphotransferase system and in the metabolism of nitrogen and amino acids.
The major metabolic pathways that were altered in the dual-gene deletion mutant (ΔVpacspAD) differed radically from those altered in the single-gene mutants. Comparison of the transcriptome profiles further revealed numerous differentially expressed genes that were shared among the three mutants, as well as regulators that were specifically, coordinately or antagonistically modulated by VpaCspA and VpaCspD. Our data also revealed several possible molecular coping strategies for low-temperature adaptation by the bacterium. This study is the first to describe the complete genome sequence of V. parahaemolyticus (serotype: O5:KUT). The gene deletions, complementary insertions, and comparative transcriptomics demonstrate that VpaCspA is a primary CSP in the bacterium, while VpaCspD functions as a growth inhibitor at 10 °C. These results improve our understanding of the genetic basis for low-temperature survival by the most common seafood-borne pathogen worldwide.
The identification of druggable proteins has always been at the core of drug development. Traditional structure-based identification methods are time-consuming and costly. As a result, more and more researchers have shifted their attention to sequence-based methods for identifying druggable proteins. We propose a sequence-based druggable protein identification model called DrugFinder. The model extracts features from the embedding output of the pre-trained protein model Prot_T5_Xl_Uniref50 (T5) and from the evolutionary information in the position-specific scoring matrix (PSSM). To remove redundant features and improve model performance, we then use the random forest (RF) method to select features, and the selected features are used to train and test several machine learning classifiers: support vector machines (SVM), RF, naive Bayes (NB), extreme gradient boosting (XGB), and k-nearest neighbors (KNN). Among these classifiers, the XGB model achieved the best results. DrugFinder reached an accuracy of 94.98%, sensitivity of 96.33% and specificity of 96.83% on the independent test set, which is much better than the results from existing identification methods. Our model also performed well on an additional tumor-related test set, achieving an accuracy of 88.71% and precision of 93.72%. This further demonstrates the strong generalization capability of the model.
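The evaluation measures quoted above (accuracy, sensitivity, specificity, precision) can all be derived from the four confusion-matrix counts. The following is a minimal sketch of those standard definitions, not code from DrugFinder; the function name `classification_metrics` and the 0/1 label encoding (1 = druggable) are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics from 0/1 labels.

    Assumption: 1 marks the positive class (e.g. druggable), 0 the negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on positives
        "specificity": ratio(tn, tn + fp),  # recall on negatives
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    print(classification_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

Sensitivity and specificity are reported separately because accuracy alone can hide poor performance on the minority class when the test set is imbalanced.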
To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Because experimental determination is costly and time-consuming, research into computational prediction schemes, focusing mainly on new representations of protein sequences and on the choice of classification algorithm, has become popular in recent decades. In this study, a novel tri-gram encoding model based on the protein overlapping property matrix (POPM) is proposed for predicting apoptosis protein subcellular location. With this encoding, a 1000-dimensional feature vector is built to represent each protein. Finally, with the help of support vector machine-recursive feature elimination (SVM-RFE), we select the optimal features and feed them into a support vector machine (SVM) classifier for prediction. The results of jackknife tests on two benchmark datasets demonstrate that the proposed method achieves satisfactory prediction performance at a lower computational cost and could serve as a promising tool for predicting the subcellular locations of apoptosis proteins.
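A tri-gram encoding over a property matrix can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the three property groups in `PROPERTY_GROUPS` are hypothetical toy groups (the actual POPM properties are not given in the text), and the feature definition, a sum over consecutive positions of products of property indicators, is an assumed reading of "tri-gram". With p property columns this produces p³ features, so 10 properties would give the 1000 dimensions mentioned above; that correspondence is also an assumption.

```python
from itertools import product

# Hypothetical, deliberately small overlapping property groups.
# These are NOT the groups used in the paper's POPM.
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYHKRDE"),
    "charged": set("KRDEH"),  # overlaps with "polar" on purpose
}


def property_matrix(seq):
    """Return an L x p binary matrix: row n marks the properties of residue n."""
    names = list(PROPERTY_GROUPS)
    return [[1 if aa in PROPERTY_GROUPS[name] else 0 for name in names]
            for aa in seq]


def trigram_features(seq):
    """Return a flat vector of p**3 tri-gram features.

    Feature (i, j, k) sums m[n][i] * m[n+1][j] * m[n+2][k] over all
    windows of three consecutive residues (assumed definition).
    """
    m = property_matrix(seq)
    p = len(PROPERTY_GROUPS)
    feats = [0] * p ** 3
    for n in range(len(m) - 2):
        for i, j, k in product(range(p), repeat=3):
            feats[(i * p + j) * p + k] += m[n][i] * m[n + 1][j] * m[n + 2][k]
    return feats


if __name__ == "__main__":
    vec = trigram_features("AKAVLDE")
    print(len(vec), sum(vec))
```

The resulting vector has a fixed length regardless of sequence length, which is what allows proteins of different sizes to be passed to a feature selector such as SVM-RFE and then to an SVM.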
Computational prediction of protein structural class based on sequence data remains a challenging problem in current protein science. In this paper, a new feature extraction approach based on relative polypeptide composition is introduced. This approach takes into account the background distribution of a given k-mer under a Markov model of order k-2, and avoids the curse of dimensionality as k increases by using a T-statistic feature selection strategy. The selected features are then fed to a support vector machine to perform the prediction. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison of our results with existing methods shows that our method provides satisfactory performance for structural class prediction.
Keywords: Markov model, protein structural class, relative polypeptide composition, support vector machine, T-statistic
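One common way to compare a k-mer's observed count with its background under a Markov model of order k-2 is the estimate E(w) = N(w₁…w₍ₖ₋₁₎) · N(w₂…wₖ) / N(w₂…w₍ₖ₋₁₎), where N(·) counts overlapping occurrences in the sequence. The sketch below uses that estimate and the normalisation (observed − expected) / expected. Both choices are assumptions made for illustration, since the abstract does not state the exact formula, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def relative_composition(seq, k):
    """Return {k-mer: (observed - expected) / expected} for k >= 3.

    The expected count uses the order-(k-2) Markov estimate
    E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])  (assumed definition).
    Only k-mers that occur in seq are returned, so E(w) > 0 always holds.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for an order-(k-2) model")
    observed = kmer_counts(seq, k)
    shorter = kmer_counts(seq, k - 1)  # counts of (k-1)-mers
    core = kmer_counts(seq, k - 2)     # counts of (k-2)-mers
    features = {}
    for w, n in observed.items():
        expected = shorter[w[:-1]] * shorter[w[1:]] / core[w[1:-1]]
        features[w] = (n - expected) / expected
    return features


if __name__ == "__main__":
    # Toy sequence only; not data from the study.
    for kmer, value in sorted(relative_composition("MKVLAAGKVLA", 3).items()):
        print(kmer, round(value, 3))
```

A value near zero means the k-mer occurs about as often as its shorter sub-words predict, while large positive or negative values mark over- or under-represented k-mers, which are the kind of features a T-statistic filter could then rank.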