Using machine learning to predict organismal growth temperatures from protein primary sequences

2019 
The link between a protein's primary sequence and its thermal stability and temperature-dependent activity is central to an understanding of protein folding, stability, and evolution. However, the relationship between primary sequence and these biochemical properties can be difficult to quantify because of the large sequence space and the complexity of protein folding. Fortunately, evolution naturally explores both sequence space and temperature space through organismal adaptation to various thermal niches. Here, we use machine learning, in the form of multilayer perceptrons, to predict the originating species' optimal growth temperatures from a protein family's primary sequences. Trained machine learning models outperformed linear regressions in predicting the originating species' growth temperature, achieving a root mean squared error of 3.34 °C. Notably, the models are protein-family specific, and the predicted organismal growth temperatures are correlated with the proteins' temperatures for melting and optimal activity. Therefore, this method provides a new tool for quickly predicting an organism's optimal growth temperature in silico, which can serve as a convenient proxy for protein stability and temperature-dependent activity.
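As a rough illustration of the quantities involved, the sketch below shows one way aligned sequences from a single protein family could be turned into numeric features for a regression model, and how the root mean squared error between predicted and observed growth temperatures is computed. The one-hot encoding, the 21-symbol alphabet, and the function names `one_hot` and `rmse` are assumptions made for this example; the abstract does not state how the sequences were encoded.

```python
import math

# Assumed alphabet: the 20 standard amino acids plus '-' for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(aligned_seq):
    """Flatten an aligned sequence into a 0/1 feature vector.

    Each position contributes len(ALPHABET) entries; symbols outside
    the assumed alphabet are left as all zeros.
    """
    features = []
    for residue in aligned_seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(residue.upper())
        if idx >= 0:
            row[idx] = 1
        features.extend(row)
    return features


def rmse(predicted, observed):
    """Root mean squared error between two equal-length lists (in °C here)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Feature vectors of this kind could be passed to either a linear regression or a multilayer perceptron, and the two models compared by their `rmse` on held-out species.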