Machine Learning Differentiates Enzymatic and Non-Enzymatic Metals in Proteins
2021
Identifying enzyme active sites is an important task in enzymology. Metalloenzymes make up 40% of all enzymes and catalyze reactions in all seven enzyme classes. Their active sites can be difficult to distinguish from non-enzymatic metal-binding sites. Because the two kinds of site share so many physicochemical similarities, finding the features that distinguish metal-binding sites with enzymatic activity from those without it can elucidate what is critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metal ions bound to a protein as enzymatic or non-enzymatic, achieving 92.2% precision and 90.1% recall. Our model ranks electrostatic and pocket-lining features as more important than pocket volume, even though volume is the feature that differs most quantitatively between enzymatic and non-enzymatic sites. Finally, we find that our model performs better overall than other methods that distinguish enzymatic from non-enzymatic sequences. The ability of our model to correctly identify which metal sites are responsible for enzymatic activity could assist with identifying new enzymatic mechanisms or designing de novo enzymes.
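Precision and recall, the two metrics quoted for the classifier, follow their standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below computes both from predicted and true labels. It assumes that "enzymatic" is the positive class (label 1), which the abstract does not state explicitly. The function name `precision_recall` and the toy labels are hypothetical illustrations, not code or data from this work.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall), treating label 1 ("enzymatic") as the positive class.

    Assumption: 1 = enzymatic metal site, 0 = non-enzymatic metal site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # enzymatic sites correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-enzymatic sites wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # enzymatic sites missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels for illustration only (not results from this work).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f}")  # prints "precision=0.667 recall=0.800"
```

With these toy labels there are 4 true positives, 2 false positives and 1 false negative, so precision is 4/6 and recall is 4/5. High precision means few non-enzymatic sites are mislabeled as enzymatic, and high recall means few enzymatic sites are missed.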