Machine Learning Approach Effectively Predicts Binding Between SARS-CoV-2 Spike and ACE2 Across Mammalian Species — Worldwide, 2021

2021 
Introduction: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a recently emerged coronavirus of natural origin that caused the coronavirus disease 2019 (COVID-19) pandemic. Studying its natural origin and host range is particularly important for tracing the source of the virus, monitoring it, and preventing recurrent infections. One major approach is to test the ability of ACE2, the viral receptor, from various hosts to bind the SARS-CoV-2 spike protein, but covering a large collection of species this way is time-consuming and labor-intensive.

Methods: We applied state-of-the-art machine learning approaches to build a pipeline that reached >87% accuracy in predicting binding between ACE2 from different species and the SARS-CoV-2 spike protein.

Results: We further validated the prediction pipeline on 2 independent test sets involving >50 bat species and achieved >78% accuracy. A large-scale screening of 204 mammal species predicted that 144 species (or 61%) were susceptible to SARS-CoV-2 infection, highlighting the importance of intensive monitoring and study of mammalian species.

Discussion: In short, our study used machine learning models to create an important tool for predicting potential hosts of SARS-CoV-2 and, to our knowledge, achieved the highest precision in experimental validation. It also predicted that a wide range of mammals can be infected by SARS-CoV-2.
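The abstract does not say which models or features the pipeline uses, so the following is only a minimal sketch of the general idea: encode ACE2 residues as features, train a classifier on species with known binding outcomes, and report accuracy on held-out species. A k-nearest-neighbour vote stands in for the authors' unspecified models. The function names, the 5-letter residue strings, and the labels are all hypothetical placeholders, not real ACE2 sequences or results from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residues):
    """Encode a string of amino-acid letters as a flat 0/1 feature vector."""
    vec = []
    for aa in residues:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec


def hamming(u, v):
    """Count the positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def knn_predict(train, query, k=3):
    """Majority vote among the k training examples nearest to the query."""
    q = one_hot(query)
    ranked = sorted(train, key=lambda item: hamming(one_hot(item[0]), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out examples whose predicted label matches the known one."""
    hits = sum(knn_predict(train, seq, k) == label for seq, label in test)
    return hits / len(test)


# Toy placeholder data (assumption): made-up residue strings standing in for
# ACE2 positions that contact the spike protein; 1 = binds, 0 = does not bind.
TRAIN = [
    ("QYKHM", 1),
    ("QYKHT", 1),
    ("QFKHM", 1),
    ("ENDNS", 0),
    ("ENDNT", 0),
    ("EHDNS", 0),
]
TEST = [("QYKNM", 1), ("ENDHS", 0)]

if __name__ == "__main__":
    print("held-out accuracy:", accuracy(TRAIN, TEST))
```

In a real setting the held-out set would consist of species absent from training, as with the independent bat test sets mentioned in the Results, so that the reported accuracy reflects generalisation to unseen hosts.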