Neighborhood Complex Based Machine Learning (NCML) Models for Drug Design

2021 
The importance of drug design cannot be overemphasized. Recently, artificial intelligence (AI) based drug design has begun to gain momentum due to great advances in experimental data, computational power and learning models. However, a major issue that remains for all AI-based learning models is efficient molecular representation. Here we propose, for the first time, neighborhood complex (NC) based molecular featurization (or feature engineering). In particular, we reveal, also for the first time, deep connections between the NC and the Dowker complex (DC) for bipartite graphs based on molecular interactions. Further, NC-based persistent spectral models are developed, and the associated persistent attributes are used as molecular descriptors or fingerprints. To test our models, we consider protein-ligand binding affinity prediction. Our NC-based machine learning (NCML) models, in particular the NC-based gradient boosting tree (NC-GBT), are tested on the three most commonly used datasets, i.e., PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, and extensively compared with existing state-of-the-art models. We find that our NCML models achieve state-of-the-art results.
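To make the NC construction on an interaction-based bipartite graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard definition of a neighborhood complex (a set of vertices forms a simplex when its members share a common neighbor) and assumes the bipartite graph is built by joining a protein atom to a ligand atom whenever their distance is within a cutoff. The function names, the cutoff rule and the toy coordinates are all hypothetical, and the persistent spectral features described in the abstract are not reproduced here.

```python
from itertools import combinations
import math


def bipartite_edges(protein, ligand, cutoff):
    """Join protein atom i and ligand atom j when their distance is <= cutoff.

    The cutoff rule is an illustrative assumption, not taken from the paper.
    """
    return {
        (i, j)
        for i, p in enumerate(protein)
        for j, q in enumerate(ligand)
        if math.dist(p, q) <= cutoff
    }


def neighborhood_complex(edges, max_dim=2):
    """Return simplices (as sorted tuples) of the neighborhood complex.

    Vertices are tagged ("P", i) for protein atoms and ("L", j) for ligand
    atoms. Every subset of a vertex's neighbor set, up to max_dim + 1
    vertices, is a simplex because its members share that vertex as a
    common neighbor.
    """
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(("P", i), set()).add(("L", j))
        nbrs.setdefault(("L", j), set()).add(("P", i))
    simplices = set()
    for neighbor_set in nbrs.values():
        ordered = sorted(neighbor_set)
        for k in range(1, max_dim + 2):
            simplices.update(combinations(ordered, k))
    return simplices


# Toy coordinates (hypothetical): two protein atoms, two ligand atoms.
protein = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ligand = [(0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
nc = neighborhood_complex(bipartite_edges(protein, ligand, cutoff=1.5))
```

Because every neighbor of a protein atom is a ligand atom and vice versa, each simplex produced this way lies entirely on one side of the bipartite graph. The complex therefore splits into a protein-side part and a ligand-side part, which is the shape of the relationship to the Dowker complex that the abstract refers to.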