LigEGFR: Spatial graph embedding and molecular descriptors assisted bioactivity prediction of ligand molecules for epidermal growth factor receptor on a cell line-based dataset

2020 
Abstract

Motivation: Lung cancer is a chronic non-communicable disease and is the cancer with the world's highest incidence in the 21st century. One of the leading mechanisms underlying the development of lung cancer in nonsmokers is an amplification of the epidermal growth factor receptor (EGFR) gene. However, laboratories employing conventional processes of drug discovery and development for such targets encounter several pain points that are costly and time-consuming. Moreover, high failure rates are caused by efficacy and safety problems during research and development. Therefore, it is imperative to develop improved methods for drug discovery. Herein, we developed LigEGFR, a deep learning model that combines spatial graph embedding with molecular descriptors to predict pIC50 potency estimates of small molecules and to classify hit compounds against the human epidermal growth factor receptor. The model was generated with a large-scale cell line-based dataset containing broad lists of chemical features.

Results: LigEGFR outperformed baseline machine learning models for predicting pIC50. Our model was notable for higher performance in hit compound classification, compared to molecular docking and machine learning approaches. The proposed predictive model provides a powerful strategy that potentially helps researchers overcome major challenges in drug discovery and development processes, reducing failures in the discovery of novel hit compounds.

Availability: We provide an online prediction platform and the source code, which are freely available at https://ligegfr.vistec.ist and https://github.com/scads-biochem/LigEGFR, respectively.

Key points:
• LigEGFR is a regression model for predicting pIC50 that was developed for the human EGFR target. It can also be applied to hit compound classification (pIC50 ≥ 6) and has a higher performance than baseline machine learning algorithms and molecular docking approaches.
• Our approach, based on spatial graph embedding and molecular descriptors, exhibited high performance in predicting pIC50 of small molecules against human EGFR.
• Non-hashed and hashed molecular descriptors achieved the highest predictive performance when used in convolutional layers and fully connected layers, respectively.
• Our model used a large-scale and non-redundant dataset to enhance the diversity of the small molecules. The model showed robustness and reliability, which were evaluated by y-randomization and applicability domain analysis (ADAN), respectively.
• We developed a user-friendly online platform to predict pIC50 of small molecules and classify the hit compounds for the drug discovery process of the EGFR target.
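
As an illustration of the hit criterion quoted in the key points (pIC50 ≥ 6), the following minimal sketch converts an IC50 value to pIC50 using the standard definition pIC50 = -log10(IC50 in mol/L) and applies the threshold. The function names are hypothetical and chosen for this example only; they are not taken from the LigEGFR source code, and the sketch does not reproduce the model itself.

```python
import math

# pIC50 cutoff for a hit compound, as stated in the key points above.
HIT_THRESHOLD = 6.0


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50.

    pIC50 = -log10(IC50 in mol/L); since 1 nM = 1e-9 mol/L,
    this equals 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def is_hit(pic50: float, threshold: float = HIT_THRESHOLD) -> bool:
    """Return True when the potency meets the hit threshold."""
    return pic50 >= threshold


if __name__ == "__main__":
    # Hypothetical IC50 values in nM, for illustration only.
    for ic50_nm in (10.0, 1000.0, 5000.0):
        p = pic50_from_ic50_nm(ic50_nm)
        print(f"IC50 = {ic50_nm:g} nM, pIC50 = {p:.2f}, hit = {is_hit(p)}")
```

Under this definition, the threshold pIC50 ≥ 6 corresponds to an IC50 of 1 µM or lower, so a predicted pIC50 from a regression model can be turned into a hit or non-hit label by a single comparison.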