Screening model of candidate drugs for breast cancer based on ensemble learning algorithm and molecular descriptor

2023 
Breast cancer is one of the leading causes of death among women worldwide. Finding compounds with good bioactivity, pharmacokinetics and safety, the latter two summarized as Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET), is a long and challenging task in breast cancer therapy. In this paper, molecular descriptor data of compounds were analyzed with an ensemble learning algorithm, and important features were selected for the development and validation of ADMET classification models. The overall process comprises data cleaning, splitting the data into training and testing sets, feature selection and classification model evaluation. A Two-Level Stacking Algorithm (TLSA) based on ensemble learning is proposed for ADMET classification. Classification accuracy, precision, recall, the confusion matrix, F1-score, Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC) are reported to compare the proposed method with other classifiers. The experimental results show that TLSA with Logistic Regression as the second-level algorithm outperforms the other classifiers for the properties Absorption, Distribution and Excretion, with accuracies of 94.6037%, 94.9410% and 88.1956% respectively. For the properties Metabolism and Toxicity, TLSA with a Support Vector Machine as the second-level algorithm achieves the best classification performance, with accuracies of 88.8702% and 96.7960% respectively. The results indicate that the proposed approach classifies compound properties well and can be a good alternative to well-known machine learning methods.
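The idea of two-level stacking can be illustrated with a minimal sketch, written here in plain Python. The abstract does not name the first-level learners, so the two simple scorers below (`fit_centroid`, `fit_stump`) are hypothetical stand-ins, and the three-feature synthetic data stands in for real molecular descriptors. Only the use of Logistic Regression at the second level follows the abstract. First-level scores are produced out of fold, so the second-level model is trained on predictions for samples the first-level models did not see.

```python
import math
import random


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroid(X, y):
    """Hypothetical first-level learner: positive score = closer to class 1."""
    c0 = column_means([x for x, t in zip(X, y) if t == 0])
    c1 = column_means([x for x, t in zip(X, y) if t == 1])
    return lambda x: math.dist(x, c0) - math.dist(x, c1)


def fit_stump(X, y):
    """Hypothetical first-level learner: threshold on the most separated feature."""
    m0 = column_means([x for x, t in zip(X, y) if t == 0])
    m1 = column_means([x for x, t in zip(X, y) if t == 1])
    j = max(range(len(m0)), key=lambda i: abs(m1[i] - m0[i]))
    threshold = (m0[j] + m1[j]) / 2.0
    sign = 1.0 if m1[j] >= m0[j] else -1.0
    return lambda x: sign * (x[j] - threshold)


def sigmoid(s):
    # Numerically safe for large |s|.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def fit_logistic(Z, y, lr=0.1, epochs=300):
    """Second-level learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_b += err
            for i, zi in enumerate(z):
                grad_w[i] += err * zi
        b -= lr * grad_b / len(Z)
        w = [wi - lr * gi / len(Z) for wi, gi in zip(w, grad_w)]
    return lambda z: 1 if b + sum(wi * zi for wi, zi in zip(w, z)) > 0 else 0


def fit_stack(X, y, learners, k=3):
    """Build out-of-fold first-level scores, then train the second level on them."""
    n = len(X)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, learner in enumerate(learners):
            score = learner(X_tr, y_tr)
            for i in held_out:
                Z[i][m] = score(X[i])
    meta = fit_logistic(Z, y)
    # Refit the first-level learners on all training data for prediction time.
    bases = [learner(X, y) for learner in learners]
    return lambda x: meta([score(x) for score in bases])


def make_data(n, rng):
    """Synthetic stand-in for descriptor data: two Gaussian classes, 3 features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(60, rng)
X_test, y_test = make_data(40, rng)

predict = fit_stack(X_train, y_train, [fit_centroid, fit_stump])
preds = [predict(x) for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Replacing `fit_logistic` with another second-level learner, such as a Support Vector Machine, changes only the `meta` step. This mirrors the comparison the abstract reports across the five ADMET properties, although the accuracy printed here describes the synthetic data only.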