Feature selection based on graph Laplacian by using compounds with known and unknown activities

2017 
A semisupervised feature selection method based on the graph Laplacian (S2FSGL) is proposed for quantitative structure-activity relationship (QSAR) models. The method uses an l2,1-norm and compounds with both known and unknown activities. Two graphs, G_unsup and G_sup, are constructed. To select the most important descriptors, the method uses the label information of compounds with known activities and the local structure of compounds with known and unknown activities; the weight matrix of graph G_unsup models this local structure. The l2,1-norm lets the method take the correlation between different descriptors into account during descriptor selection. The performance of S2FSGL coupled with a kernel smoother model was evaluated on 2 QSAR data sets and compared with that of other feature selection methods. To evaluate the QSAR models and the selected descriptors, several different training and test sets were produced for each data set. Comparing the statistical parameters of QSAR models built with the semisupervised feature selection method against those obtained with other feature selection methods showed that S2FSGL was superior in selecting the most relevant descriptors. The results indicate that using compounds with unknown activities alongside compounds with known activities can help in selecting the relevant descriptors of QSAR models.
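The abstract names two building blocks, an l2,1-norm and a graph Laplacian derived from a weight matrix, without giving the objective function. The following is a minimal sketch of those two pieces only, not the S2FSGL algorithm itself. The function names, the binary k-nearest-neighbour weighting, and the choice of k are illustrative assumptions that do not come from the source.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W.

    W is assumed to have one row per descriptor.
    """
    W = np.asarray(W, dtype=float)
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())

def rank_descriptors(W):
    """Order descriptors by decreasing row norm of W (largest first)."""
    W = np.asarray(W, dtype=float)
    return np.argsort(-np.linalg.norm(W, axis=1))

def knn_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - S of a symmetric kNN graph.

    Assumption: binary edge weights. X holds one compound per row,
    labelled and unlabelled compounds alike.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between compounds.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)  # a compound is not its own neighbour
    nearest = np.argsort(sq, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    S = np.maximum(S, S.T)  # symmetrize the neighbourhood relation
    return np.diag(S.sum(axis=1)) - S
```

Because the l2,1-norm sums row norms, penalizing it tends to drive whole rows of W toward zero, so a descriptor is kept or discarded across all outputs at once. The Laplacian is built from descriptor values alone, which is why compounds with unknown activities can contribute to it.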