Novel criteria for elimination of the outliers in QSPR studies, when the ‘forward stepwise’ procedure is used

2019 
The characteristics of the proposed algorithm are: (a) a new formula is used for the quality of the QSPRs; (b) the outlier (atypical) character is defined using a classic criterion; (c) the condition for elimination of the outliers includes the quality of the equation; (d) only ‘the most atypical’ molecule is eliminated, and all calculations are then automatically repeated; (e) the elimination of outliers stops if the condition for elimination is not fulfilled or if the number of eliminated molecules exceeds a predetermined limit. The second situation in (e) was encountered once in the four examples discussed. The number of descriptors in ‘the best’ equation and the number of outliers removed cannot be predicted a priori. The paper also proposes a criterion for the identification of ‘outliers for lead hopping’; there were no molecules of this type in the four examples discussed. The initial numbers of molecules in the calibration sets were 50, 60, 133 and 54, the numbers of descriptors in ‘the best’ equations were 5, 5, 9 and 9, and the numbers of eliminated outliers were 0, 0, 8 and 6, respectively. When outliers were present, the best equation obtained with the outliers and the best equation obtained after their removal were very different.
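The loop described in (a) to (e) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the new quality formula or name the classic outlier criterion, so adjusted R² stands in for the quality measure and a standardised-residual threshold stands in for the outlier criterion. All function and parameter names (`forward_stepwise`, `eliminate_outliers`, `z_limit`, `max_removed`) are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

def adjusted_r2(resid, y, n_desc):
    """ASSUMPTION: stand-in for the paper's quality formula, which is not given here."""
    n = len(y)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - n_desc - 1)) / (ss_tot / (n - 1))

def forward_stepwise(X, y, max_desc):
    """Add one descriptor at a time while the quality measure keeps improving."""
    selected, best_q = [], -np.inf
    while len(selected) < max_desc:
        scores = []
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            _, resid = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(resid, y, len(cols)), j))
        if not scores:
            break
        q, j = max(scores)
        if q <= best_q:
            break
        selected.append(j)
        best_q = q
    return selected, best_q

def eliminate_outliers(X, y, max_desc=3, z_limit=2.5, max_removed=5):
    """Remove only the most atypical molecule per pass, then redo the selection."""
    keep = np.arange(len(y))
    removed = []
    selected, quality = forward_stepwise(X, y, max_desc)
    while len(removed) < max_removed:  # predetermined limit, as in (e)
        _, resid = fit_ols(X[keep][:, selected], y[keep])
        s = resid.std(ddof=len(selected) + 1)
        if s == 0:
            break
        z = np.abs(resid) / s  # ASSUMPTION: standardised residual as the outlier criterion
        worst = int(np.argmax(z))
        if z[worst] <= z_limit:
            break  # no molecule is atypical enough
        trial = np.delete(keep, worst)
        trial_sel, trial_q = forward_stepwise(X[trial], y[trial], max_desc)
        if trial_q <= quality:
            break  # ASSUMPTION: eliminate only if the equation quality improves
        removed.append(int(keep[worst]))
        keep, selected, quality = trial, trial_sel, trial_q
    return selected, quality, removed
```

Because the descriptor selection is repeated after every removal, the final equation may use different descriptors from the one fitted with the outliers present, which is consistent with the abstract's observation that the two best equations can differ substantially.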