Modeling and predicting chiral stationary phase enantioselectivity: An efficient random forest classifier using an optimally balanced training dataset and an aggregation strategy

2018 
Predicting whether a chiral column will be effective is a daily task for many analysts. Moreover, finding the best chiral column for separating a particular racemic compound is mostly a matter of trial and error that may take up to a week in some cases. In this study, we developed a novel prediction approach that combines a random forest classifier with an optimized discretization method for handling enantioselectivity as a continuous variable. Using the optimization results, models were trained on data sets divided into four enantioselectivity classes. The best model performances were achieved by over-sampling the minority classes (α ≤ 1.10 and α ≥ 2.00), down-sampling the majority class (1.2 ≤ α < 2.0), and aggregating multicategory predictions into binary classifications. We tested our method on 41 chiral stationary phases using layered fingerprints as descriptors. Experimental results show that this learning methodology was successful, in terms of average area under the Receiver Operating Characteristic curve, Kappa indices, and F-measure, for structure-based prediction of the enantioselective behavior of 34 chiral columns.
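The three data-handling steps named in the abstract (discretizing α, rebalancing the classes, and aggregating multiclass predictions into a binary call) can be illustrated with a minimal sketch. It is not the authors' implementation. The cut points in `CUTS` are assumptions: the abstract states only the outer classes and the majority class, so the second class (1.10 < α < 1.2) is inferred here. The function names, the resampling target, and the choice of which classes count as "positive" in the aggregation are likewise hypothetical, and the random forest itself is omitted; `aggregate_binary` simply takes per-class probabilities such as a classifier might return.

```python
import random
from collections import Counter, defaultdict

# Assumed cut points for four enantioselectivity classes (not confirmed by the abstract).
CUTS = (1.10, 1.20, 2.00)


def discretize(alpha):
    """Map a continuous alpha value to a class index 0..3."""
    if alpha <= CUTS[0]:
        return 0
    if alpha < CUTS[1]:
        return 1
    if alpha < CUTS[2]:
        return 2
    return 3


def rebalance(samples, labels, target, seed=0):
    """Resample every class to exactly `target` items.

    Classes larger than `target` are down-sampled without replacement;
    smaller classes keep all items and are topped up with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, items in sorted(by_class.items()):
        if len(items) >= target:
            chosen = rng.sample(items, target)
        else:
            chosen = items + rng.choices(items, k=target - len(items))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def aggregate_binary(class_probs, first_positive=2):
    """Collapse four class probabilities into one binary decision.

    Probabilities of classes >= `first_positive` are summed; the split
    point is an assumption chosen for illustration.
    """
    p_pos = sum(class_probs[first_positive:])
    return p_pos >= 0.5, p_pos


# Toy alpha values, invented purely to exercise the functions.
alphas = [1.00, 1.05, 1.15, 1.30, 1.40, 1.50, 1.60, 1.80, 2.50]
labels = [discretize(a) for a in alphas]
balanced_x, balanced_y = rebalance(alphas, labels, target=3)
print(Counter(balanced_y))
print(aggregate_binary([0.125, 0.125, 0.5, 0.25]))
```

In a full pipeline, the rebalanced set would be used to train the classifier, and the aggregation would be applied to its predicted class probabilities for each compound.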