A scalable algorithm for molecular property estimation in high dimensional scaffold-based libraries

2012 
An algorithm is presented for the estimation of molecular properties over a library built around a scaffold that has N sites for functionalization, with \(M_i\) moieties at the ith scaffold site, corresponding to a library of \({\prod_{i=1}^N M_i}\) molecules. The algorithm relies on a series of operations involving (i) synthesis and property measurement of a minimal number T of randomly sampled members of the library, (ii) expression of the observed property in terms of a high-dimensional model representation (HDMR) of the moiety → property map, (iii) optimization of the ordered sequence of moieties on each site to regularize the HDMR map and (iv) interpolation using the map to estimate the properties of as yet unsynthesized compounds. The set of operations is performed iteratively, aiming to reach convergence of the predictive HDMR map with as few synthesized samples as possible. Through simulation, the number T of required random molecular samples is shown to scale very favorably, with \({T \ll \prod_{i=1}^N M_i}\), for cases up to N = 20 and \(M_i = 20\). For example, high estimation quality was attained for simulated libraries with T ~ 5,000 sampled compounds for a library of \(20^{12}\) members and T ~ 12,500 sampled compounds for a library of \(20^{20}\) members. The algorithm is based on the assumption that a systematic pattern exists in the moiety → property map, provided that the moieties are optimally ordered on the scaffold sites within the context of HDMR. The overall procedure is referred to as the substituent reordering HDMR algorithm (SR-HDMR). The technique was also successfully tested with laboratory data for estimating \(^{13}\)C NMR shifts in a tri-substituted benzene library and for lac operon repression binding.
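To give a sense of the scaling claim, the short Python sketch below computes the library size \(\prod_{i=1}^N M_i\) and the fraction of the library covered by T samples for the two simulated cases quoted above. It is only an arithmetic illustration, not part of SR-HDMR itself; the function names `library_size` and `sampled_fraction` are hypothetical helpers introduced here.

```python
from math import prod


def library_size(moieties_per_site):
    """Number of molecules in the library: the product of M_i over all N sites."""
    return prod(moieties_per_site)


def sampled_fraction(t_samples, moieties_per_site):
    """Fraction of the full library covered by t_samples synthesized compounds."""
    return t_samples / library_size(moieties_per_site)


# The two simulated cases from the abstract: M_i = 20 at every site.
cases = [
    (12, 5_000),   # N = 12 sites, T ~ 5,000 samples
    (20, 12_500),  # N = 20 sites, T ~ 12,500 samples
]

for n_sites, t_samples in cases:
    sites = [20] * n_sites
    size = library_size(sites)
    frac = sampled_fraction(t_samples, sites)
    print(f"N = {n_sites}: {size:.3e} molecules, sampled fraction = {frac:.2e}")
```

In both cases the sampled fraction is many orders of magnitude below one, which is the sense in which T is much smaller than the library size.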