Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds

1998 
We have developed an algorithm that clusters structural databases using topological similarity. The first step in this procedure is to identify a set of probe structures that all fall outside a defined similarity score cutoff with respect to one another. This list of probes is then used to bin the remaining compounds in the database. In the last step, some housekeeping is performed to ensure that each compound in the dataset is either a probe or is contained in one and only one bin. We have applied this clustering method to a database of ∼27 000 compounds for which we have screening-level biological data. Analysis of the resulting clusters shows that clusters defined by an active probe are much more likely to contain other active compounds than clusters defined by an inactive probe. Indeed, the incidence of active compounds in bins with active probes is 6 to 10 times greater than the incidence of active compounds in the database as a whole. This result demonstrates the power of simple two-d...
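The three steps described in the abstract (probe selection, binning, housekeeping) and the enrichment comparison can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not confirm: compounds are represented as sets of integer feature IDs, similarity is the Tanimoto coefficient, probes are picked by visiting compounds in random order, and housekeeping is reduced to assigning each non-probe compound to its single most similar probe. All function names (`tanimoto`, `select_probes`, `assign_bins`, `enrichment`) and the toy data are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_probes(fps, cutoff, rng):
    """Step 1 (assumed form): visit compounds in random order and keep a
    compound as a probe only if its similarity to every probe chosen so
    far is below the cutoff."""
    order = list(fps)
    rng.shuffle(order)
    probes = []
    for cid in order:
        if all(tanimoto(fps[cid], fps[p]) < cutoff for p in probes):
            probes.append(cid)
    return probes


def assign_bins(fps, probes):
    """Steps 2 and 3 (assumed form): put each non-probe compound into the
    bin of its most similar probe, so that every compound is either a
    probe or a member of exactly one bin."""
    bins = {p: [] for p in probes}
    probe_set = set(probes)
    for cid, fp in fps.items():
        if cid in probe_set:
            continue
        best = max(probes, key=lambda p: tanimoto(fp, fps[p]))
        bins[best].append(cid)
    return bins


def enrichment(bins, active, n_total):
    """Ratio of the active rate among members of active-probe bins to the
    active rate in the whole dataset. Returns None if no active probe
    has any bin members."""
    members = [c for p, ms in bins.items() if p in active for c in ms]
    if not members:
        return None
    rate_in_bins = sum(c in active for c in members) / len(members)
    base_rate = len(active) / n_total
    return rate_in_bins / base_rate


# Toy data (hypothetical): six compounds as sets of feature IDs.
fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 5},
    "c": {1, 2, 3, 6},
    "d": {7, 8, 9},
    "e": {7, 8, 10},
    "f": {11, 12},
}
cutoff = 0.5
probes = select_probes(fps, cutoff, random.Random(0))
bins = assign_bins(fps, probes)
```

Because a compound is rejected as a probe only when it reaches the cutoff against some earlier probe, every non-probe in this sketch lands in a bin whose probe is at least `cutoff` similar to it. The `enrichment` helper corresponds to the comparison reported in the abstract, where the ratio falls between 6 and 10 on the real dataset.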