We recently published a high-quality validation set for testing conformer generators, consisting of structures from both the PDB and the CSD (Hawkins, P. C. D. et al. J. Chem. Inf. Model. 2010, 50, 572), and tested the performance of our conformer generator, OMEGA, on these sets. In the present publication, we focus on understanding the suitability of those data sets for validation and on identifying and learning from OMEGA's failures. We compare, for the first time to our knowledge, the coverage of the applicable property spaces between the validation data sets we used and the parent compound sets, to determine whether our data sets adequately sample these property spaces. We also introduce the concept of torsion fingerprinting and compare this method of dissimilation to the more traditional graph-centric diversification methods we used in our previous publication. To improve our ability to programmatically identify cases where the crystallographic conformation is not well reproduced computationally, we introduce a new metric to compare conformations, RMSTanimoto. This new metric is used alongside those from our previous publication to efficiently identify reproduction failures. We find RMSTanimoto to be particularly effective in identifying failures for the smallest molecules in our data sets. Analysis of the nature of these failures, particularly those for the CSD, sheds further light on the issue of strain in crystallographic structures. Some of the residual failure cases not resolved by simple changes in OMEGA's defaults present significant challenges to conformer generation engines like OMEGA and suggest new avenues for improving their performance, while others illustrate the pitfalls of validating against crystallographic ligand conformations, particularly those from the PDB.
Mechanism of metal-independent hydroxylation by Chromobacterium violaceum phenylalanine hydroxylase. Robert T. Carr, Shankar Balasubramanian, Paul C. D. Hawkins, and Stephen J. Benkovic. Biochemistry 1995, 34, 22, 7525–7532. Publication date (print): June 6, 1995; published online May 1, 2002. https://doi.org/10.1021/bi00022a028
Accurate conformations of a molecule are critical for reliable prediction of its properties, so good predictive models require good conformations. Here, we present a method for conformer sampling based on distance geometry, implemented in our conformation generator OMEGA, which we apply to both macrocycles and druglike molecules. We validate it in the usual fashion, by reproducing conformations from the solid state, and compare its performance in detail to other methods. We find that OMEGA performs well on three key criteria: accuracy, speed, and ensemble size. To support our conclusions quantitatively, particularly on accuracy, we developed a workflow for method comparison that uses parameter estimation, inference from confidence intervals, classical null hypothesis significance testing, Bayesian estimation, and effect size. The workflow is designed to be robust to the highly skewed performance data often found when validating tools in computational chemistry and to provide reliable, easy-to-interpret results. In this workflow, we emphasize the importance of confidently distinguishing between methods, with particular reference to a priori estimation of sample size and statistical power (the complement of the false negative, or Type II, error rate), a topic almost completely ignored hitherto in computational chemistry.
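One ingredient of such a workflow, inference from confidence intervals on skewed data, can be illustrated with a percentile bootstrap. The following is a minimal sketch only, not the workflow from the paper: it assumes paired per-molecule RMSD values for two methods, and the function name, the number of resamples, and the choice of the median as the statistic are all illustrative assumptions.

```python
import random
import statistics


def bootstrap_median_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median of paired differences a_i - b_i.

    Hypothetical helper for illustration: `a` and `b` are per-molecule
    scores (e.g. RMSD) from two methods evaluated on the same molecules.
    Returns (observed median difference, lower bound, upper bound).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    if not a:
        raise ValueError("samples must not be empty")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(diffs)
    # Resample the paired differences with replacement and collect medians.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.median(diffs), lower, upper
```

The median is used here because it is less sensitive to a long tail than the mean. An interval that excludes zero suggests a systematic difference between the two methods on this data set, while its width gives a sense of how well that difference is determined.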
Cognate docking has been used as a test for pose prediction quality in docking engines for decades. In this paper, we report a statistically rigorous analysis of cognate docking performance using tools in the OpenEye docking suite. We address a number of critically important aspects of the cognate docking problem that are often handled poorly: data set quality, methods of comparison of the predicted pose to the experimental pose, and data analysis. The focus of the paper lies in the third problem, extracting maximally predictive knowledge from comparison data. To this end, we present a multistage protocol for data analysis that, by combining classical null-hypothesis significance testing with effect size estimation, provides crucial information about quantitative differences in performance between methods as well as the probability of finding such differences in future experiments. We suggest that developers and users of software have different levels of interest in different parts of this protocol, with users being primarily interested in effect size estimation while developers may be most interested in statistical significance. This protocol is completely general and therefore will provide the basis for method comparisons of many different kinds.
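The pairing of a significance test with an effect size can be shown with a small example. This is a sketch under stated assumptions, not the protocol from the paper: it uses an exact two-sided sign test on paired pose RMSDs and reports the fraction of complexes on which method A beats method B as a simple effect size. The function name and the choice of test are illustrative.

```python
from math import comb


def sign_test_with_effect(a, b):
    """Exact two-sided sign test plus win fraction for paired scores.

    Hypothetical helper: `a` and `b` are paired scores where lower is
    better (e.g. pose RMSD per complex). Ties are discarded.
    Returns (fraction of non-tied pairs won by a, two-sided p-value).
    """
    wins = sum(x < y for x, y in zip(a, b))
    losses = sum(x > y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return 0.5, 1.0  # all ties: no evidence either way
    # Probability of a result at least this lopsided under a fair coin.
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return wins / n, p_value
```

The two numbers answer different questions: the p-value indicates whether a difference is detectable at all, and the win fraction indicates how large it is in practice, which mirrors the distinction the abstract draws between developers' and users' interests.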
It has been a long-held assumption in ligand-based virtual screening that the bioactive conformation of a molecule is privileged. The assumption is that superior performance in 3D searching (pharmacophores, shape similarity) should be obtained when using the bioactive conformation of a query molecule for searching for other active molecules. A parallel assumption has been that extensive sampling of the conformational space of database molecules is necessary to obtain optimal performance for 3D ligand-based screening. Both of these assumptions will be critically assessed for ligand-based virtual screening carried out in shape space.
Conformational flexibility is a major determinant of the properties of macrocycles and other drugs in beyond rule of 5 (bRo5) space. Prediction of conformations is essential for design of drugs in this space, and we have evaluated three tools for conformational sampling of a set of 10 bRo5 drugs and clinical candidates in polar and apolar environments. The distance-geometry based OMEGA was found to yield ensembles spanning larger structure and property spaces than the ensembles obtained by MOE-LowModeMD (MOE) and MacroModel (MC). Both MC and OMEGA but not MOE generated different ensembles for polar and apolar environments. All three conformational search methods generated conformers similar to the crystal structure conformers for 9 of the 10 compounds, with OMEGA performing somewhat better than MOE and MC. MOE and OMEGA found all six conformers of roxithromycin that were identified by NMR in aqueous solutions, whereas only OMEGA sampled the three conformers observed in chloroform. We suggest that characterization of conformers using molecular descriptors, e.g., the radius of gyration and polar surface area, is preferred to energy- or root-mean-square deviation-based methods for selection of biologically relevant conformers in drug discovery in bRo5 space.
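As an illustration of one descriptor mentioned above, the radius of gyration of a conformer can be computed directly from its atomic coordinates. The sketch below is a generic textbook calculation, not code from any of the tools evaluated; the function name and the optional mass weighting are assumptions made for the example.

```python
from math import sqrt


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of 3D points.

    `coords` is a sequence of (x, y, z) tuples in consistent units.
    `masses` is optional; without it every atom gets unit weight, which
    gives the purely geometric radius of gyration.
    """
    if not coords:
        raise ValueError("coords must not be empty")
    if masses is None:
        masses = [1.0] * len(coords)
    if len(masses) != len(coords):
        raise ValueError("masses and coords must have equal length")
    total = sum(masses)
    # Weighted centroid (center of mass when real masses are supplied).
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total
        for i in range(3)
    ]
    # Weighted mean squared distance from the centroid.
    mean_sq = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return sqrt(mean_sq)
```

Compact, folded conformers give smaller values than extended ones, so comparing this number across an ensemble is one way to group conformers without relying on energies or RMSD.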
Here, we present the algorithm and validation for OMEGA, a systematic, knowledge-based conformer generator. The algorithm consists of three phases: assembly of an initial 3D structure from a library of fragments; exhaustive enumeration of all rotatable torsions using values drawn from a knowledge-based list of angles, thereby generating a large set of conformations; and sampling of this set by geometric and energy criteria. Validation of conformer generators like OMEGA has often been undertaken by comparing computed conformer sets to experimental molecular conformations from crystallography, usually from the Protein Data Bank (PDB). Such an approach is fraught with difficulty due to the systematic problems with small molecule structures in the PDB. Methods are presented to identify a diverse set of small molecule structures from cocomplexes in the PDB that has maximal reliability. A challenging set of 197 high-quality, carefully selected ligand structures from well-solved models was obtained using these methods. This set will provide a sound basis for comparison and validation of conformer generators in the future. Validation results from this set are compared to the results obtained using structures of a set of druglike molecules extracted from the Cambridge Structural Database (CSD). OMEGA is found to perform very well in reproducing the crystallographic conformations from both these data sets using two complementary metrics of success.
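The second phase described above, exhaustive enumeration of torsion settings from a list of allowed angles, can be sketched in a few lines. This is an illustration of the general idea only and makes no claim about how OMEGA is implemented: the torsion names, the angle values, and the function name are hypothetical, and no 3D coordinates are built.

```python
from itertools import product


def enumerate_torsion_combinations(allowed_angles):
    """Yield every assignment of one allowed angle to each torsion.

    `allowed_angles` maps a torsion label to its list of candidate
    dihedral angles in degrees (hypothetical values standing in for a
    knowledge-based torsion library).
    """
    names = list(allowed_angles)
    # Cartesian product over the per-torsion angle lists.
    for combo in product(*(allowed_angles[name] for name in names)):
        yield dict(zip(names, combo))


# Example with two made-up rotatable bonds.
example_library = {"t1": [60, 180, 300], "t2": [0, 180]}
example_settings = list(enumerate_torsion_combinations(example_library))
```

The number of combinations is the product of the list lengths (3 × 2 = 6 here), so it grows very quickly with the number of rotatable bonds. That growth is why a final phase that samples the enumerated set by geometric and energy criteria is needed.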
Knowledge of and information about protein binding sites has become increasingly important in the drug discovery process and not just for molecular biologists [1]. By comparing binding sites within and across protein families, relevant details about the functionality and selectivity of a target protein can be extracted leading to useful insights for the development of new ligands [2]. SiteHopper provides a powerful alternative method to the traditional use of sequence alignment for this purpose. Using OpenEye's Shape [3] and Spicoli toolkits [4], SiteHopper quickly calculates a 3D shape representation of the active site colored by the chemical properties of the protein residues defining the active site. These active site representations can be rapidly aligned and assessed for shape and chemistry similarity. As SiteHopper is built on the OpenEye toolkits, it is highly flexible and customizable for a variety of end-uses. In this presentation, the methodology behind SiteHopper will be introduced and multiple relevant applications will be shown.