Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure–activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. Conclusion: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches.
This concept will be applied in future projects related to other end points. Citation: Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. 2016. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 124:1023–1033; http://dx.doi.org/10.1289/ehp.1510267
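The consensus step mentioned in the CERAPP abstract (weighting models by scores based on their evaluated accuracies) can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so the score-weighted vote below, the model names, the example scores, and the 0.5 decision threshold are all assumptions made for illustration and are not the CERAPP implementation.

```python
from typing import Dict


def weighted_consensus(predictions: Dict[str, int], scores: Dict[str, float]) -> float:
    """Return the score-weighted fraction of models voting 'active' for one chemical.

    predictions: model name -> 1 (active) or 0 (inactive); a model that made no
                 prediction for this chemical is simply absent from the dict.
    scores:      model name -> evaluation score used as the vote weight
                 (hypothetical values, not taken from the paper).
    """
    total_weight = sum(scores[name] for name in predictions)
    if total_weight == 0:
        raise ValueError("no weighted predictions available for this chemical")
    weighted_votes = sum(scores[name] * vote for name, vote in predictions.items())
    return weighted_votes / total_weight


# Hypothetical models and scores (the abstract reports a 0.69 to 0.85 range).
scores = {"model_a": 0.85, "model_b": 0.75, "model_c": 0.70}
preds = {"model_a": 1, "model_b": 0, "model_c": 1}

consensus = weighted_consensus(preds, scores)
is_active = consensus >= 0.5  # assumed threshold, for illustration only
```

In this sketch a better-scoring model contributes more to the vote, and a model that did not predict a given chemical does not dilute the result, because only the weights of the participating models enter the denominator.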
Computational methods are increasingly used to streamline and enhance the lead discovery and optimization process. However, accurate prediction of absorption, distribution, metabolism and excretion (ADME) and adverse drug reactions (ADR) is often difficult, due to the complexity of underlying physiological mechanisms. Modeling approaches have been hampered by the lack of large, robust and standardized training datasets. In an extensive effort to build such a dataset, the BioPrint database was constructed by systematic profiling of nearly all drugs available on the market, as well as numerous reference compounds. The database is composed of several large datasets: compound structures and molecular descriptors, in vitro ADME and pharmacology profiles, and complementary clinical data including therapeutic use information, pharmacokinetics profiles and ADR profiles. These data have allowed the development of computational tools designed to integrate a program of computational chemistry into library design and lead development. Models based on chemical structure are strengthened by in vitro results that can be used as additional compound descriptors to predict complex in vivo endpoints. The BioPrint pharmacoinformatics platform represents a systematic effort to accelerate the process of drug discovery, improve quantitative structure-activity relationships and develop in vitro/in vivo associations. In this review, we will discuss the importance of training set size and diversity in model development, the implementation of linear and neighborhood modeling approaches, and the use of in silico methods to predict potential clinical liabilities.
Microtubules are highly dynamic polymers of α,β-tubulin dimers which play an essential role in numerous cellular processes such as cell proliferation and intracellular transport, making them an attractive target for cancer and neurodegeneration research. To date, a large number of known tubulin binders were derived from natural products, while only one was developed by rational structure-based drug design. Several of these tubulin binders show promising in vitro profiles while presenting unacceptable off-target effects when tested in patients. Therefore, there is a continuing demand for the discovery of safer and more efficient tubulin-targeting agents. Since tubulin structural data is readily available, the employment of computer-aided design techniques can be a key element to focus on the relevant chemical space and guide the design process. Due to the high diversity and quantity of structural data available, we compiled here a guide to the accessible tubulin-ligand structures. Furthermore, we review different ligand and structure-based methods recently used for the successful selection and design of new tubulin-targeting agents.
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
The European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be suitable for compounds of industrial interest because of limited accuracy and restricted applicability domains. In this work we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models performed well in terms of predictive power (BA = 0.74–0.79) and data coverage (83–91%). The Generative Topographic Mapping approach was employed to compare the chemical space of the various data sources: several chemotypes and structural motifs unique to the industrial dataset were identified, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, public and industrial data were merged into a Global dataset containing 3146 compounds, including a significant subset of compounds from the industrial context. This is the largest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a much larger applicability domain than related models built on publicly available data. The developed model is available to users on the Laboratory of Chemoinformatics website. This dataset is only the "All-Public" set, since the industrial compounds cannot be disclosed. This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878].
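The two figures of merit quoted in the biodegradability abstract above, BA and data coverage, can be made concrete with a short sketch. It assumes that BA denotes balanced accuracy (the mean of sensitivity and specificity) and that coverage is the fraction of compounds falling inside a model's applicability domain; neither definition is spelled out in the abstract. The function name and the toy labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def balanced_accuracy_and_coverage(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Compute balanced accuracy on covered compounds, plus data coverage.

    y_true: 1 = readily biodegradable, 0 = not readily biodegradable.
    y_pred: same coding; None marks a compound outside the applicability
            domain, for which the model returns no prediction (assumed convention).
    """
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true)

    tp = sum(1 for t, p in covered if t == 1 and p == 1)
    fn = sum(1 for t, p in covered if t == 1 and p == 0)
    tn = sum(1 for t, p in covered if t == 0 and p == 0)
    fp = sum(1 for t, p in covered if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present among covered compounds")

    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2, coverage


# Toy data: eight compounds, one of them outside the applicability domain.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, None, 0]
ba, coverage = balanced_accuracy_and_coverage(y_true, y_pred)
```

Reporting the two numbers together matters because a model can raise its apparent accuracy by declining to predict difficult compounds, which shows up as lower coverage.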
Combinatorial chemistry, a major drug discovery tool, heavily relies on chemoinformatics (1,2) and molecular modeling to manage the huge flux of structural information related to potentially feasible combinatorial products, and to intelligently direct synthesis efforts toward products with a maximal chance of fulfilling the stringent conditions required of a drug molecule. Until recently, even the number of combinatorial products that potentially could have been obtained on the basis of commercially available starting materials and relatively simple two- or three-step chemistries would have largely exceeded the available modeling capacities. In response to these novel constraints, molecular modeling tools dedicated to combinatorial chemistry (3-5) have been successfully developed. Software packages aimed at processing large sets of molecules are nevertheless restricted to the fast bidimensional (topological) (6-8) description of combinatorial products, thus avoiding the computational effort due to geometry buildup and conformational sampling. Conformer generation may require seconds to minutes of CPU time per molecule, depending on the effort spent to score the relative relevance of the visited phase space region (using a simple bump check criterion to reject impossible geometries vs. performing a full-blown potential energy evaluation). Therefore, 3D descriptors may be routinely used to characterize libraries containing l