This repository contains datasets for the manuscript "Practical model selection for prospective virtual screening":

pria_rmi_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets. The files also contain the associated continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross validation.

pria_rmi_pcba_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets as well as public PubChem BioAssay datasets. The files also contain the PriA-SSB and RMI-FANCM continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross validation. Missing values are left blank.

pria_prospective.csv.gz: A compressed file containing chemical screening data for the binary dataset PriA-SSB prospective. The file also contains the continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints.

If you use these data in a publication, please cite:
Shengchao Liu+, Moayad Alnammi+, Spencer S. Ericksen, Andrew F. Voter, Gene E. Ananiev, James L. Keck, F. Michael Hoffmann, Scott A. Wildman, Anthony Gitter. Practical Model Selection for Prospective Virtual Screening. Journal of Chemical Information and Modeling. 2018. doi:10.1021/acs.jcim.8b00363

PubChem data were provided by the PubChem database. Follow the PubChem citation guidelines if you use the PubChem data. See Voter et al. 2017 (PubChem AID 1272365) for the PriA-SSB screening data and Voter et al. 2016 (PubChem AID 1159607) for RMI-FANCM.

Version 1.1.0 updates all of the data files. We standardized the SMILES in all files by generating canonical SMILES with RDKit version 2016.03.4. In addition, we removed 2845 chemicals from pria_prospective.csv.gz that were duplicates of compounds in pria_rmi_cv.tar.gz.
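The de-duplication step described for version 1.1.0 can be illustrated with a minimal sketch. It is not the authors' actual script. It assumes that SMILES strings on both sides have already been canonicalized in the same way, so exact string matching identifies duplicates. The column name `smiles`, the helper `drop_overlap`, and the toy rows are hypothetical and chosen only for illustration.

```python
import csv
import io


def drop_overlap(prospective_rows, reference_smiles, smiles_key="smiles"):
    """Return (kept_rows, n_removed) after dropping rows found in the reference set.

    Assumes both sides were canonicalized identically beforehand, so an
    exact string comparison is enough to detect duplicate compounds.
    """
    reference = set(reference_smiles)
    kept = [row for row in prospective_rows if row[smiles_key] not in reference]
    removed = len(prospective_rows) - len(kept)
    return kept, removed


# Toy data standing in for the real files (hypothetical columns and values).
cv_csv = "smiles,label\nCCO,0\nc1ccccc1,1\n"
prospective_csv = "smiles,label\nCCO,0\nCCN,1\nCC(=O)O,0\n"

cv_smiles = [row["smiles"] for row in csv.DictReader(io.StringIO(cv_csv))]
prospective = list(csv.DictReader(io.StringIO(prospective_csv)))

kept, removed = drop_overlap(prospective, cv_smiles)
print(removed, [row["smiles"] for row in kept])  # 1 ['CCN', 'CC(=O)O']
```

For the real compressed files, the in-memory strings could be replaced by file handles opened with `gzip.open(path, "rt")` from the standard library; the actual column names should be checked against the file headers first.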
<div>Abstract<p>The MAGE-A, MAGE-B, and MAGE-C protein families comprise the class I MAGE/cancer testes antigens, a group of highly homologous proteins whose expression is suppressed in all normal tissues except developing sperm. Aberrant expression of class I MAGE proteins occurs in melanomas and many other malignancies, and MAGE proteins have long been recognized as tumor-specific targets; however, their functions have largely been unknown. Here, we show that suppression of class I MAGE proteins induces apoptosis in the Hs-294T, A375, and S91 MAGE-positive melanoma cell lines and that members of all three families of class I MAGE proteins form complexes with KAP1, a scaffolding protein that is known as a corepressor of p53 expression and function. In addition to inducing apoptosis, MAGE suppression decreases KAP1 complexing with p53, increases immunoreactive and acetylated p53, and activates a p53-responsive reporter gene. Suppression of class I MAGE proteins also induces apoptosis in MAGE-A–positive, p53<sup>wt/wt</sup> parental HCT 116 colon cancer cells but not in a MAGE-A–positive HCT 116 p53<sup>−/−</sup> variant, indicating that MAGE suppression of apoptosis requires p53. Finally, treatment with MAGE-specific small interfering RNA suppresses S91 melanoma growth <i>in vivo</i> in syngeneic DBA2 mice. Thus, class I MAGE protein expression may suppress apoptosis by suppressing p53 and may actively contribute to the development of malignancies by promoting tumor survival. Because the expression of class I MAGE proteins is limited in normal tissues, inhibition of MAGE antigen expression or function represents a novel and specific treatment for melanoma and diverse malignancies. [Cancer Res 2007;67(20):9954–62]</p></div>
Intimal hyperplasia is the cause of recurrent occlusive vascular disease (restenosis). Drugs currently used to treat restenosis effectively inhibit smooth muscle cell (SMC) proliferation, but also inhibit the growth of the protective luminal endothelial cell (EC) lining, leading to thrombosis. To identify compounds that selectively inhibit SMC versus EC proliferation, we have developed a high-throughput screening (HTS) format using human cells and have employed this to screen a multi-compound collection (NIH Clinical Collection). We developed an automated, accurate proliferation assay in 96-well plates using human aortic SMCs and ECs. Using this HTS format, we screened the 447-drug NIH Clinical Collection. We identified 11 compounds that inhibited SMC proliferation by more than 50%, among which idarubicin was unique in preferentially inhibiting SMC versus EC proliferation. Concentration-response analysis revealed that this differential effect was most evident over a window of ∼10 nM to 5 µM. In vivo testing of idarubicin in a rat carotid injury model at 14 days revealed an 80% reduction of intimal hyperplasia and a 45% increase of lumen size with no significant effect on re-endothelialization. Taken together, we have established an HTS assay of human vascular cell proliferation and identified idarubicin as a selective inhibitor of SMC versus EC proliferation both in vitro and in vivo. Screening of larger and more diverse compound libraries may lead to the discovery of next-generation therapeutics that can inhibit intimal hyperplasia without impairing re-endothelialization.
A thermally induced change in the vibrational properties of a coadsorbed oxygen-water overlayer on Ru(001) is attributed to the formation of a local O-$\mathrm{H}_{2}\mathrm{O}$ complex. An O-H stretching mode of this complex is observed in electron-energy-loss data, but not in infrared reflection-absorption data available from another laboratory. This provides the first direct experimental evidence of a case in which the surface dipolar selection rule applies in an infrared but not in an energy-loss measurement.
We report measurements of synchrotron radiation emission in the wavelength region from 30 to 400 µm where coherent enhancement was predicted. The power ratio relative to a blackbody source exhibits no such enhancement and can be accounted for by the incoherent emission theory. This discrepancy is explained by showing that the coherent enhancement is reduced when proper account is taken of the relative phases of all emitting particles.
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. Of the RF's 701 selected compounds, 48% are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC