Prediction of Reduced Ion Mobility Constants from Structural Information Using Multiple Linear Regression Analysis and Computational Neural Networks. Matthew D. Wessel and Peter C. Jurs. Anal. Chem. 1994, 66, 15, 2480–2487. Publication Date (Print): August 1, 1994. https://doi.org/10.1021/ac00087a012
Simulation of carbon-13 nuclear magnetic resonance spectra of methyl-substituted norbornan-2-ols. Debra S. Egolf, Elizabeth B. Brockett, and Peter C. Jurs. Anal. Chem. 1988, 60, 24, 2700–2706. Publication Date (Print): December 15, 1988. https://doi.org/10.1021/ac00175a012
Virtual screening (VS) has become a preferred tool to augment high-throughput screening (1) and identify new leads in the drug discovery process. The core of a VS informatics pipeline includes several data mining algorithms that work on huge databases of chemical compounds containing millions of molecular structures and their associated data. Scaling traditional applications such as classification, partitioning, and outlier detection to huge chemical data sets without a significant loss in accuracy is therefore very important. In this paper, we introduce a data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm (2) called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources. The core LSH algorithm hashes chemical descriptors so that points close to each other in the descriptor space are also close to each other in the hashed space. Using this data structure, one can perform approximate nearest-neighbor searches very quickly, in sublinear time. We validate the accuracy and performance of our framework on three real data sets of sizes ranging from 4337 to 249 071 molecules. Results indicate that the identification of nearest neighbors using the LSH algorithm is at least 2 orders of magnitude faster than the traditional k-nearest-neighbor method and is over 94% accurate for most query parameters. Furthermore, when viewed as a data-partitioning procedure, the LSH algorithm lends itself to easy parallelization of nearest-neighbor classification or regression. We also apply our framework to detect outlying (diverse) compounds in a given chemical space; this algorithm is extremely rapid in determining whether a compound is located in a sparse region of chemical space, and it is quite accurate when compared to results obtained using principal-component-analysis-based heuristics.
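The hashing idea in this abstract can be illustrated with a minimal sketch. It assumes a random-hyperplane (sign) hash family, which is one common LSH construction and not necessarily the one used in the paper; the class name `LSHIndex`, its parameters, and the toy descriptor vectors are all hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LSHIndex:
    """Toy LSH index (assumed design): several tables of random-hyperplane hashes."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One independent set of n_bits random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _key(self, planes, v):
        # Each bit records on which side of a hyperplane the vector lies,
        # so vectors separated by a small angle tend to share a key.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, v)) >= 0.0 else 0
            for plane in planes
        )

    def add(self, v):
        """Store a vector and return its integer index."""
        idx = len(self.points)
        self.points.append(list(v))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(idx)
        return idx

    def query(self, q, k=1):
        """Return up to k approximate nearest neighbors of q."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        # Exact distances are computed only for the bucket candidates.
        ranked = sorted(candidates, key=lambda i: euclidean(self.points[i], q))
        return ranked[:k]
```

A query only ranks the points that share a bucket with it in at least one table, which is where the speedup over an exhaustive k-nearest-neighbor scan would come from; the price is that a true neighbor can occasionally be missed.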
Quantitative structure-property relationship (QSPR) methods are used to develop mathematical models to predict critical temperatures and pressures of a diverse set of organic compounds taken from the Design Institute for Physical Property Data (DIPPR) database. Each compound is represented with calculated molecular structure descriptors that encode its topological, electronic, geometrical, and other features. Subsets of descriptors are selected with simulated annealing and genetic algorithms. Models to predict the critical properties are constructed using multiple linear regression analysis and computational neural networks, with errors comparable to the experimental errors of the critical property data.
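Descriptor subset selection by simulated annealing can be sketched as follows. This is a generic annealing loop under stated assumptions, not the authors' procedure: the function name `anneal_subset`, the geometric cooling schedule, and the single-swap move are illustrative choices, and the `cost` callback stands in for whatever model error (for example, the RMS error of a regression fit) would be used in practice.

```python
import math
import random


def anneal_subset(n_descriptors, subset_size, cost,
                  n_steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a low-cost descriptor subset (requires subset_size < n_descriptors)."""
    rng = random.Random(seed)
    current = rng.sample(range(n_descriptors), subset_size)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(n_steps):
        # Move: swap one selected descriptor for one that is not selected.
        candidate = list(current)
        outside = [d for d in range(n_descriptors) if d not in candidate]
        candidate[rng.randrange(subset_size)] = rng.choice(outside)
        c = cost(candidate)
        # Always accept improvements; accept worse subsets with a
        # probability that shrinks as the temperature falls.
        if c <= current_cost or rng.random() < math.exp((current_cost - c) / temp):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = list(candidate), c
        temp *= cooling  # geometric cooling (an assumed schedule)
    return sorted(best), best_cost


# Toy cost for illustration only: number of chosen descriptors outside a
# hypothetical "informative" set. A real cost would refit a model each time.
informative = {1, 4, 7}
toy_cost = lambda subset: sum(1 for d in subset if d not in informative)
best, best_cost = anneal_subset(10, 3, toy_cost)
```

Accepting some worse subsets early on lets the search escape poor local optima, which is the usual motivation for annealing over a purely greedy swap search.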
Automated Descriptor Selection for Quantitative Structure-Activity Relationships Using Generalized Simulated Annealing. Jon M. Sutter, Steve L. Dixon, and Peter C. Jurs. J. Chem. Inf. Comput. Sci. 1995, 35, 1, 77–84. Publication Date (Print): January 1, 1995. https://doi.org/10.1021/ci00023a011
The pattern recognition technique utilizing adaptive binary pattern classifiers has been applied to the interpretation of infrared spectra. The binary pattern classifiers have been trained to determine the chemical classes of x-y digitized infrared spectra. High predictive abilities have been obtained in classifying unknown spectra. A new training procedure for binary pattern classifiers has been developed, and it has been used to classify IR spectra into chemical classes. Pattern classifiers trained in the conventional way and by the new procedure have been used in conjunction with feature selection, and it is shown that only a small fraction of the data is necessary to classify these infrared spectra successfully into chemical classes.
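An adaptive binary pattern classifier of the kind described above can be illustrated with a minimal error-correction (perceptron-style) sketch. The abstract does not specify the training rule, so the update used here is an assumption, and the function names and the two-point "spectra" are purely hypothetical.

```python
def train_binary_classifier(patterns, labels, max_epochs=100):
    """Fit a linear threshold unit; labels are +1 or -1."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(patterns, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:
                # Error-correction feedback: move the decision surface
                # toward the misclassified pattern.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                errors += 1
        if errors == 0:
            break  # every training pattern is classified correctly
    return weights, bias


def classify(weights, bias, x):
    """Return +1 or -1 for one pattern."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Toy "spectra" digitized at two points (hypothetical intensities).
patterns = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
labels = [1, 1, -1, -1]
weights, bias = train_binary_classifier(patterns, labels)
predictions = [classify(weights, bias, x) for x in patterns]
```

In this picture, feature selection amounts to dropping the spectral points whose weights contribute little to the decision, which is consistent with the finding that a small fraction of the data suffices.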
We report several binary classification models that directly link the genetic toxicity of a series of 140 thiophene derivatives with information derived from the compounds' molecular structure. Genetic toxicity was measured using an SOS Chromotest. IMAX (maximal SOS induction factor) values were recorded for each of the 140 compounds both in the presence and in the absence of S9 rat liver homogenate. Compounds were classified as genotoxic if IMAX ≥ 1.5 in either test or nongenotoxic if IMAX < 1.5 for both tests. The molecular structures were represented by numerical descriptors that encoded the topological, geometric, electronic, and polar surface area properties of the thiophene derivatives. The classification models used were linear discriminant analysis (LDA), k-nearest neighbor classification (k-NN), and the probabilistic neural network (PNN). These were used in conjunction with either a genetic algorithm or generalized simulated annealing to find optimal subsets of descriptors for each classifier. The quality of the resulting models was determined by the number of misclassified compounds, with preference given to models that produced fewer false negative classifications. Model sizes ranged from seven descriptors for LDA to three descriptors for k-NN and PNN. Very good classification results were obtained with all three classifiers. Classification rates for the LDA, k-NN, and PNN models were 80, 85, and 85%, respectively, for the prediction set compounds. Additionally, a consensus model was generated that incorporated all three of the basic model types. This consensus model correctly predicted the genotoxicity of 95% of the prediction set compounds.
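The labeling rule and the consensus idea can be sketched in a few lines. The IMAX threshold of 1.5 comes from the abstract; the majority-vote combination is an assumption, since the abstract does not say how the three models were combined, and all function names are hypothetical.

```python
def is_genotoxic(imax_with_s9, imax_without_s9, threshold=1.5):
    """Genotoxic if IMAX reaches the threshold in either SOS Chromotest run."""
    return imax_with_s9 >= threshold or imax_without_s9 >= threshold


def consensus(votes):
    """Majority vote over per-model predictions (assumed combination rule)."""
    return sum(votes) > len(votes) / 2


def false_negatives(predicted, actual):
    """Count genotoxic compounds that were predicted nongenotoxic."""
    return sum(1 for p, a in zip(predicted, actual) if a and not p)


# Hypothetical predictions from LDA, k-NN, and PNN for one compound.
votes = [True, False, True]
combined = consensus(votes)
```

Counting false negatives separately reflects the stated preference for models that rarely miss a genotoxic compound, even if that costs a few extra false positives.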