Benchmark data set for in silico prediction of Ames mutagenicity.

Katja Hansen, Sebastian Mika, Timon Schroeter, Andreas Sutter, Antonius Ter Laak, Thomas Steger-Hartmann, Nikolaus Heinrich, Klaus-Robert Müller

2009

Until now, publicly available data sets for building and evaluating Ames mutagenicity prediction tools have been very limited in both size and the chemical space they cover. In this report we describe a new public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.
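
To make the kind of comparison described above concrete, the following is a minimal sketch of one of the named noncommercial methods, a k-Nearest Neighbors classifier, scored on held-out compounds. Everything specific here is an assumption for illustration: the toy feature sets stand in for structural fingerprints, Tanimoto similarity is one common choice of similarity measure, and the labels (1 = mutagenic, 0 = non-mutagenic) are invented. It does not reproduce the descriptors, parameters, or evaluation protocol used in the report.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature ids."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query, train, k=3):
    """Majority vote over the k training items most similar to `query`.

    `train` is a list of (feature_set, label) pairs, where label 1 means
    mutagenic and 0 means non-mutagenic (hypothetical toy labels).
    """
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out items whose predicted label matches the true one."""
    hits = sum(knn_predict(features, train, k) == label for features, label in test)
    return hits / len(test)


# Invented toy "fingerprints": each compound is a set of feature ids.
train = [
    (frozenset({1, 2, 3}), 1),
    (frozenset({1, 2, 4}), 1),
    (frozenset({2, 3, 5}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({6, 8, 9}), 0),
    (frozenset({7, 9, 10}), 0),
]
test = [
    (frozenset({1, 3, 4}), 1),
    (frozenset({8, 9, 10}), 0),
]

print(accuracy(train, test, k=3))
```

A real benchmark run would replace the toy sets with features computed from the SMILES or SDF records and would repeat the train/test split several times so that the methods are compared on the same partitions.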

Keywords:

Gaussian process
Chemical space
Random forest
Support vector machine
Machine learning
Data set
Data mining
In silico
Artificial intelligence
Bayesian probability
Chemistry
Benchmark data
