PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine Learning Approach
Timon Schroeter, Anton Schwaighofer, Sebastian Mika, Antonius ter Laak, Detlev Suelzle and Nikolaus Heinrich
In: American Chemical Society 232nd National Meeting & Exposition, September 10 - 14, 2006, San Francisco.


We present a machine learning approach (a Gaussian Process model) that models aqueous solubility statistically on the basis of measured data. The model was validated on the well-known set of 1311 compounds by Huuskonen. For 91% of the compounds, our predictions were correct within one order of magnitude, even though the respective compounds were not used in training the model. Existing commercial software achieves 79% correct predictions within one order of magnitude. On an in-house dataset of 632 drug candidates at Schering (mostly electrolytes), 82% of our predictions are correct within one order of magnitude, compared to only 64% achieved by commercial software. Additional validations with new in-house measured data will be presented. In addition to accurate predictions, the proposed machine learning model also provides a confidence estimate for each individual prediction.
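The two ideas in the abstract, a Gaussian Process prediction with a per-compound error bar and the "correct within one order of magnitude" score, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the descriptors, or the hyperparameters, so the squared-exponential kernel, the fixed values of `length_scale`, `signal_var` and `noise_var`, and all function names below are assumptions made for illustration. The sketch also assumes solubility is expressed in log10 units, so that one order of magnitude corresponds to an absolute error of at most 1.0.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, noise_var=0.1):
    """Return the GP predictive mean and standard deviation for each test row.

    Hyperparameters are fixed here (an assumption); in practice they would
    be fitted to the training data.
    """
    y_mean = y_train.mean()  # centre targets so a zero-mean prior is reasonable
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))

    K_s = rbf_kernel(X_train, X_test)
    mean = K_s.T @ alpha + y_mean

    # Predictive variance of a noisy observation: k(x*, x*) + noise - v^T v.
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_test, X_test).diagonal() + noise_var - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def fraction_within_one_log_unit(y_true, y_pred):
    """Share of predictions whose absolute error is at most one log10 unit."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors <= 1.0))


# Synthetic stand-in data (NOT solubility measurements): two made-up descriptors
# and a made-up target, used only to show the calls.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(60)

X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]  # held out, as in the validation described above

pred_mean, pred_std = gp_predict(X_train, y_train, X_test)
score = fraction_within_one_log_unit(y_test, pred_mean)
```

Here `pred_std` plays the role of the error bar: a large value flags a compound that lies far from the training data, where the prediction should be trusted less.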

EPrint Type: Conference or Workshop Item (Oral)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 2505
Deposited By: Anton Schwaighofer
Deposited On: 22 November 2006