Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine
Learning Approach
Abstract
We present a machine learning approach (a Gaussian Process model) that builds a statistical model of aqueous solubility from measured data. The model was validated on the well-known set of 1311 compounds by Huuskonen et al. For 91% of the compounds, our predictions were correct within one order of magnitude, even though the respective compounds were not used in training the model. Existing commercial software achieves 79% correct predictions within one order of magnitude. On an in-house dataset of 632 drug candidates at Schering (mostly electrolytes), 82% of our predictions are correct within one order of magnitude, compared to only 64% achieved by commercial software. Additional validations with newly measured in-house data will be presented. In addition to accurate predictions, the proposed machine learning model provides a confidence estimate for each individual prediction.
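To illustrate how a Gaussian Process can return an error bar with each prediction, and how the "within one order of magnitude" figure can be computed, the following is a minimal sketch and not the model described above. It assumes a zero-mean prior, a squared-exponential kernel with fixed hyperparameters, toy one-dimensional descriptors, and solubility expressed in log units, so that one order of magnitude corresponds to an absolute error of at most 1.0. All function names, parameter values, and data points are hypothetical.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two descriptor vectors (assumed kernel)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transposed(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, X_new, noise=0.1):
    """Return a (mean, std) pair for each query point under a zero-mean GP.

    The std is that of the latent function; measurement noise is not added.
    """
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))  # K^{-1} y
    results = []
    for x in X_new:
        k = [rbf(x, xi) for xi in X]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = rbf(x, x) - sum(vi * vi for vi in v)
        results.append((mean, math.sqrt(max(var, 0.0))))
    return results

def fraction_within(predicted, measured, tol=1.0):
    """Share of predictions whose absolute error is at most tol log units."""
    hits = sum(abs(p - m) <= tol for p, m in zip(predicted, measured))
    return hits / len(measured)

# Toy data: one descriptor per compound, centred log-solubility values.
X_train = [[0.0], [1.0], [2.0]]
y_train = [-1.0, 0.0, 1.0]
near, far = gp_predict(X_train, y_train, [[1.0], [10.0]])
print("near training data: mean=%.3f std=%.3f" % near)
print("far from training data: mean=%.3f std=%.3f" % far)
```

In this sketch the predicted standard deviation is small for a query close to the training compounds and reverts to the prior standard deviation for a query far from them, which is the behaviour that makes a per-prediction confidence estimate useful.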