Automatic Choice of Control Measurements
Gayle Leen, David Hardoon and Samuel Kaski
In: Asian Conference on Machine Learning (2009).
In experimental design, a standard approach for distinguishing experimentally induced effects from unwanted effects is to design control measurements that differ only in terms of the former. However, in some cases, it may be problematic to design and measure controls specifically for an experiment. In this paper, we investigate the possibility of learning to choose suitable controls from a database of potential controls, which differ in their degree of relevance to the experiment. This approach is especially relevant in the field of bioinformatics where experimental studies are predominantly small-scale, while vast amounts of biological measurements are becoming increasingly available. We focus on finding controls for differential gene expression studies (case vs control) of various cancers. In this situation, the ideal control would be a healthy sample from the same tissue (the same mixture of cells as the tumor tissue), under the same conditions except for cancer-specific effects, which is almost impossible to obtain in practice. We formulate the problem of learning to choose the control in a Gaussian process classification framework, as a novel paired multitask learning problem. The similarities between the underlying set of classifiers are learned from the set of control tissue gene expression profiles.