How to generate the best samples for active learning in classification
Benoît Gandar, Gaëlle Loosli and Guillaume Deffuant
In: Active Learning and Experimental Design, 15/05/2010, Sardinia, Italy.
We consider a problem of active learning for classification: we assume that an oracle can give the label of any point in a given compact set, and we want to generate a sample of a given size that yields the best approximation of the oracle function. This problem arises in various contexts, for example the optimization of a meta-model in engineering, function approximation, or the approximation of viability kernels in viability theory. It is well known that the more data are available, the better the model. However, obtaining data can be expensive or destructive, so the experimenter wants to get the best value from this investment and has to choose the best learning set without any prior knowledge. The first contribution of this paper is to state that dispersion is the most relevant criterion for generating samples in active classification learning, whereas discrepancy is the relevant criterion for active regression learning. However, low-dispersion samples are not easy to generate. The second contribution is therefore a study of different ways to generate them, together with a new algorithm.
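To make the dispersion criterion concrete: the dispersion of a sample P in a domain X is the radius of the largest "hole" the sample leaves, d(P) = sup over x in X of min over p in P of ||x − p||, whereas discrepancy measures how far the fraction of sample points falling in boxes deviates from the boxes' volume. The sketch below is an illustration under stated assumptions and is not the authors' new algorithm. It assumes the unit square [0,1]^2 with Euclidean distance, estimates dispersion by Monte Carlo (which gives a lower bound on the true supremum), and compares a uniform random sample with a greedy farthest-point (maximin) sample, a standard heuristic swapped in here for illustration. The helper names (`estimate_dispersion`, `uniform_sample`, `greedy_maximin_sample`) and all parameter values are hypothetical.

```python
import math
import random


def estimate_dispersion(sample, n_probe=5000, dim=2, seed=12345):
    """Hypothetical helper: Monte Carlo estimate of the dispersion of `sample`
    in the unit cube [0,1]^dim (largest distance from a probe point to its
    nearest sample point). Random probes only give a lower bound."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_probe):
        x = [rng.random() for _ in range(dim)]
        nearest = min(math.dist(x, p) for p in sample)
        worst = max(worst, nearest)
    return worst


def uniform_sample(n, dim=2, seed=1):
    """Hypothetical helper: n i.i.d. uniform points in [0,1]^dim."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def greedy_maximin_sample(n, dim=2, n_candidates=2000, seed=0):
    """Hypothetical helper: greedy farthest-point selection from a random
    candidate pool; each step adds the candidate farthest from the points
    already chosen, which tends to shrink the largest hole (low dispersion)."""
    if n > n_candidates:
        raise ValueError("n must not exceed n_candidates")
    rng = random.Random(seed)
    candidates = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
    chosen = [candidates.pop(0)]
    # Distance from each remaining candidate to its nearest chosen point.
    nearest = [math.dist(c, chosen[0]) for c in candidates]
    while len(chosen) < n:
        i = max(range(len(candidates)), key=nearest.__getitem__)
        new = candidates.pop(i)
        nearest.pop(i)
        chosen.append(new)
        # Update nearest-chosen distances with the newly added point.
        nearest = [min(d, math.dist(c, new)) for d, c in zip(nearest, candidates)]
    return chosen


if __name__ == "__main__":
    n = 50
    print("uniform random :", estimate_dispersion(uniform_sample(n)))
    print("greedy maximin :", estimate_dispersion(greedy_maximin_sample(n)))
```

With the same sample size, the greedy maximin sample typically leaves smaller holes than a uniform random one. Greedy selection is only a heuristic; the abstract notes that low-dispersion samples are hard to generate, which is the problem the paper's own algorithm addresses.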
EPrint Type: Conference or Workshop Item (Poster)
Subjects: Theory & Algorithms
Deposited By: Gaëlle Loosli
Deposited On: 08 March 2011