On Learning Multicategory Classification with Sample Queries
Joel Ratsaby
Information and Computation, Volume 185, Number 2, pp. 298-327, 2003.

## Abstract

Consider the pattern recognition problem of learning multi-category classification from a labeled sample, for instance, the problem of learning handwriting recognition, where a category corresponds to an alphanumeric character. The classical theory of pattern recognition assumes that labeled examples appear according to the unknown underlying pattern-class conditional probability distributions, where the pattern classes are picked randomly according to their *a priori* probabilities. In this paper we pose the following question: can the learning accuracy be improved if labeled examples are drawn independently at random according to the underlying class conditional probability distributions, but with the pattern classes chosen *not* necessarily according to their *a priori* probabilities? We answer this in the affirmative by showing that there exists a tuning of the subsample proportions which minimizes a loss criterion. The tuning is relative to the intrinsic complexity of the Bayes classifier. As this complexity depends on the underlying probability distributions, which are assumed to be unknown, we provide an algorithm which learns the proportions in an on-line manner, using sample querying, and which asymptotically minimizes the criterion. In practice, this algorithm may be used to boost the performance of existing classification learning algorithms by apportioning better subsample proportions.