Homogeneous Multi-Instance Learning with Arbitrary Dependence
Sivan Sabato and Naftali Tishby
In: 22nd Conference on Learning Theory (COLT 2009), 18-21 June 2009, Montreal, Quebec.
In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances, typically a Boolean OR. The learner observes the bag labels but not the instance labels that generated them. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem. However, no guarantees on the result or generalization bounds have been shown for these algorithms. At the same time, theoretical analysis has
shown MIL to be either trivial or too hard, depending on the assumptions. In this work we formally define a new setting which is more relevant for MIL applications than previous theoretic assumptions. The sample complexity of this setting is shown to be only logarithmically dependent on the size of the bag, and for the case of Boolean OR, an algorithm with proven guarantees is provided. We further extend the sample complexity results to a real-valued generalization of MIL.