Region-Based Image Classification with a Latent SVM Model
Oksana Yakhnenko, Jakob Verbeek and Cordelia Schmid
Image classification is a challenging problem due to intra-class appearance variation, background clutter, occlusion, and photometric variability. Current state-of-the-art methods do not explicitly handle background clutter and instead rely on global image representations, such as bag-of-words (BoW) models. Multiple-instance learning has been used to deal with clutter explicitly, classifying an image as positive as soon as at least one image region is classified as positive. In this paper, we propose a more robust latent SVM model that, unlike multiple-instance learning, does not rely on a single image region to trigger a positive image classification. Rather, our model scores an image using all of its regions, and associates with each region a latent variable that indicates whether the region represents the object of interest or its background. Background and foreground regions are each scored by a different appearance model, and an additional term in the score function encourages neighboring regions to take the same background/foreground label. We learn the parameters of our latent SVM model using an iterative procedure that alternates between inferring the latent variables and updating the parameters. We compare the performance of our approach on the PASCAL VOC'07 dataset to that of SVMs trained on global BoW representations, and to a multiple-instance SVM trained on BoW representations of image regions. We show that our approach outperforms multiple-instance learning by a large margin on all classes, and outperforms global BoW models in 17 of the 20 classes.
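To make the form of the score function concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes linear appearance models, a constant bonus for each pair of neighboring regions that share a label, and exhaustive search over labelings, which is feasible only for a handful of regions. All names (`image_score`, `infer_labels`, `w_fg`, `w_bg`, `smooth`) and the toy feature vectors are hypothetical.

```python
from itertools import product


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def image_score(regions, edges, labels, w_fg, w_bg, smooth):
    """Score one foreground/background labeling of an image.

    Each region is scored by the foreground model if its label is 1 and by
    the background model if its label is 0 (linear models are an assumption).
    Every pair of neighboring regions with equal labels adds `smooth`.
    """
    unary = sum(dot(w_fg if z == 1 else w_bg, x)
                for x, z in zip(regions, labels))
    pairwise = smooth * sum(1 for i, j in edges if labels[i] == labels[j])
    return unary + pairwise


def infer_labels(regions, edges, w_fg, w_bg, smooth):
    """Return the highest-scoring labeling and its score.

    Exhaustive search is a stand-in chosen for clarity; it enumerates all
    2^n labelings and is not the inference procedure of the paper.
    """
    best = max(product((0, 1), repeat=len(regions)),
               key=lambda z: image_score(regions, edges, z, w_fg, w_bg, smooth))
    return list(best), image_score(regions, edges, best, w_fg, w_bg, smooth)


# Toy image: three regions in a chain, with made-up 2-D descriptors.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]          # neighboring region pairs
w_fg = [1.0, -1.0]                # hypothetical foreground model
w_bg = [-1.0, 1.0]                # hypothetical background model

labels, score = infer_labels(regions, edges, w_fg, w_bg, smooth=0.5)
```

In this toy example the first two regions are labeled foreground and the third background, and every region contributes to the image score, in contrast to a multiple-instance rule in which a single region decides the outcome. A learning loop in the spirit of the abstract would alternate between a call such as `infer_labels` with fixed parameters and a parameter update with fixed labels.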