Generic Visual Categorization Using Weak Geometry
In the first part of this chapter we make a general presentation of the bag-of-keypatches approach to generic visual categorization (GVC). Our approach is inspired by the bag-of-words approach to text categorization. This method is able to identify the object content of natural images while generalizing across variations inherent to the object class. To obtain a visual vocabulary insensitive to viewpoint and illumination, rotation or affine invariant orientation histogram descriptors of image patches are vector quantized. Each image is then represented by one visual word occurrence histogram. To classify the images we use one-against-all SVM classifiers and choose the best ranked category. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We obtained excellent results as well for multi-class categorization as for object detection. In the second part we improve the categorizer by incorporating geometric information. Based on scale, orientation or closeness of the keypatches we can consider a large number of simple geometrical relationships, each of which can be considered as a simplistic classifier. We select from this multitude of classifiers (several millions in our case) and combine them effectively with the original classifier. Results are shown on a new challenging 10 class dataset.