Spatial extensions to bag of visual words
The Bag of Visual Words (BoV) paradigm has been applied successfully to image content analysis tasks such as image classification and object detection. The basic BoV approach ignores the spatial distribution of descriptors within images. Here we describe spatial extensions to BoV and compare them experimentally on the image category detection task of the VOC2007 benchmark. In particular, we compare two ways of tiling images geometrically: the soft tiling approach proposed here and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of the images: post-classifier fusion and fusion at the level of an SVM kernel. The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the spatial distribution of the descriptors into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance in the case of hard tiling. The evaluated fusion mechanisms performed approximately equally well.
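To make the distinction between the two tilings concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes descriptors have already been quantised to codebook indices and that their positions are normalised to the unit square. The function names, the 2x2 grid, and the Gaussian weighting with width `sigma` are illustrative assumptions; the abstract does not specify how the soft tiling masks are constructed.

```python
import numpy as np


def hard_tile_histograms(xy, words, vocab_size, grid=(2, 2)):
    """Per-tile BoV histograms where each descriptor falls in exactly one tile.

    xy:    (N, 2) array of (x, y) positions normalised to [0, 1].
    words: (N,) integer array of codebook indices.
    """
    rows, cols = grid
    # Tile row/column of each descriptor; clamp positions lying exactly on 1.0.
    r = np.minimum((xy[:, 1] * rows).astype(int), rows - 1)
    c = np.minimum((xy[:, 0] * cols).astype(int), cols - 1)
    tile = r * cols + c
    hist = np.zeros((rows * cols, vocab_size))
    # Unbuffered accumulation so repeated (tile, word) pairs are all counted.
    np.add.at(hist, (tile, words), 1.0)
    return hist


def soft_tile_histograms(xy, words, vocab_size, grid=(2, 2), sigma=0.25):
    """Per-tile BoV histograms where each descriptor contributes to every tile.

    ASSUMPTION: weights are Gaussian in the distance to the tile centre and
    normalised per descriptor. This is one plausible soft mask, chosen only
    for illustration.
    """
    rows, cols = grid
    cy = (np.arange(rows) + 0.5) / rows
    cx = (np.arange(cols) + 0.5) / cols
    centres = np.array([(x, y) for y in cy for x in cx])  # (T, 2), row-major
    d2 = ((xy[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)  # (N, T)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)  # each descriptor carries total weight 1
    hist = np.zeros((rows * cols, vocab_size))
    for t in range(rows * cols):
        hist[t] = np.bincount(words, weights=w[:, t], minlength=vocab_size)
    return hist


# Three descriptors: one near the top-left corner, one near the bottom-right
# corner, and one exactly at the image centre.
xy = np.array([[0.1, 0.1], [0.9, 0.9], [0.5, 0.5]])
words = np.array([0, 1, 1])
hard = hard_tile_histograms(xy, words, vocab_size=3)
soft = soft_tile_histograms(xy, words, vocab_size=3)
```

In this sketch the descriptor at the image centre is assigned wholly to one tile by the hard tiling, while the soft tiling spreads it evenly over all four tiles. Summing either set of histograms over tiles gives back the plain, non-spatial BoV histogram.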