Unsupervised Discovery of Visual Object Class Hierarchies
Josef Sivic, B. Russell, Andrew Zisserman, W. T. Freeman and A. A. Efros
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(2008).
Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism - animal - feline - cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous approaches for unsupervised object and scene discovery focused on partitioning the visual data into a set of non-overlapping classes of equal granularity. In this work, we propose to group visual objects using a multi-layer hierarchy tree that is based on common visual elements. This is achieved by adapting to the visual domain the generative Hierarchical Latent Dirichlet Allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text. Images are modeled using quantized local image regions as analogues to words in text. Employing the multiple segmentation framework of Russell et al., CVPR'06, we show that meaningful object hierarchies, together with object segmentations, can be automatically learned from unlabeled and unsegmented image collections without supervision. We demonstrate improved object classification and localization performance using hLDA over the previous non-hierarchical method on the MSRC dataset of Winn et al., ICCV'05.