Learning models for object recognition from natural language descriptions
Josiah Wang, Katja Markert and Mark Everingham
In: BMVC 2009, 7-10 Sep 2009, London, UK.
We investigate the task of learning models for visual object recognition from natural language descriptions alone. The approach contributes to the recognition of fine-grained object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions of visual attributes are readily available. As an example we tackle recognition of butterfly species, learning models from descriptions in an online nature guide. We propose natural language processing methods for extracting salient visual attributes from these descriptions to use as ‘templates’ for the object categories, and apply vision methods to extract corresponding attributes from test images. A generative model is used to connect textual terms in the learnt templates to visual attributes. We report experiments comparing the performance of humans and the proposed method on a dataset of ten butterfly categories.
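The abstract's pipeline can be sketched in miniature: each category is represented by a 'template' of attribute terms mined from text, and detected visual attributes in a test image are scored against each template with a simple generative (naive-Bayes-style) model. The attribute names, templates, and probabilities below are invented for illustration and are not the authors' actual data or model; this is only a hedged sketch of the general idea.

```python
import math

# Hypothetical per-category "templates" of salient textual attributes,
# as might be mined from an online nature guide (names invented).
templates = {
    "monarch": {"orange wings", "black veins", "white spots"},
    "peacock": {"eyespots", "red-brown wings"},
}

# Assumed detection probabilities that a generative model would learn;
# here fixed by hand for illustration.
P_DETECT = 0.8       # attribute in template and detected in the image
P_FALSE_ALARM = 0.1  # attribute detected but absent from the template

def log_likelihood(detected, template, vocabulary):
    """Naive-Bayes-style log-likelihood of a set of detected visual
    attributes under one category template."""
    score = 0.0
    for attr in vocabulary:
        p = P_DETECT if attr in template else P_FALSE_ALARM
        score += math.log(p if attr in detected else 1.0 - p)
    return score

def classify(detected):
    """Return the category whose template best explains the detections."""
    vocabulary = set().union(*templates.values())
    return max(templates,
               key=lambda c: log_likelihood(detected, templates[c], vocabulary))
```

With these toy numbers, an image whose detected attributes are {"orange wings", "black veins"} scores highest under the "monarch" template, since its template explains both detections while the "peacock" template treats them as false alarms.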
EPrint Type: Conference or Workshop Item (Oral)
Deposited By: Mark Everingham
Deposited On: 08 March 2010