PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Comparing machines and humans on a visual categorization test
François Fleuret, Ting Li, Charles Dubout, Emma K. Wampler, Steven Yantis and Donald Geman
Proceedings of the National Academy of Sciences, 108(43):17621-17625, 2011.

Abstract

Automated scene interpretation has benefited from advances in machine learning, and restricted tasks, such as face detection, have been solved with adequate accuracy in constrained settings. However, the performance of machines at producing rich semantic descriptions of natural scenes from digital images remains highly limited and far inferior to that of humans. Here we quantify this "semantic gap" in a particular setting: we compare the efficiency of human and machine learning in assigning an image to one of two categories determined by the spatial arrangement of its constituent parts. The images are not real, but the category-defining rules reflect the compositional structure of real images and the type of "reasoning" that appears to be necessary for semantic parsing. Experiments demonstrate that human subjects grasp the separating principles from a handful of examples, whereas the error rates of computer programs fluctuate wildly and remain far behind those of humans even after exposure to thousands of examples. These observations lend support to current trends in computer vision, such as integrating machine learning with parts-based modeling.
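To make the task concrete, the following is a minimal toy sketch of a two-category problem "determined by the spatial arrangement of constituent parts". It is not the paper's actual stimulus generator: the grid size, the two-part images, the distance-3 proximity rule, and the threshold learner are all invented here for illustration. The point it demonstrates is that a learner with access to part positions can separate the categories from a handful of examples, which is the kind of parts-based reasoning the abstract alludes to.

```python
import random

GRID = 16  # toy image side length (hypothetical; not the paper's resolution)

def sample_image(rule_positive):
    """Place two point 'parts' on a GRID x GRID canvas.
    The category is defined by spatial arrangement alone: positive iff
    the parts lie within Chebyshev distance 3 of each other."""
    while True:
        a = (random.randrange(GRID), random.randrange(GRID))
        b = (random.randrange(GRID), random.randrange(GRID))
        if a == b:
            continue
        close = max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 3
        if close == rule_positive:
            return a, b

def parts_based_classifier(examples):
    """Learn a threshold on inter-part distance from labelled examples:
    the largest distance observed among the positive examples."""
    pos = [max(abs(a[0] - b[0]), abs(a[1] - b[1]))
           for (a, b), y in examples if y]
    thr = max(pos)
    return lambda a, b: max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= thr

random.seed(0)
train = [(sample_image(y), y) for y in [True, False] * 5]  # 10 examples
clf = parts_based_classifier(train)
test = [(sample_image(y), y) for y in [True, False] * 100]
acc = sum(clf(a, b) == y for (a, b), y in test) / len(test)
print(f"test accuracy from 10 examples: {acc:.2f}")
```

A pixel-level learner given only the raw canvas would need far more data to discover the same rule, which is the qualitative contrast the experiments quantify.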

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Machine Vision; Theory & Algorithms
ID Code: 9364
Deposited By: François Fleuret
Deposited On: 16 March 2012