A structured model of video reproduces primary visual cortical organisation
The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely parallelled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition.