
Michael J. Tarr, Ph.D.


Professor of Cognitive Science
Brown University
Providence, Rhode Island
October 4, 1999

Visual Object Recognition: Can a Single Mechanism Suffice?

How do humans recognize three-dimensional objects? This simple question leads to surprisingly complex answers. At the heart of what makes visual recognition a difficult problem are two issues. First, we live in a world made up of three-dimensional objects, yet only receive two-dimensional stimulation on our retinae as sense input. Second, we live in a highly variable world in which images of objects change constantly due to transformations in size, position, orientation, pose, color, lighting, and configuration. The challenge is to derive a consistent mapping from a potentially infinite set of images to a relatively small number of known objects and categories. It is a problem that the human visual system routinely and effortlessly solves.

To begin to understand how visual object recognition works, we must consider three factors that affect human recognition behavior:

  1. the image geometry for objects and object classes and how it changes with changes in three-dimensional orientation, illumination, etc.;
  2. the level of categorization required for a given task, varying from coarse "basic-level" categorization to fine item-specific recognition; and
  3. the differing degrees of perceptual expertise that observers have with specific object classes and how visual experience fine-tunes the recognition system to attain such expertise.

The research my collaborators and I have pursued is aimed at elucidating how these factors interact to produce recognition competence across a wide range of contexts. When considering image geometry, the most salient example of how variability in the world affects image structure is a rotation in depth: even small rotations may produce dramatically different input patterns. Vision researchers have typically assumed that in order to recognize objects across such variability, object representations must be viewpoint independent. To test this hypothesis, we explored how perceivers generalized from familiar views of novel objects to unfamiliar views of the same objects. Surprisingly, recognition performance was not only viewpoint dependent, but, critically, was best predicted by the distance between the unfamiliar viewpoint and the nearest previously seen viewpoint. This finding led to the hypothesis that three-dimensional objects are represented as sets of viewpoint-specific descriptions, "multiple-views," rather than viewpoint-independent models.
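The "nearest previously seen viewpoint" predictor can be made concrete with a small sketch. This is only an illustration, not the model used in the studies: it assumes rotation about a single axis measured in degrees, and the linear cost function with its `base` and `slope` parameters is a hypothetical stand-in for whatever performance measure (response time, error rate) is of interest.

```python
def angular_distance(a_deg, b_deg):
    """Smallest rotation, in degrees, between two orientations about one axis."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def distance_to_nearest_view(test_deg, familiar_views_deg):
    """Distance from a test orientation to the closest previously seen view."""
    return min(angular_distance(test_deg, v) for v in familiar_views_deg)


def predicted_cost(test_deg, familiar_views_deg, base=0.0, slope=1.0):
    """Hypothetical linear cost (an assumption, not taken from the source):
    it grows with the distance to the nearest familiar view."""
    return base + slope * distance_to_nearest_view(test_deg, familiar_views_deg)


# Example: an object learned at 0 and 120 degrees, tested at 350 degrees.
# The nearest familiar view is 0 degrees, which is 10 degrees away.
nearest = distance_to_nearest_view(350, [0, 120])
```

Under this sketch, a test view that coincides with a familiar view carries no added cost, while a view midway between two familiar views carries the largest cost, which is the qualitative pattern the multiple-views hypothesis describes.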

In subsequent research we have considered how a viewpoint-dependent system can account for recognition at different categorical levels. My earlier work had established only that viewpoint-dependent mechanisms are implicated in more subordinate-level, within-category recognition tasks. Common wisdom associated such findings with image-specific templates that could not support the many-to-one mapping required for object categorization. To test this claim, we investigated how perceivers recognized novel objects that corresponded to distinct perceptual classes. Recognition performance was found to be viewpoint-dependent despite the fact that individual objects were composed of relatively discriminable features and parts. We also obtained evidence that perceivers were able to generalize from known instances of a perceptual class to unknown instances of that same class, but, again critically, only at familiar viewpoints. Thus, it appears that viewpoint-dependent recognition procedures are not limited to item-specific recognition, but rather include mechanisms for identification at more general categorical levels.

The fact that perceivers can recognize objects at multiple categorical levels has important implications for how we understand apparent dissociations based on object class. In particular, it has been argued that face and non-face object recognition are subserved by separable systems. Three sources of evidence are cited in favor of this stance: behavioral effects that are face-specific; brain imaging studies that show a specific neural substrate for face processing (the "face area"); and brain-injured prosopagnosic patients who appear disproportionately impaired at face recognition relative to non-face object recognition. Studies producing such evidence, however, have typically compared the recognition of faces to the recognition of common objects without controlling for either the level of categorization or the level of expertise across stimulus classes.

To address these confounds we designed a set of novel objects, "Greebles," that formed a hierarchically organized homogeneous class similar to that formed by faces. We then used these objects to test one of the most widely cited pieces of behavioral evidence for face-specific mechanisms: "configural sensitivity."

In a part recognition task we found that Greeble experts, but not Greeble novices, showed a pattern of configural sensitivity similar to that observed for faces. We also used brain-imaging studies (fMRI) to investigate the neural substrates mediating visual recognition. In a study using common non-face objects we found that the brain regions associated with face processing were similarly active for within-category recognition judgments. This common pattern indicates that it was the specificity of face recognition, not the class per se, that led to enhanced activity in the face area.

In a second study we imaged Greeble novices and Greeble experts and found that the experts, but again not the novices, showed enhanced activity in the face area, even when this region was defined individually for each subject. Finally, we used Greebles and other non-face objects to explore prosopagnosic subjects' sensitivity to different levels of categorization.

Crucially, we used measures that took into account possible biases and differences in the effort spent recognizing different stimulus classes. Across several experiments we found that, independent of object category, prosopagnosic subjects were far more sensitive to the manipulation of the level of categorization than control subjects were. Thus, apparent face recognition deficits may be better explained as deficits in recognizing objects at more specific levels of discrimination.

Taken together, our behavioral, imaging, and neuropsychological work serves to implicate both the level of categorization and the level of perceptual expertise as important factors in visual recognition tasks. It is our hypothesis that the interaction of these two factors is sufficient to explain the apparent specialization of face recognition mechanisms. Overall, there appears to be little reason to posit separable recognition systems along the lines of viewpoint dependency, level of categorization, or stimulus class. Rather, humans appear to have a single highly adaptable visual recognition system that can be fine-tuned by experience to support a spectrum of recognition behaviors. Although there is clearly much work to be done, we have begun to illuminate some of the properties of this remarkable system.
