How do humans recognize three-dimensional objects? This
simple question leads to surprisingly complex answers.
At the heart of what makes visual recognition a difficult
problem are two issues. First, we live in a world made
up of three-dimensional objects, yet we receive only two-dimensional
stimulation on our retinae as sensory input. Second, we
live in a highly variable world in which images of objects
change constantly due to transformations in size, position,
orientation, pose, color, lighting, and configuration.
The challenge is to derive a consistent mapping from a
potentially infinite set of images to a relatively small
number of known objects and categories. It is a problem
that the human visual system routinely and effortlessly
solves.
To begin to understand how visual object recognition
works, we must consider three factors that affect human
recognition behavior:
- the image geometry for objects and object classes
and how it changes with changes in three-dimensional
orientation, illumination, etc.;
- the level of categorization required for a given task,
varying from coarse "basic-level" categorization to fine
"item-specific" recognition; and
- the differing degrees of perceptual expertise that
observers have with specific object classes and how
visual experience fine-tunes the recognition system
to attain such expertise.
The research my collaborators and I have pursued is aimed
at elucidating how these factors interact to produce recognition
competence across a wide range of contexts. When considering
image geometry, the most salient example of how variability
in the world affects image structure is a rotation in
depth: even small rotations may produce dramatically different
input patterns. Vision researchers have typically assumed
that in order to recognize objects across such variability,
object representations must be viewpoint independent.
To test this hypothesis, we explored how perceivers generalized
from familiar views of novel objects to unfamiliar views
of the same objects. Surprisingly, recognition performance
was not only viewpoint dependent, but, critically, was
best predicted by the distance between the unfamiliar
viewpoint and the nearest previously seen viewpoint. This
finding led to the hypothesis that three-dimensional objects
are represented as sets of viewpoint-specific descriptions,
"multiple-views," rather than viewpoint-independent models.
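To make the nearest-view claim concrete, here is a minimal Python sketch of a toy multiple-views model. It assumes a single axis of rotation in depth and a hypothetical linear cost per degree of rotation; the function names and the parameters `base_ms` and `slope_ms_per_deg` are illustrative assumptions, not code or fitted values from the studies described here. Only the shape of the prediction matters.

```python
# Toy "multiple-views" model: recognition cost grows with the angular
# distance from a test view to the NEAREST stored (familiar) view.
# Angles are rotations in depth about one axis, in degrees.
# base_ms and slope_ms_per_deg are hypothetical illustrative parameters,
# not values fitted to any experiment.


def angular_distance(a: float, b: float) -> float:
    """Smallest rotation (0 to 180 degrees) separating two viewpoints."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_view_distance(test_view: float, stored_views: list[float]) -> float:
    """Distance from a test view to the closest familiar view."""
    return min(angular_distance(test_view, v) for v in stored_views)


def predicted_rt(test_view: float, stored_views: list[float],
                 base_ms: float = 600.0, slope_ms_per_deg: float = 2.0) -> float:
    """Hypothetical linear cost: base time plus a penalty per degree of rotation."""
    return base_ms + slope_ms_per_deg * nearest_view_distance(test_view, stored_views)


if __name__ == "__main__":
    familiar = [0.0, 130.0]  # hypothetical training views
    for test in [0.0, 30.0, 65.0, 130.0, 250.0]:
        print(test, nearest_view_distance(test, familiar), predicted_rt(test, familiar))
```

With familiar views at 0° and 130°, the sketch predicts the fastest responses at those two views and, among the tested views, the slowest at 250°, which lies 110° from its nearest familiar view. A viewpoint-independent model would instead predict the same cost at every test view.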
In subsequent research we have considered how a viewpoint-dependent
system can account for recognition at different categorical
levels. My earlier work had established only that viewpoint-dependent
mechanisms are implicated in more subordinate-level, within-category
recognition tasks. Common wisdom associated such findings
with image-specific templates that could not support the
many-to-one mapping required for object categorization.
To test this claim, we investigated how perceivers recognized
novel objects that corresponded to distinct perceptual
classes. Recognition performance was found to be viewpoint-dependent
despite the fact that individual objects were composed
of relatively discriminable features and parts. We also
obtained evidence that perceivers were able to generalize
from known instances of a perceptual class to unknown
instances of that same class, but, again critically, only
at familiar viewpoints. Thus, it appears that viewpoint-dependent
recognition procedures are not limited to item-specific
recognition, but rather include mechanisms for identification
at more general categorical levels.
The fact that perceivers can recognize objects at multiple
categorical levels has important implications for how
we understand apparent dissociations based on object class.
In particular, it has been argued that face and non-face
object recognition are subserved by separable systems.
Three sources of evidence are cited in favor of this stance:
behavioral effects that are face-specific; brain imaging
studies that show a specific neural substrate for face
processing, the "face area"; and brain-injured prosopagnosic
patients who appear disproportionately impaired at face
recognition relative to non-face object recognition. Studies
producing such evidence, however, have typically compared
the recognition of faces to the recognition of common
objects without controlling for either the level of categorization
or the level of expertise across stimulus classes.
To address these confounds we designed a set of novel
objects, "Greebles," that formed a hierarchically organized
homogeneous class similar to that formed by faces. We
then used these objects to test one of the most widely
cited pieces of behavioral evidence for face-specific
mechanisms: "configural sensitivity," that is, sensitivity
to the spatial arrangement of an object's parts.
In a part recognition task we found that Greeble experts,
but not Greeble novices, showed a pattern of configural
sensitivity similar to that observed for faces. We also
used brain-imaging studies (fMRI) to investigate the neural
substrates mediating visual recognition. In a study using
common non-face objects we found that the brain regions
associated with face processing were similarly active
for within-category recognition judgments. This common
pattern indicates that it was the specific level at which
faces are recognized, not the stimulus class per se, that
led to enhanced activity in the face area.
In a second study we imaged Greeble novices and Greeble
experts and found that the experts, but again not the
novices, showed enhanced activity in the face area, even
when this region was defined individually for each subject.
Finally, we used Greebles and other non-face objects to
explore prosopagnosic subjects' sensitivity to different
levels of categorization.
Crucially, we used measures that took into account possible
biases and differences in the effort spent recognizing
different stimulus classes. Across several experiments
we found that, independent of object category, prosopagnosic
subjects were far more sensitive to the manipulation of
the level of categorization than were control subjects.
Thus, apparent face recognition deficits may be better
explained as deficits in recognizing objects at more specific
levels of discrimination.
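The text does not name the bias-corrected measures that were used. As one hedged illustration only, signal detection theory's sensitivity index d′ separates how well an observer discriminates targets from distractors from how willing that observer is to answer "yes." The Python sketch below computes d′ and the criterion c from hypothetical trial counts, using a log-linear correction (one common choice, assumed here) so that hit or false-alarm rates of 0 or 1 do not yield infinite z-scores.

```python
from statistics import NormalDist

# Hypothetical illustration: the counts passed in are invented trial tallies,
# not data from any experiment described in the text.
_z = NormalDist().inv_cdf  # inverse standard normal CDF (the z-transform)


def _rates(hits, misses, false_alarms, correct_rejections):
    """Hit and false-alarm rates with a log-linear correction (assumed choice)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity: separation of signal and noise distributions in z units."""
    h, f = _rates(hits, misses, false_alarms, correct_rejections)
    return _z(h) - _z(f)


def criterion(hits, misses, false_alarms, correct_rejections):
    """Response bias c: negative = liberal ("yes"-prone), positive = conservative."""
    h, f = _rates(hits, misses, false_alarms, correct_rejections)
    return -(_z(h) + _z(f)) / 2


if __name__ == "__main__":
    # An unbiased observer and a more liberal observer (invented counts).
    print(round(d_prime(40, 10, 10, 40), 2), round(criterion(40, 10, 10, 40), 2))
    print(round(d_prime(45, 5, 20, 30), 2), round(criterion(45, 5, 20, 30), 2))
```

Two observers can share the same d′ yet differ in c, so comparing sensitivity rather than raw accuracy keeps a more liberal or more conservative response style from masquerading as a recognition deficit.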
Taken together, our behavioral, imaging, and neuropsychological
work serves to implicate both the level of categorization
and the level of perceptual expertise as important factors
in visual recognition tasks. It is our hypothesis that
the interaction of these two factors is sufficient to
explain the apparent specialization of face recognition
mechanisms. Overall, there appears to be little reason
to posit separable recognition systems along the lines
of viewpoint dependency, level of categorization, or stimulus
class. Rather, humans appear to have a single highly adaptable
visual recognition system that can be fine-tuned by experience
to support a spectrum of recognition behaviors. Although
there is clearly much work to be done, we have begun to
illuminate some of the properties of this remarkable system.