During development, humans and animals learn to understand
their visual environment based on their sensory experience.
Despite decades of research, it is still not clear what
representations the brain uses in this process and how it
acquires them. We follow a systematic research program to
clarify these issues. Recently, we have conducted a series
of adult and infant experiments showing that, from a very
early age, humans possess a fundamental ability to automatically
extract the statistical regularities of unfamiliar visual
scenes in both time and space. We argue that this basic ability
is key in the formation of visual representations from the
simplest levels of luminance changes to the level of conscious
memory traces. We are currently investigating how this
learning ability interacts with the various perceptual
constraints (due to, e.g., eye movements, clutter,
occlusion, and the hierarchical embedding of features)
that make such learning feasible. Using fMRI, we have also
identified the brain structures involved in this learning
and made predictions about the nature of the process.
Our computational modeling work interprets our experimental
data in a Bayesian framework. We have demonstrated that
learning by statistical selection among generative models
closely captures the human behavior observed in our experiments. This
suggests that humans interpret their sensory input through
an "unconscious inference" process that follows precisely
the statistical structure of the environment but aims at
the simplest possible internal description of the input.
We have shown that this framework gives a statistically
grounded interpretation of empirical Gestalt rules and chunking,
and provides a tightly coupled explanation for visual
recognition and visual learning.
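
The general idea of Bayesian model selection favoring the simplest
adequate description can be illustrated with a toy sketch. It is not
the model used in the experiments: the binary scene encoding, the
uniform priors, and all function names below are assumptions made
purely for illustration. Each scene records whether shape A and shape
B are present. An "independent" model gives each shape its own
appearance probability, whereas a "chunk" model treats the pair as a
single unit that appears or disappears together.

```python
import math


def log_beta(x, y):
    """Log of the Beta function B(x, y), computed via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_evidence_bernoulli(k, n):
    """Log marginal likelihood of one particular binary sequence with
    k ones out of n, under a Bernoulli model whose rate has a uniform
    prior: the integral of p^k (1-p)^(n-k) dp equals B(k+1, n-k+1)."""
    return log_beta(k + 1, n - k + 1)


def log_evidence_independent(scenes):
    """Hypothetical model 1: shapes A and B appear independently,
    each with its own unknown rate (two free parameters)."""
    n = len(scenes)
    k_a = sum(a for a, _ in scenes)
    k_b = sum(b for _, b in scenes)
    return log_evidence_bernoulli(k_a, n) + log_evidence_bernoulli(k_b, n)


def log_evidence_chunk(scenes):
    """Hypothetical model 2: A and B form one chunk that is either
    fully present or fully absent (one free parameter). Any scene
    with only one of the two shapes is impossible under this model."""
    if any(a != b for a, b in scenes):
        return float("-inf")
    n = len(scenes)
    k = sum(a for a, _ in scenes)
    return log_evidence_bernoulli(k, n)


def preferred_model(scenes):
    """Return the name of the model with the higher marginal likelihood."""
    if log_evidence_chunk(scenes) > log_evidence_independent(scenes):
        return "chunk"
    return "independent"


# Scenes in which the two shapes always co-occur.
paired_scenes = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1)]
print(preferred_model(paired_scenes))
```

When the shapes always co-occur, both models can account for the
data, but the chunk model spreads its probability over fewer possible
data sets and therefore receives the higher marginal likelihood. A
single scene containing only one of the two shapes rules the chunk
model out, and the independent model is preferred instead.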
The Bayesian framework requires a continuous reciprocal
interaction between groups of elements at different levels
of the hierarchical representation encoded in the brain.
This dynamic collective coding contrasts with the traditional
feedforward view of how visual information is processed
in the cortex. We have shown that both at the level of primary
visual cortex and at higher areas the representation of
visual information is best described as the activity pattern
of cell assemblies rather than a set of individual feature
detectors. We have also shown that the precise developmental
pattern and the correlational structure of cell responses
in the primary visual cortex call into question the notion
that ongoing cortical activity is accidental noise unrelated
to visual coding. Instead, we suggest that ongoing activity
is the manifestation of internal brain states that
express knowledge of the world relevant for perception,
and that sensory input only modulates these states. This view
supports Hebb's original notion of internal dynamical states
being crucial for integrating cognitive processes beyond
simple stimulus-response associations, and it can potentially
close the gap between response functions and behavior.