Review
Learning to recognize objects

https://doi.org/10.1016/S1364-6613(98)01261-3

Abstract

Evidence from neurophysiological and psychological studies is coming together to shed light on how we represent and recognize objects. This review describes evidence supporting two major hypotheses. The first is that objects are represented in a mosaic-like form, encoded by combinations of complex, reusable features rather than by two-dimensional templates or three-dimensional models. The second is that transform-invariant representations of objects are learnt through experience, and that this learning is affected by the temporal sequence in which different views of the objects are seen, as well as by their physical appearance.

Section snippets

Neurophysiology

From lesion studies and cellular recording it has been proposed that the sequence of primate visual areas (V4→PIT→CIT→AIT), often referred to as the ventral stream, solves the problem of what we are looking at. In contrast, a second stream leading dorsally and into the parietal lobe (V1→V2→V3→intraparietal areas) has been implicated in deciding where that object is located [2, 3, 4, 5] (Fig. 2). In particular, cells in the latter part of the ventral stream in the inferior temporal areas

Psychophysical studies

Beyond the accumulating evidence for experience-dependent modification of neural responses, there are ample examples of such experience dependence in human object recognition. One of the important recent developments has been the use of stimuli chosen from novel object classes. What emerged from this work was that if two views of a novel object were learned, recognition was better for new views oriented between the two training views than for views lying outside them [19, 20] (see Fig. 4). These
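The advantage for views lying between two trained views can be pictured with a toy view-based model. The sketch below is an illustration under stated assumptions, not the model used in the cited studies: each trained view is stored as a unit with Gaussian tuning over viewing angle, and recognition is taken to be the summed response of those units. The training angles, the test angles, the tuning width and all function names are hypothetical.

```python
import math


def view_unit_response(test_angle, stored_angle, tuning_width=40.0):
    """Response of one view-tuned unit: Gaussian in angular distance (degrees)."""
    diff = test_angle - stored_angle
    return math.exp(-(diff ** 2) / (2.0 * tuning_width ** 2))


def recognition_score(test_angle, stored_angles, tuning_width=40.0):
    """Summed response of all stored-view units to a single test view."""
    return sum(
        view_unit_response(test_angle, angle, tuning_width)
        for angle in stored_angles
    )


# Hypothetical training views, 60 degrees apart.
trained_views = [0.0, 60.0]

# Both test views are 30 degrees from the nearest trained view,
# but only the first lies between the two trained views.
interpolated = recognition_score(30.0, trained_views)
extrapolated = recognition_score(90.0, trained_views)

print(f"between trained views: {interpolated:.3f}")
print(f"outside trained views: {extrapolated:.3f}")
```

In this sketch the in-between view scores higher because it receives support from both stored views, whereas the outside view is supported mainly by one. Angles are treated as points on a line, so wrap-around at 360° is ignored.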

Representation through image features

The view-based approach to object recognition accords well with a large portion of the available neurophysiological data on face cells. However, the precise nature of this representation remains unclear. Although there is good evidence that neurons encode faces through some form of distributed representation, there is neurophysiological evidence that this is sometimes at the level of complete views [35, 36, 37] and sometimes at the level of facial features [38, 39, 40]. Representation in

Temporal continuity as a cue to invariance learning

A broadly tuned feature-based system of the type under consideration in this review would be sufficient to perform recognition over small transformations [48]. However, associating images over larger shape transformations requires either separate pre-normalization for size and translation of the image, or the use of separate view-specific feature detectors that would then feed into a view-invariant detector. The use of pre-normalization is at odds with the available neurophysiological evidence,
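How temporal order could bind view-specific detectors to a single view-invariant detector can be sketched with a trace-style Hebbian update, in which a unit's recent activity lingers and strengthens connections from whatever view follows. This is a minimal sketch and an assumption on our part: the excerpt above does not specify a learning rule, and the view patterns, parameter values and function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_with_trace(weights, view_sequence, learning_rate=0.5, trace_decay=0.8):
    """One pass of a trace-style Hebbian update over temporally ordered views.

    The trace is a running average of the unit's activity, so activity
    evoked by one view can drive learning on the views that follow it.
    """
    weights = list(weights)
    trace = 0.0
    for view in view_sequence:
        activity = dot(weights, view)
        trace = (1.0 - trace_decay) * activity + trace_decay * trace
        weights = [w + learning_rate * trace * x for w, x in zip(weights, view)]
    return weights


# Hypothetical outputs of three view-specific detectors (one active per view).
view_a, view_b, view_c = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Assume the invariant unit initially responds to view A only.
initial_weights = [1.0, 0.0, 0.0]
sequence = [view_a, view_b, view_c]

with_trace = train_with_trace(initial_weights, sequence, trace_decay=0.8)
without_trace = train_with_trace(initial_weights, sequence, trace_decay=0.0)

print("response to view B with trace:   ", dot(with_trace, view_b))
print("response to view B without trace:", dot(without_trace, view_b))
```

With a non-zero trace, the unit that began by responding only to view A acquires a response to views B and C because they followed A in time. With the trace switched off, the same exposure leaves it unresponsive to them. Weights are not normalized here, which a fuller model would need in order to keep them bounded.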

Conclusion

Our intention in this paper has been to draw together much of the research currently underway in the field of object recognition, and to highlight the encouraging parallels between neurophysiological and psychophysical evidence in this field. In the main body of the article we have concentrated on the questions of whether and how representations of objects are learnt, reviewing studies ranging from adaptation to Mooney faces, to the fall in canonical view effects with experience. We have also

Outstanding questions

  • Assumptions (i.e. priors) are known to play an important role in perception [60]. But what specific role do they play in object recognition? The temporal-association hypothesis is one example, but there might be others. To what extent are these priors learnt, and to what extent innate?

  • This review has drawn much of its evidence from work on neurons responsive to faces. But to what extent are faces and objects related?

  • Why do some cells represent faces holistically [35, 36, 37], and others as features [38,

References (61)

  • D. Perrett et al.

    Neurophysiology of shape processing

    Image Vis. Comput

    (1993)
  • D. Valentin et al.

    What represents a face? a computational approach for the integration of physiological and psychological data

    Perception

    (1997)
  • S. Carey et al.

    Are faces perceived as configurations more by adults than by children?

    Visual Cognit

    (1994)
  • P. Sinha et al.

    Role of learning in three-dimensional form perception

    Nature

    (1996)
  • M.J. Farah

    Visual Agnosia: Disorders of Object Recognition and What They Can Tell Us About Normal Vision

    (1990)
  • L.G. Ungerleider et al.

    Two cortical visual systems

  • M.A. Goodale et al.

    Separate visual pathways for perception and action

    Trends Neurosci

    (1992)
  • M.P. Young

    Objective analysis of the topological organization of the primate cortical visual system

    Nature

    (1992)
  • E.T. Rolls

    Neurophysiological mechanisms underlying face processing within and beyond the temporal cortical areas

    Philos. Trans. R. Soc. London Ser. B

    (1992)
  • D.I. Perrett

    Organisation and functions of cells responsive to faces in the temporal cortex

    Philos. Trans. R. Soc. London Ser. B

    (1992)
  • R. Desimone

    Face-selective cells in the temporal cortex of monkeys

    J. Cogn. Neurosci

    (1991)
  • C.G. Gross et al.

    Visual properties of neurons in inferotemporal cortex of the macaque

    J. Neurophysiol

    (1972)
  • J.M. Fuster et al.

    Neuronal firing in the inferotemporal cortex of the monkey in a visual memory task

    J. Neurosci

    (1982)
  • G.C. Baylis et al.

    Selectivity between faces in the responses of a population of neurons in the cortex in the superior temporal sulcus of the monkey

    Brain Res

    (1985)
  • E.T. Rolls

    The effect of learning on the face-selective responses of neurons in the cortex in the superior temporal sulcus of the monkey

    Exp. Brain Res

    (1989)
  • E. Kobatake et al.

    Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys

    J. Neurophysiol

    (1998)
  • N.K. Logothetis et al.

    Viewer-centered object representations in the primate

    Cereb. Cortex

    (1995)
  • H.H. Bülthoff et al.

    Psychophysical support for a two-dimensional view interpolation theory of object recognition

    Proc. Natl. Acad. Sci. U. S. A

    (1992)
  • M.J. Tovee et al.

    Rapid visual learning in neurones of the primate temporal visual cortex

    NeuroReport

    (1996)
  • V.S. Ramachandran

    2D or not 2D - that is the question

Cited by (127)

  • Object recognition in fish: accurate discrimination across novel views of an unfamiliar object category (human faces)

    2018, Animal Behaviour

    Citation excerpt: At the same time, in the human behavioural literature, theorists have argued that the underlying representation is different too. They have developed a model of face representation using a norm-based model (built around specific whole-face prototypes) rather than the feature analysers common to biological models of object recognition (Riesenhuber & Poggio, 2000; Wallis, 2013; Wallis & Bülthoff, 1999). However, their model is not universally accepted.

  • A new model to study visual attention in zebrafish

    2014, Progress in Neuro-Psychopharmacology and Biological Psychiatry

    Citation excerpt: The novel object recognition (NOR) test evaluates an animal's attention that is elicited by the presentation of novel stimuli. Interest in NOR is very recent and a large proportion of the literature on cognition has been dedicated to object and visual recognition in humans, pigeons and primates (Spetch et al., 2006; Wallis and Bülthoff, 1999). The principal advantage of the NOR test is the rapid testing sequence and no necessary training other than the initial exposure session.
