
Neurocomputing

Volume 173, Part 3, 15 January 2016, Pages 1811-1823

A systematic analysis of a V1–MT neural model for motion estimation

https://doi.org/10.1016/j.neucom.2015.08.091

Highlights

  • A feed-forward V1–MT neural model for optic flow computation.

  • Systematic analysis of V1 temporal profiles.

  • Strategies to encode and decode velocity at the MT level.

  • Relationship between intersection of constraints and linear combination decoding.

  • Estimation of real-world optic flow through a biologically plausible model.

Abstract

A neural feed-forward model composed of two layers that mimic the V1–MT primary motion pathway, derived from previous works by Heeger and Simoncelli, is proposed and analyzed. Essential aspects of the model are highlighted and comparatively analyzed to point out how realistic neural responses can be efficiently and effectively used for optic flow estimation if properly combined at a population level. First, different profiles of the spatio-temporal V1 receptive fields are compared, both in terms of their properties in the frequency domain, and in terms of their responses to random dots and plaid stimuli. Then, a pooling stage at the MT level, which combines the afferent V1 responses, is modeled to obtain a population of pattern cells that encodes the local velocities of the visual stimuli. Finally, a decoding stage allows us to combine MT activities in order to compute optic flow. A systematic validation of the model is performed by computing the optic flow on synthetic and standard benchmark sequences with ground truth flow available. The average angular errors and the end-point errors on the resulting estimates allow us to quantitatively compare the different spatio-temporal profiles and the choices of the model's parameters, and to assess the validity and effectiveness of the approach in realistic situations.

Introduction

An accurate computation of motion is an inescapable requirement for any agent, natural or artificial, acting in a real-world dynamic environment. For this reason, many algorithms for the computation of the optic flow have been proposed in the Computer Vision and Robotics literature, characterized by high performance in terms of either estimation accuracy or execution time (see [1] for a review). Complex visual motion analysis is effectively solved by the mammalian visual system; thus, the neural mechanisms underlying motion analysis in the visual cortex have been extensively studied [2], [3], [4], [5].

In this paper, we present an engineered neural model, derived from the ones proposed in [6], [7], [4], that takes into account the hierarchical processing stages that occur in the primary visual cortex (V1) and in the medio-temporal area (MT). The distributed representation of the visual motion encoded by a population of MT cells is decoded to obtain an explicit motion estimation, i.e. the optic flow, in order to assess the model's performance on realistic motion sequences. The neural model is described through a two-layer architecture composed of distributed populations of cells. In the first layer, a bank of spatio-temporal oriented filters approximates the receptive fields (RFs) of the population of V1 simple cells, which are tuned to different motion directions and contribute to build the population of complex cells as motion energy units [8]. The responses of the complex cells are combined in the second layer to obtain estimates of the magnitude and direction of local velocities, as happens in area MT [4].
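The complex-cell stage mentioned above can be sketched as a motion energy computation: the squared responses of a quadrature (even/odd) pair of Gabor filters are summed, yielding a phase-invariant output. The following is a minimal one-dimensional sketch; the filter frequency, bandwidth, and support are illustrative assumptions and not the model's actual parameters, which are spatio-temporal and reported in the paper.

```python
import numpy as np

def gabor_1d(x, f0, sigma, phase):
    """1-D Gabor: Gaussian envelope times a sinusoid at frequency f0."""
    return np.exp(-x**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * f0 * x + phase)

def motion_energy_1d(signal, f0=0.1, sigma=4.0):
    """Complex-cell (energy) response: squared outputs of a quadrature
    (even/odd) Gabor pair, summed.  The result is invariant to the
    phase of the input, the defining property of motion energy units."""
    x = np.arange(-10.0, 11.0)  # filter support (illustrative)
    even = np.convolve(signal, gabor_1d(x, f0, sigma, 0.0), mode="same")
    odd = np.convolve(signal, gabor_1d(x, f0, sigma, -np.pi / 2), mode="same")
    return even**2 + odd**2
```

For a sinusoidal input at the tuned frequency, the energy is nearly constant across the interior of the signal regardless of the input phase, which is what makes the representation suitable for pooling at the MT stage.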

As previously stated, the proposed neural model is based on the ones presented in [6], [7], [4] (in the following referred to as the Heeger and Simoncelli (HS) models), since they constitute a popular model of primate velocity encoding that is consistent with experimental evidence, e.g. speed tuning [2] and responses to plaids [9], but see [10]. Here, we have not considered the temporal dynamics of MT responses [3], [11] or recent considerations about MT modeling, e.g. [12], [13], since our main aim is to show that the proposed neural model can be effectively and efficiently used in realistic situations, where it performs better than state-of-the-art bio-inspired algorithms for optic flow computation. In general, the proposed model can be considered a functional model that does not aim at full neurophysiological accuracy. In particular, the model proposed in [7], [4] is used to statistically model real neurons recorded from an animal model, whereas we chose parameters uniformly to estimate the optic flow, thus losing the biological diversity shown in [4]. Nevertheless, we have carefully analyzed the neural modeling aspects while coping with the constraints imposed by the realistic sequences and by the random dot and plaid stimuli. Accordingly, all the analyses are performed with the same filters and the same model's parameters, in order to show how the performance of the model changes in relation to the tuning curves, in terms of the optic flow estimation.

Here, we have extended the HS model through the following contributions: (i) we have analyzed several profiles of the temporal filters (see Section 4), which can be seen as a way to modify the shape of the band-pass filters to take into account some recent views [14], [13]; this choice notably affects the velocity estimation performance; (ii) we have proposed several strategies to encode and decode the velocity at the MT level, which are not present in [4] (see Section 3). In particular, we have discussed the relationship between a multiple-speed-direction MT model and a computational solution that reduces the number of cells used for the distributed representation of the visual signal. Moreover, (iii) we have explicitly addressed the modeling of the V1 and MT tuning curves (see Section 5.2). Regarding the optic flow computation on realistic benchmark sequences (see Section 5.3), (iv) we have explicitly considered the problem of spatial scale [13] and we have adopted a computationally efficient mechanism to cope with it (see Section 3.4).

Section snippets

Related works

Several models have been devised in order to account for the neural mechanisms of motion analysis. A model for the extraction of the image flow, inspired by the stages of the visual motion pathway of the primate brain, is proposed in [6]. Such a model is based on spatio-temporal RFs of simple cells, modeled by Gabor functions [15], and on the computation of the motion energy [8] by a layer of complex cells. The problem of detecting local image velocity is also addressed in [16], by a two-layer

Visual motion encoding and decoding for direct optic flow estimation

In order to obtain a population of pattern cells that encodes all the possible velocity directions of the visual stimuli [21], we have to model the MT linear weights wd(θ). In this paper, by considering the profiles of the MT linear weights shown in [4], we model wd(θ) as a cosine function of the orientation θ. The choice of a cosine function allows us to obtain direction tuning curves of pattern cells that behave as in [4] (see Fig. 5). A population of such MT cells is able to represent
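The cosine pooling just described can be sketched as follows: an MT cell preferring direction d weights the afferent V1 responses, tuned to directions θ, by w_d(θ) = cos(θ − d). The half-wave rectification and all parameter values below are illustrative assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def mt_pattern_response(v1_responses, thetas, d):
    """MT pattern-cell response for preferred direction d: a
    cosine-weighted sum of V1 responses tuned to directions `thetas`,
    followed by half-wave rectification (the rectification is an
    assumption of this sketch)."""
    w = np.cos(thetas - d)  # linear weights w_d(theta)
    return max(0.0, float(np.dot(w, v1_responses)))
```

With V1 responses that are themselves cosine-tuned around the true stimulus direction, a population of such MT cells peaks at the correct direction, which is the behavior the decoding stage relies on.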

Modeling of the spatio-temporal RFs

The selectivity of the simple cells to the stimulus velocity can be obtained by considering two different approaches: (i) to rotate the xyt coordinates of the Gabor filters, and (ii) to tilt the spatio-temporal filter, according to the rule x_θ → x_θ − v_c t, where v_c denotes the preferred component velocity of the considered cell orthogonal to the direction θ. It is worth noting that we consider only separable filters [8], [6], since they are also the most effective choice from the implementation point
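The tilting rule (ii) can be sketched in one spatial dimension plus time: substituting x − v_c t for x in the filter's carrier yields a space-time Gabor whose energy response is largest for stimuli drifting at the preferred component speed v_c. All parameter values here are illustrative assumptions, and the envelope is kept untilted, consistent with a separable implementation.

```python
import numpy as np

def tilted_gabor(x, t, f0=0.1, sigma=4.0, vc=1.0, phase=0.0):
    """Space-time Gabor tilted to prefer component speed vc via the
    substitution x -> x - vc*t in the carrier (one spatial dimension
    only; the Gaussian envelope is left untilted)."""
    envelope = np.exp(-(x**2 + t**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * f0 * (x - vc * t) + phase)

def component_energy(stimulus, x, t, vc, f0=0.1, sigma=4.0):
    """Energy of a quadrature pair of tilted Gabors applied to a
    space-time stimulus: large when the stimulus drifts at speed vc."""
    even = np.sum(tilted_gabor(x, t, f0, sigma, vc, 0.0) * stimulus)
    odd = np.sum(tilted_gabor(x, t, f0, sigma, vc, -np.pi / 2) * stimulus)
    return even**2 + odd**2
```

A grating drifting in the preferred direction elicits a much larger energy than one drifting at the opposite velocity, which is the component selectivity that the MT pooling stage combines across directions.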

Implementation details

Table 1 reports the parameters that are used in the simulations to systematically assess the model's performance.

The choice of the size of the spatial support of the RF is related to the necessity of processing fine details, i.e. at high image resolution, in realistic sequences. The temporal support of the filter is dictated by two different needs: to obtain good results on the benchmark sequences and to be consistent with the experimental evidence. In particular, the V1 and MT RFs process the visual signal

Conclusions

In this paper, we have presented and analyzed in a systematic way a neural architecture that models the hierarchical processing stages of V1 and MT cortical areas and that is able to estimate visual motion in realistic scenarios.

First, several spatio-temporal profiles for the V1 receptive fields have been analyzed and compared, both in terms of the cells' responses to simple input stimuli (random dots and plaids) and in terms of their capability of computing velocity in realistic situations. The


References (41)

  • U. Ilg et al., Dynamics of Visual Motion Processing: Neuronal, Behavioral, and Computational Approaches (2010)
  • D. Heeger, Model for the extraction of image flow, J. Opt. Soc. Am. (1987)
  • E. Adelson et al., Spatiotemporal energy models for the perception of motion, J. Opt. Soc. Am. (1985)
  • G.R. Stoner et al., Neural correlates of perceptual motion coherence, Nature (1992)
  • N.J. Priebe et al., The neural representation of speed in macaque area MT/V5, J. Neurosci. (2003)
  • M. Smith et al., Dynamics of motion signaling by neurons in macaque area MT, Nat. Neurosci. (2005)
  • C. Beck et al., Combining feature selection and integration—a neural model for MT motion selectivity, PLoS ONE (2011)
  • J.A. Perrone, A neural-based code for computing image velocity from small sets of middle temporal (MT/V5) neuron inputs, J. Vis. (2012)
  • D.C. Bradley et al., Velocity computation in the primate visual system, Nat. Rev. Neurosci. (2008)
  • J. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A (1985)

    Manuela Chessa is a Post doctoral Research scientist at the Department of Informatics, Bioengineering, Robotics and System Engineering of the University of Genoa. She was born in Genoa, Italy, on October 2nd, 1980. She received her MSc in Bioengineering with full marks from the University of Genoa, Italy, in 2005, and the Ph.D. in Bioengineering from University of Genoa in 2009. Her research interests are focused on the study of biological and artificial vision systems, on the development of bioinspired models, and of natural human–machine interfaces based on virtual, augmented and mixed reality. She has been involved in several national and international research projects. She is an author and a co-author of 38 peer reviewed scientific papers, both on ISI journal and on International Conferences, of 6 book chapters, and of 2 edited books. Moreover, she has been a tutor or a supervisor of several B.Sc. and M.Sc. theses, and she teaches “Software Technologies for Human–Computer Interaction” for the M.Sc. degree in Bioengineering at University of Genoa.

    Silvio P. Sabatini received the Laurea Degree in Electronics Engineering and the Ph.D. in Computer Science from the University of Genoa in 1992 and 1996. He is currently an Associate Professor of Bioengineering at the Department of Informatics, Bioengineering, Robotics and System Engineering of the University of Genoa, Coordinator of the B.Sc. and M.Sc. programs in Biomedical engineering and Bioengineering and Member of the Board of the Doctoral Course in Bioengineering and Robotics. In 1995, he promoted the creation of the “Physical Structure of Perception and Computation” (www.pspc.unige.it) lab to develop models that capture the “physicalist” nature of the information processing occurring in the visual cortex, to understand the signal processing strategies adopted by the brain, and to build novel algorithms and architectures for artificial perception machines. His research interests relate to visual coding and multidimensional signal representation, neuromorphic computing, early-cognitive models for visually-guided behavior, and robot vision. He recently coordinated the EU FP7 project EYESHOTS on the structural mechanisms of visuospatial cognition, responsible for interacting in the 3D peripersonal space, and he was active in promoting and participating, as a partner, to several other EU FP5, FP6 and FP7 projects: ECOVISION, DRIVSCO, MCCOOP, SEARISE. He is the author of more than 100 papers in peer-reviewed journals, book chapters and international conference proceedings.

    Fabio Solari received the degree in Electronic Engineering from the University of Genoa, Italy, in 1995. In 1999 he obtained his Ph.D. in Electronic Engineering and Computer Science from the same University. Since 2005, he has been appointed as an Assistant Professor of Computer Science at the Faculty of Engineering of the University of Genoa. His research activity concerns the study of visual perception with the aim to develop computational models of neural processing, to devise novel bio-inspired computer vision algorithms, and to design virtual and mixed reality environments for ecological visual stimulations. He has participated to 5 European projects: FP7-ICT, EYESHOTS and SEARISE; FP6-IST-FET, DRIVSCO; FP6-NEST, MCCOOP; FP5-IST-FET, ECOVISION. He teaches “Software Technologies for Human–Computer Interaction” for the M.Sc. degree in Bioengineering, and “Computer Vision” for the European Master on Advanced Robotics at University of Genoa.
