A systematic analysis of a V1–MT neural model for motion estimation
Introduction
An accurate computation of motion is an inescapable requirement for any agent, natural or artificial, acting in a real-world dynamic environment. For this reason, many algorithms for the computation of the optic flow have been proposed in the Computer Vision and Robotics literature, characterized by high performance either in terms of accuracy of the estimates or in terms of execution time (see [1] for a review). Complex visual motion analysis is effectively solved by the mammalian visual system; thus, the neural mechanisms underlying motion analysis in the visual cortex have been studied extensively [2], [3], [4], [5].
In this paper, we present an engineered neural model, derived from the ones proposed in [6], [7], [4], that takes into account the hierarchical processing stages that occur in the primary visual cortex (V1) and in the middle temporal area (MT). The distributed representation of visual motion encoded by a population of MT cells is decoded to obtain an explicit motion estimate, i.e. the optic flow, in order to assess the model's performance on realistic motion sequences. The neural model is described through a two-layer architecture composed of distributed populations of cells. In the first layer, a bank of spatio-temporally oriented filters approximates the receptive fields (RFs) of the population of V1 simple cells, which are tuned to different motion directions and contribute to build the population of complex cells as motion energy units [8]. The responses of the complex cells are combined in the second layer to obtain estimates of the magnitude and direction of local velocities, as happens in area MT [4].
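The motion energy computation performed by the first layer can be sketched in a minimal one-dimensional (x–t) form: a quadrature pair of spatio-temporal Gabor filters is applied to an image patch, and the squared responses are summed to give a phase-invariant, direction-selective energy. The function names, window size, and frequency values below are illustrative choices, not the paper's actual parameters.

```python
import math

def gabor_pair(x, t, fx, ft, sigma):
    """Even/odd spatio-temporal Gabor values at (x, t); a filter tuned to
    spatial frequency fx and temporal frequency ft prefers speed -ft/fx."""
    env = math.exp(-(x * x + t * t) / (2 * sigma ** 2))
    phase = 2 * math.pi * (fx * x + ft * t)
    return env * math.cos(phase), env * math.sin(phase)

def motion_energy(stimulus, fx, ft, sigma=2.0, half=6):
    """Motion energy: squared responses of the quadrature pair, summed,
    which makes the output invariant to the stimulus phase."""
    even = odd = 0.0
    for x in range(-half, half + 1):
        for t in range(-half, half + 1):
            ge, go = gabor_pair(x, t, fx, ft, sigma)
            even += ge * stimulus(x, t)
            odd += go * stimulus(x, t)
    return even ** 2 + odd ** 2

# a grating drifting rightward at speed v = 1 pixel/frame
grating = lambda x, t: math.cos(2 * math.pi * 0.125 * (x - 1.0 * t))

rightward = motion_energy(grating, fx=0.125, ft=-0.125)  # unit tuned to v = +1
leftward = motion_energy(grating, fx=0.125, ft=+0.125)   # unit tuned to v = -1
print(rightward > leftward)  # True: energy peaks for the matching velocity
```

The opponent comparison at the end shows the key property exploited by the second layer: units tuned to the stimulus velocity respond far more strongly than units tuned to the opposite direction.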
As previously stated, the proposed neural model is based on the ones presented in [6], [7], [4] (in the following we refer to them as the Heeger and Simoncelli (HS) models), since they constitute a popular model of primate velocity encoding and are consistent with experimental evidence, e.g. speed tuning [2] and responses to plaids [9] (but see [10]). Here, we have not considered the temporal dynamics of MT responses [3], [11] or recent considerations about MT modeling, e.g. [12], [13], since our main aim is to show that the proposed neural model can be effectively and efficiently used in realistic situations, where it performs better than state-of-the-art bio-inspired algorithms for optic flow computation. In general, the proposed model can be considered a functional model that does not aim at full neurophysiological accuracy. In particular, the model proposed in [7], [4] is used for the statistical modeling of real neurons recorded from an animal model, whereas we chose parameters in a uniform way to estimate the optic flow, thus losing the biological diversity shown in [4]. Nevertheless, we have taken care of and analyzed the neural modeling aspects while coping with the constraints imposed by the realistic sequences, and by random dot and plaid stimuli. Accordingly, all the analyses are performed by using the same filters and the same model parameters, in order to show how the performance of the model, in terms of the optic flow estimation, changes in relation to the tuning curves.
Here, we have extended the HS model through the following contributions: (i) we have analyzed several profiles of the temporal filters (see Section 4); this can be considered as a way to modify the shape of the band-pass filters to take into account some recent considerations [14], [13], and it notably affects the performance of velocity estimation; (ii) we have proposed several strategies to encode and decode the velocity at the MT level, which are not present in [4] (see Section 3); in particular, we have discussed the relationship between a multiple speed-direction MT model and a computational solution that allows an economy in the number of cells used for the distributed representation of the visual signal; moreover, (iii) we have explicitly taken care of the modeling of the V1 and MT tuning curves (see Section 5.2); finally, regarding the optic flow computation on realistic benchmark sequences (see Section 5.3), (iv) we have explicitly considered the problem of spatial scale [13] and we have adopted a computationally efficient mechanism to cope with it (see Section 3.4).
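As a concrete illustration of decoding a distributed MT representation into an explicit motion direction, the sketch below applies classic population-vector decoding to an idealized bank of cosine-tuned direction cells. The tuning shape, cell count, and function names are assumptions made for illustration; the paper's actual encoding/decoding strategies are the subject of Section 3.

```python
import math

# hypothetical MT population: N cells with preferred directions evenly
# spaced on the circle, each with a broad rectified-cosine direction tuning
N = 16
preferred = [2 * math.pi * i / N for i in range(N)]

def mt_response(theta_stim, theta_pref):
    """Idealized (noise-free) rectified-cosine direction tuning."""
    return max(0.0, math.cos(theta_stim - theta_pref))

def decode_direction(responses):
    """Population-vector decoding: response-weighted sum of the cells'
    preferred-direction unit vectors; the angle of the resulting vector
    is the estimated stimulus direction."""
    vx = sum(r * math.cos(p) for r, p in zip(responses, preferred))
    vy = sum(r * math.sin(p) for r, p in zip(responses, preferred))
    return math.atan2(vy, vx)

stim = math.radians(40.0)
resp = [mt_response(stim, p) for p in preferred]
est = decode_direction(resp)
print(round(math.degrees(est), 1))  # ≈ 40.0
```

With evenly spaced preferred directions and symmetric tuning, the decoded angle recovers the stimulus direction even though no single cell peaks there, which is the sense in which a small population economically represents all directions.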
Related works
Several models have been devised in order to account for the neural mechanisms of motion analysis. A model for the extraction of the image flow, inspired by the stages of the visual motion pathway of the primate brain, is proposed in [6]. Such a model is based on spatio-temporal RFs of simple cells, modeled by Gabor functions [15], and on the computation of the motion energy [8] by a layer of complex cells. The problem of detecting local image velocity is also addressed in [16], by a two-layer
Visual motion encoding and decoding for direct optic flow estimation
In order to obtain a population of pattern cells that encodes all the possible velocity directions of the visual stimuli [21], we have to model the MT linear weights w(θ). In this paper, by considering the profiles of the MT linear weights shown in [4], we model w(θ) as a cosine function of the orientation θ. The choice of a cosine function allows us to obtain direction tuning curves of pattern cells that behave as in [4] (see Fig. 5). A population of such MT cells is able to represent
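The cosine-weight construction can be illustrated with a toy pooling stage: V1 complex-cell responses tuned to evenly spaced directions are combined with weights w(θ) = cos(θ − θ_MT) and half-wave rectified. The von Mises-like V1 tuning shape and all parameter values below are illustrative stand-ins, not the model's fitted profiles.

```python
import math

def v1_complex(theta_stim, theta_pref, kappa=3.0):
    """Idealized direction tuning of a V1 complex cell (von Mises-like
    bump); a stand-in for the actual motion-energy responses."""
    return math.exp(kappa * (math.cos(theta_stim - theta_pref) - 1.0))

def mt_pattern_cell(theta_stim, theta_mt, n_v1=24):
    """MT pattern response: V1 responses pooled with cosine weights
    w(theta) = cos(theta - theta_mt), then half-wave rectified."""
    total = 0.0
    for i in range(n_v1):
        theta = 2 * math.pi * i / n_v1
        w = math.cos(theta - theta_mt)
        total += w * v1_complex(theta_stim, theta)
    return max(0.0, total)

# the pooled tuning curve peaks at the cell's preferred direction
pref = math.radians(90.0)
curve = [(d, mt_pattern_cell(math.radians(d), pref)) for d in range(0, 360, 15)]
best = max(curve, key=lambda p: p[1])[0]
print(best)  # 90
```

Because pooling a symmetric bump with a cosine weight profile yields a cosine-shaped tuning curve, the resulting pattern cells have the broad, unimodal direction tuning that makes the cosine choice convenient.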
Modeling of the spatio-temporal RFs
The selectivity of the simple cells to the stimulus velocity can be obtained by considering two different approaches: (i) rotating the x–y–t coordinates of the Gabor filters, and (ii) tilting the spatio-temporal filter according to the rule ω_t = −v_c ω_s, where v_c denotes the preferred component velocity of the considered cell orthogonal to the direction θ. It is worth noting that we consider only separable filters [8], [6], since they are also the most effective choice from the implementation point of view.
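A minimal sketch of the tilting approach: a velocity-tuned RF is built from separable even/odd spatial and temporal parts, whose combination is oriented in the x–t plane so that its spectrum peaks where ω_t = −v_c ω_x. The causal exponential temporal envelope and all numeric values are assumptions made for illustration.

```python
import math

def tilted_rf(x, t, fx, vc, sigma=2.0, tau=2.5):
    """Velocity-tuned RF from separable parts: combining even/odd spatial
    and temporal components yields env * cos(2*pi*(fx*x + ft*t)) with
    ft = -vc*fx, i.e. a filter oriented along the stimulus motion plane."""
    if t < 0:
        return 0.0  # causal temporal profile
    ft = -vc * fx
    env = math.exp(-x * x / (2 * sigma ** 2)) * math.exp(-t / tau)
    even_s, odd_s = math.cos(2 * math.pi * fx * x), math.sin(2 * math.pi * fx * x)
    even_t, odd_t = math.cos(2 * math.pi * ft * t), math.sin(2 * math.pi * ft * t)
    return env * (even_s * even_t - odd_s * odd_t)

def response(vc, v_stim, fx=0.125):
    """Correlate the RF with a grating drifting at speed v_stim."""
    acc = 0.0
    for x in range(-6, 7):
        for t in range(0, 11):
            acc += tilted_rf(x, t, fx, vc) * math.cos(2 * math.pi * fx * (x - v_stim * t))
    return abs(acc)

# a filter tilted for v_c = +1 responds more to a grating drifting at v = +1
print(response(vc=1.0, v_stim=1.0) > response(vc=-1.0, v_stim=1.0))  # True
```

The design point is that only separable 1-D parts are ever evaluated; the oriented (tilted) behavior emerges from their even/odd combination, which is why separable filters are attractive in implementation.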
Implementation details
Table 1 reports the parameters that are used in the simulations to systematically assess the model's performance.
The choice of the size of the spatial support of the RF is related to the necessity of processing fine details, i.e. at high image resolution, in realistic sequences. The temporal support of the filter is determined by two different needs: obtaining good results on the benchmark sequences and meeting the experimental evidence. In particular, the V1 and MT RFs process the visual signal
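The tension between a fixed RF support and fine image detail is commonly handled with a multi-resolution pyramid: a filter of fixed pixel support, applied at coarser pyramid levels, is effectively tuned to lower spatial frequencies and larger speeds. The 1-D sketch below (binomial smoothing plus decimation) is an illustrative stand-in for the paper's actual multi-scale mechanism (Section 3.4).

```python
def blur_downsample(signal):
    """One pyramid level: 1-2-1 binomial smoothing followed by decimation
    by 2. A V1 filter with fixed pixel support applied at level k is tuned
    to spatial frequencies (and speeds) scaled by 2**k in the original."""
    padded = [signal[0]] + list(signal) + [signal[-1]]  # replicate borders
    smoothed = [(padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4.0
                for i in range(1, len(padded) - 1)]
    return smoothed[::2]

levels = [list(range(16))]          # a toy 1-D "image" row
for _ in range(3):
    levels.append(blur_downsample(levels[-1]))
print([len(level) for level in levels])  # [16, 8, 4, 2]
```

Processing all levels of such a pyramid costs only a constant factor more than the finest level alone, which is what makes the multi-scale strategy computationally efficient.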
Conclusions
In this paper, we have presented and analyzed in a systematic way a neural architecture that models the hierarchical processing stages of V1 and MT cortical areas and that is able to estimate visual motion in realistic scenarios.
First, several spatio-temporal profiles for the V1 receptive fields have been analyzed and compared, both in terms of the cells' responses to simple input stimuli (random dots and plaids) and in terms of their capability of computing velocity in realistic situations.
References (41)
- et al., A model of neuronal responses in visual area MT, Vis. Res. (1998)
- et al., A compact harmonic code for early vision based on anisotropic frequency channels, Comput. Vis. Image Underst. (2010)
- et al., Deconstructing the receptive field: information coding in macaque area MST, Neurocomputing (2001)
- et al., Receptive-field dynamics in the central visual pathways, Trends Neurosci. (1995)
- et al., Designing Gabor filters for optimal texture separability, Pattern Recognit. (2000)
- et al., Pipelined architecture for real-time cost-optimized extraction of visual primitives based on FPGAs, Dig. Signal Process. (2013)
- et al., A database and evaluation methodology for optical flow, Int. J. Comput. Vis. (2011)
- et al., Speed skills: measuring the visual speed analyzing properties of primate MT neurons, Nat. Neurosci. (2001)
- et al., Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain, Nature (2001)
- et al., How MT cells analyze the motion of visual patterns, Nat. Neurosci. (2006)
- Dynamics of Visual Motion Processing: Neuronal, Behavioral, and Computational Approaches
- Model for the extraction of image flow, J. Opt. Soc. Am.
- Spatiotemporal energy models for the perception of motion, J. Opt. Soc. Am.
- Neural correlates of perceptual motion coherence, Nature
- The neural representation of speed in macaque area MT/V5, J. Neurosci.
- Dynamics of motion signaling by neurons in macaque area MT, Nat. Neurosci.
- Combining feature selection and integration—a neural model for MT motion selectivity, PLoS ONE
- A neural-based code for computing image velocity from small sets of middle temporal (MT/V5) neuron inputs, J. Vis.
- Velocity computation in the primate visual system, Nat. Rev. Neurosci.
- Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A
Manuela Chessa is a postdoctoral research scientist at the Department of Informatics, Bioengineering, Robotics and System Engineering of the University of Genoa. She was born in Genoa, Italy, on October 2nd, 1980. She received her MSc in Bioengineering with full marks from the University of Genoa, Italy, in 2005, and her Ph.D. in Bioengineering from the University of Genoa in 2009. Her research interests are focused on the study of biological and artificial vision systems, on the development of bio-inspired models, and on natural human–machine interfaces based on virtual, augmented and mixed reality. She has been involved in several national and international research projects. She is the author or co-author of 38 peer-reviewed scientific papers, both in ISI journals and in international conference proceedings, of 6 book chapters, and of 2 edited books. Moreover, she has been a tutor or supervisor of several B.Sc. and M.Sc. theses, and she teaches “Software Technologies for Human–Computer Interaction” for the M.Sc. degree in Bioengineering at the University of Genoa.
Silvio P. Sabatini received the Laurea Degree in Electronics Engineering and the Ph.D. in Computer Science from the University of Genoa in 1992 and 1996. He is currently an Associate Professor of Bioengineering at the Department of Informatics, Bioengineering, Robotics and System Engineering of the University of Genoa, Coordinator of the B.Sc. and M.Sc. programs in Biomedical Engineering and Bioengineering, and a Member of the Board of the Doctoral Course in Bioengineering and Robotics. In 1995, he promoted the creation of the “Physical Structure of Perception and Computation” (www.pspc.unige.it) lab to develop models that capture the “physicalist” nature of the information processing occurring in the visual cortex, to understand the signal processing strategies adopted by the brain, and to build novel algorithms and architectures for artificial perception machines. His research interests relate to visual coding and multidimensional signal representation, neuromorphic computing, early-cognitive models for visually-guided behavior, and robot vision. He recently coordinated the EU FP7 project EYESHOTS on the structural mechanisms of visuospatial cognition, responsible for interacting in the 3D peripersonal space, and he was active in promoting and participating, as a partner, in several other EU FP5, FP6 and FP7 projects: ECOVISION, DRIVSCO, MCCOOP, SEARISE. He is the author of more than 100 papers in peer-reviewed journals, book chapters and international conference proceedings.
Fabio Solari received his degree in Electronic Engineering from the University of Genoa, Italy, in 1995. In 1999 he obtained his Ph.D. in Electronic Engineering and Computer Science from the same university. Since 2005, he has been appointed as an Assistant Professor of Computer Science at the Faculty of Engineering of the University of Genoa. His research activity concerns the study of visual perception with the aim of developing computational models of neural processing, devising novel bio-inspired computer vision algorithms, and designing virtual and mixed reality environments for ecological visual stimulation. He has participated in 5 European projects: FP7-ICT, EYESHOTS and SEARISE; FP6-IST-FET, DRIVSCO; FP6-NEST, MCCOOP; FP5-IST-FET, ECOVISION. He teaches “Software Technologies for Human–Computer Interaction” for the M.Sc. degree in Bioengineering, and “Computer Vision” for the European Master on Advanced Robotics at the University of Genoa.