
2007 | Book

Dynamical Vision

ICCV 2005 and ECCV 2006 Workshops, WDV 2005 and WDV 2006, Beijing, China, October 21, 2005, Graz, Austria, May 13, 2006. Revised Papers

Edited by: René Vidal, Anders Heyden, Yi Ma

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

Classical multiple-view geometry studies the reconstruction of a static scene observed by a rigidly moving camera. However, in many real-world applications the scene may undergo much more complex dynamical changes. For instance, the scene may consist of multiple moving objects (e.g., a traffic scene) or articulated motions (e.g., a walking human) or even non-rigid dynamics (e.g., smoke, fire, or a waterfall). In addition, some applications may require interaction with the scene through a dynamical system (e.g., vision-guided robot navigation and coordination). To study the problem of reconstructing dynamical scenes, many new algebraic, geometric, statistical, and computational tools have recently emerged in computer vision, computer graphics, image processing, and vision-based control. The goal of the International Workshop on Dynamical Vision (WDV) is to converge different aspects of the research on dynamical vision and to identify common mathematical problems, models, and methods for future research in this emerging and active area.

Table of Contents

Frontmatter

Motion Segmentation and Estimation

The Space of Multibody Fundamental Matrices: Rank, Geometry and Projection
Abstract
We study the rank and geometry of the multibody fundamental matrix, a geometric entity characterizing the two-view geometry of dynamic scenes consisting of multiple rigid-body motions. We derive an upper bound on the rank of the multibody fundamental matrix that depends on the number of independent translations. We also derive an algebraic characterization of the SVD of a multibody fundamental matrix in the case of two or an odd number of rigid-body motions with a common rotation. This characterization allows us to project an arbitrary matrix onto the space of multibody fundamental matrices using linear algebraic techniques.
Xiaodong Fan, René Vidal
Direct Segmentation of Multiple 2-D Motion Models of Different Types
Abstract
We propose a closed-form solution for segmenting mixtures of 2-D translational and 2-D affine motion models directly from the image intensities. Our approach exploits the fact that the spatial-temporal image derivatives generated by a mixture of these motion models must satisfy a bi-homogeneous polynomial called the multibody brightness constancy constraint (MBCC). We show that the degrees of the MBCC are related to the number of motion models of each kind. These degrees can be automatically computed using a one-dimensional search. We then demonstrate that a sub-matrix of the Hessian of the MBCC encodes information about the type of motion models. For instance, this sub-matrix has rank 1 for 2-D translational models and rank 3 for 2-D affine models. Once the type of motion model has been identified, one can obtain the parameters of each motion model at every image measurement from the cross products of the derivatives of the MBCC. We then demonstrate that treating a 2-D translational motion model as a 2-D affine one would result in erroneous estimation of the motion models, thus motivating our aim to account for different types of motion models. We apply our method to segmenting various dynamic scenes.
Dheeraj Singaraju, René Vidal
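
A rough, hypothetical sketch of the rank test described above (our own helper names and tolerance, not the authors' implementation): classify the local motion-model type from the numerical rank of a given sub-matrix of the MBCC Hessian.

```python
import numpy as np

def numerical_rank(M, tol=1e-6):
    """Numerical rank: number of singular values above tol times the largest."""
    s = np.linalg.svd(M, compute_uv=False)
    return 0 if s[0] == 0 else int(np.sum(s > tol * s[0]))

def classify_motion_model(hessian_block):
    """Rank 1 of the Hessian sub-matrix indicates a 2-D translational
    model, rank 3 a 2-D affine model, per the test in the abstract."""
    r = numerical_rank(hessian_block)
    return {1: "2-D translational", 3: "2-D affine"}.get(r, f"undetermined (rank {r})")
```
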
Motion Segmentation Using an Occlusion Detector
Abstract
We present a novel method for the detection of motion boundaries in a video sequence based on differential properties of the spatio-temporal domain. Regarding the video sequence as a 3D spatio-temporal function, we consider the second moment matrix of its gradients (averaged over a local window), and show that the eigenvalues of this matrix can be used to detect occlusions and motion discontinuities. Since these cannot always be determined locally (due to false corners and the aperture problem), a scale-space approach is used for extracting the location of motion boundaries. A closed contour is then constructed from the most salient boundary fragments, to provide the final segmentation. The method is shown to give good results on pairs of real images taken under general motion. We use synthetic data to show its robustness to high levels of noise and illumination changes; we also include cases where no intensity edge exists at the location of the motion boundary, or where no parametric motion model can describe the data.
Doron Feldman, Daphna Weinshall
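
The structure-tensor computation described above can be sketched in a few lines; this is a minimal numpy/scipy version under our own assumptions (np.gradient derivatives, a uniform averaging window), not the paper's code. Three large eigenvalues signal that no single local motion explains the window, the cue used for occlusion and motion-boundary detection.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def structure_tensor_eigvals(video, window=5):
    """video: (T, H, W). Returns the (T, H, W, 3) eigenvalues of the 3x3
    second moment matrix of the spatio-temporal gradient, averaged over
    a local spatio-temporal window."""
    It, Iy, Ix = np.gradient(video.astype(float))  # gradients along t, y, x
    g = [Ix, Iy, It]
    J = np.empty(video.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = uniform_filter(g[i] * g[j], size=window)
    return np.linalg.eigvalsh(J)  # sorted ascending, per pixel
```
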
Robust 3D Segmentation of Multiple Moving Objects Under Weak Perspective
Abstract
A scene containing multiple independently moving, possibly occluding, rigid objects is considered under the weak perspective camera model. We obtain a set of feature points tracked across a number of frames and address the problem of 3D motion segmentation of the objects in the presence of measurement noise and outliers. We extend the robust structure from motion (SfM) method [5] to 3D motion segmentation and apply it to realistic, contaminated tracking data with occlusion. A number of approaches to 3D motion segmentation have already been proposed [3,6,14,15]. However, most of them were neither developed for nor tested on the noisy, outlier-corrupted data that often occur in practice. Due to the consistent use of robust techniques at all critical steps, our approach can cope with such data, as demonstrated in a number of tests with synthetic and real image sequences.
Levente Hajder, Dmitry Chetverikov
Nonparametric Estimation of Multiple Structures with Outliers
Abstract
A common problem encountered in the analysis of dynamic scenes is the simultaneous estimation of the number of models and their parameters. This problem becomes difficult as the measurement noise in the data increases and the data are further corrupted by outliers. This is especially the case in a variety of motion estimation problems, where the displacement between the views is large and the process of establishing correspondences is difficult. In this paper we propose a novel nonparametric sampling-based method for estimating the number of models and their parameters. The main novelty of the proposed method lies in the analysis of the distribution of residuals of individual data points with respect to the set of hypotheses generated by a RANSAC-like sampling process. We show that the modes of the residual distributions directly reveal the presence of multiple models and facilitate the recovery of the individual models, without making any assumptions about the distribution of the outliers or the noise process. The proposed approach is capable of handling data with a large fraction of outliers. Experiments with both synthetic data and image pairs related by different motion models are presented to demonstrate the effectiveness of the proposed approach.
Wei Zhang, Jana Košecká
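
A toy 2-D line-fitting analogue may clarify the residual-mode idea; the data, hypothesis count and histogram-based mode detection below are all our own illustrative choices, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two noisy lines (100 points each) plus 50 outliers.
t = rng.uniform(-1, 1, 100)
pts = np.vstack([
    np.c_[t, 0.5 * t + 0.2] + 0.01 * rng.standard_normal((100, 2)),
    np.c_[t, -t + 0.7] + 0.01 * rng.standard_normal((100, 2)),
    rng.uniform(-1, 1, (50, 2)),
])

# RANSAC-like hypotheses: normalized lines through random point pairs.
hyps = []
while len(hyps) < 500:
    i, j = rng.choice(len(pts), 2, replace=False)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    a, b = y2 - y1, x1 - x2
    n = np.hypot(a, b)
    if n > 1e-9:
        hyps.append((a / n, b / n, -(a * x1 + b * y1) / n))
hyps = np.array(hyps)

# Residual of every point w.r.t. every hypothesis, and per-point modes.
R = np.abs(pts @ hyps[:, :2].T + hyps[:, 2])  # (250, 500)
modes = []
for r in R:
    hist, edges = np.histogram(r, bins=50)
    k = hist.argmax()
    modes.append(0.5 * (edges[k] + edges[k + 1]))

# Points on a genuine structure show a sharp residual mode near zero.
print("median mode:", np.median(modes[:200]), "(inliers) vs",
      np.median(modes[200:]), "(outliers)")
```
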

Human Motion Analysis, Tracking and Recognition

Articulated Motion Segmentation Using RANSAC with Priors
Abstract
Articulated motions are partially dependent. Most existing segmentation methods, e.g., Costeira and Kanade [2], cannot be applied to articulated motions.
We propose a novel algorithm for articulated motion segmentation called RANSAC with priors. It does not require prior knowledge of the number of articulated parts. It is both robust and efficient. Its robustness comes from its RANSAC nature. Its efficiency is due to the priors, which are derived from the spectral affinities between every pair of trajectories.
We test our algorithm with synthetic and real data. In some highly challenging cases, where other motion segmentation algorithms may fail, our algorithm still achieves robust results.
Though our algorithm is inspired by articulated motion, it also applies to independent motions, which can be regarded as a special case and treated uniformly.
Jingyu Yan, Marc Pollefeys
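
The role of the priors can be sketched as biased sampling; the fragment below is a hypothetical rendering of that step (the full method would fit a motion subspace to each sample and score inliers, RANSAC-style).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_with_priors(affinity, seed, k):
    """Draw k trajectory indices, biased toward trajectories with high
    spectral affinity to the seed trajectory; the affinity matrix plays
    the role of the priors in 'RANSAC with priors'."""
    w = affinity[seed].astype(float).copy()
    w[seed] = 0.0                       # never resample the seed itself
    return rng.choice(len(w), size=k, replace=False, p=w / w.sum())
```
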
Articulated-Body Tracking Through Anisotropic Edge Detection
Abstract
This paper addresses the problem of articulated motion tracking from image sequences. We describe a method that relies both on an explicit parameterization of the extremal contours and on the prediction of the human boundary edges in the image. We combine extremal contour prediction and edge detection in a non-linear minimization process. The error function that measures the discrepancy between observed image edges and predicted model contours is minimized using an analytical expression of the Jacobian that maps joint velocities onto extremal contour velocities. In practice, we model people both by their geometry (truncated elliptic cones) and by their articulated structure – a kinematic model with 40 rotational degrees of freedom. To overcome the flaws of standard edge detection, we introduce a model-based anisotropic Gaussian filter. The parameters of the anisotropic Gaussian are automatically derived from the kinematic model through the prediction of the extremal contours. The theory is validated by performing full-body motion capture from six synchronized video sequences at 30 fps without markers.
David Knossow, Joost van de Weijer, Radu Horaud, Rémi Ronfard
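
A minimal sketch of an oriented anisotropic Gaussian kernel of the kind described, under our own parameterization (spreads along and across a predicted contour direction theta); in the paper these parameters are derived automatically from the kinematic model.

```python
import numpy as np

def anisotropic_gaussian(sigma_u, sigma_v, theta, size=None):
    """Gaussian elongated along direction theta (e.g. a predicted
    extremal-contour tangent) and narrow across it. Multiplying by
    -v / sigma_v**2 would turn it into an oriented edge detector."""
    if size is None:
        size = int(6 * max(sigma_u, sigma_v)) | 1  # odd kernel size
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    u = x * np.cos(theta) + y * np.sin(theta)   # along the contour
    v = -x * np.sin(theta) + y * np.cos(theta)  # across the contour
    g = np.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
    return g / g.sum()
```
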
Homeomorphic Manifold Analysis: Learning Decomposable Generative Models for Human Motion Analysis
Abstract
If we consider the appearance of human motion such as gait, facial expression and gesturing, most such activities result in nonlinear manifolds in the image space. Although the intrinsic body configuration manifolds might be very low in dimensionality, the resulting appearance manifold is challenging to model given the various aspects that affect the appearance, such as the viewpoint, the person's shape and appearance, etc. In this paper we learn decomposable generative models that explicitly decompose the intrinsic body configuration, as a function of time, from other conceptually orthogonal aspects that affect the appearance, such as the viewpoint, the person performing the action, etc. The framework is based on learning nonlinear mappings from a conceptual representation of the motion manifold that is homeomorphic to the actual manifold, and decomposes other sources of variation in the mapping coefficient space.
Chan-Su Lee, Ahmed Elgammal
View-Invariant Modeling and Recognition of Human Actions Using Grammars
Abstract
In this paper, we represent human actions as sentences generated by a language built on atomic body poses or phonemes. The knowledge of body pose is stored only implicitly as a set of silhouettes seen from multiple viewpoints; no explicit 3D poses or body models are used, and individual body parts are not identified. Actions and their constituent atomic poses are extracted from a set of multiview multiperson video sequences by an automatic keyframe selection process, and are used to automatically construct a probabilistic context-free grammar (PCFG), which encodes the syntax of the actions. Given a new single viewpoint video, we can parse it to recognize actions and changes in viewpoint simultaneously. Experimental results are provided.
Abhijit S. Ogale, Alap Karapurkar, Yiannis Aloimonos

Dynamic Textures

Segmenting Dynamic Textures with Ising Descriptors, ARX Models and Level Sets
Abstract
We present a new algorithm for segmenting a scene consisting of multiple moving dynamic textures. We model the spatial statistics of a dynamic texture with a set of second order Ising descriptors whose temporal evolution is governed by an AutoRegressive eXogenous (ARX) model. Given this model, we cast the dynamic texture segmentation problem in a variational framework in which we minimize the spatial-temporal variance of the stochastic part of the model. This energy functional is shown to depend explicitly on both the appearance and the dynamics of the scene. Our framework naturally handles intensity-based and texture-based image segmentation as well as dynamics-based video segmentation as particular cases. Several experiments show the applicability of our method to segmenting scenes using only dynamics, only appearance, and both dynamics and appearance.
Atiyeh Ghoreyshi, René Vidal
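
A generic least-squares ARX fit of the kind used to model the descriptors' temporal evolution can be sketched as follows (a standard textbook estimator under our own naming, not the paper's implementation). The residual variance it returns corresponds to the stochastic part whose spatial-temporal variance the segmentation minimizes.

```python
import numpy as np

def fit_arx(y, u, na, nb):
    """Least-squares fit of y[t] = sum_i a_i y[t-i] + sum_j b_j u[t-j] + e[t].
    y: (T,) output sequence, u: (T,) exogenous input; na, nb: model orders."""
    p = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
            for t in range(p, len(y))]
    A, target = np.array(rows), y[p:]
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return theta[:na], theta[na:], np.var(target - A @ theta)
```
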
Spatial Segmentation of Temporal Texture Using Mixture Linear Models
Abstract
In this paper we propose a novel approach for the spatial segmentation of video sequences containing multiple temporal textures. This work is based on the notion that a single temporal texture can be represented by a low-dimensional linear model. For scenes containing multiple temporal textures, e.g., trees swaying next to a flowing river, we extend the single linear model to a mixture of linear models and segment the scene by identifying subspaces within the data using robust generalized principal component analysis (GPCA). Computation is reduced to minutes in Matlab by first identifying models from a sampling of the sequence and then using the derived models to segment the remaining data. The effectiveness of our method is demonstrated in several examples, including an application in biomedical image analysis.
Lee Cooper, Jun Liu, Kun Huang
Online Video Registration of Dynamic Scenes Using Frame Prediction
Abstract
An online approach is proposed for video registration of dynamic scenes, such as scenes with dynamic textures, moving objects, motion parallax, etc. This approach has three steps: (i) assume that a few frames are already registered; (ii) using the registered frames, predict the next frame; (iii) register the new video frame to the predicted frame.
Frame prediction overcomes the bias introduced by dynamics in the scene, even when dynamic objects cover the majority of the image. It can also overcome many systematic changes in intensity, as the assumption of “brightness constancy” is replaced with “dynamic constancy”.
This predictive online approach can also be used with motion parallax, where non-uniform image motion is caused by camera translation in a 3D scene with large depth variations. For this case, a method to compute the camera ego-motion is described.
Alex Rav-Acha, Yael Pritch, Shmuel Peleg
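
As a toy, translation-only analogue of registering to a predicted frame (the paper handles much more general dynamics and alignment models), the sketch below predicts the next frame by linear extrapolation of the last two registered frames and aligns the new frame to the prediction by phase correlation; all names are our own.

```python
import numpy as np

def phase_correlation_shift(ref, img):
    """Integer translation that aligns img to ref, via the normalized
    cross-power spectrum."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    dy -= corr.shape[0] * (dy > corr.shape[0] // 2)  # wrap to signed shifts
    dx -= corr.shape[1] * (dx > corr.shape[1] // 2)
    return dy, dx

def register_next_frame(registered, new_frame):
    """'Dynamic constancy': align the new frame to a *prediction* built
    from already registered frames (here, crude linear extrapolation),
    rather than to the previous frame itself."""
    prediction = 2.0 * registered[-1] - registered[-2]
    return phase_correlation_shift(prediction, new_frame)
```
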
Dynamic Texture Recognition Using Volume Local Binary Patterns
Abstract
Dynamic texture is an extension of texture to the temporal domain. The description and recognition of dynamic textures have attracted growing attention. In this paper, a new method for recognizing dynamic textures is proposed. The textures are modeled with volume local binary patterns (VLBP), an extension of the LBP operator widely used in still texture analysis, combining motion and appearance. A rotation invariant VLBP is also proposed. Our approach has many advantages compared with earlier approaches, providing better performance on two test databases. Due to its rotation invariance and robustness to gray-scale variations, the method is very promising for practical applications.
Guoying Zhao, Matti Pietikäinen
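
A reduced sketch of the VLBP code computation (P = 4 neighbours, radius 1, no rotation invariance) might look as follows; this is our own simplification of the operator, not the authors' implementation.

```python
import numpy as np

def vlbp_codes(video, t, offsets=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Threshold the P spatial neighbours in frames t-1, t, t+1, plus the
    two temporally adjacent centre pixels, against the centre pixel of
    frame t, and pack the 3P + 2 = 14 bits into one code per pixel."""
    prev, cur, nxt = video[t - 1], video[t], video[t + 1]
    H, W = cur.shape
    c = cur[1:H - 1, 1:W - 1]
    bits = [frame[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] >= c
            for frame in (prev, cur, nxt) for dy, dx in offsets]
    bits += [prev[1:H - 1, 1:W - 1] >= c, nxt[1:H - 1, 1:W - 1] >= c]
    codes = np.zeros_like(c, dtype=np.int64)
    for k, b in enumerate(bits):
        codes |= b.astype(np.int64) << k
    return codes  # histogram these over the volume to get the descriptor
```
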

Motion Tracking

A Rao-Blackwellized Parts-Constellation Tracker
Abstract
We present a method for efficiently tracking objects represented as constellations of parts by integrating out the shape of the model. Parts-based models have been successfully applied to object recognition and tracking. However, the high dimensionality of such models presents an obstacle to traditional particle filtering approaches. We can efficiently use parts-based models in a particle filter by applying Rao-Blackwellization to integrate out continuous parameters such as shape. This allows us to maintain multiple hypotheses for the pose of an object without the need to sample in the high-dimensional spaces in which parts-based models live. We present experimental results for a challenging biological tracking task.
Grant Schindler, Frank Dellaert
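
The heart of the Rao-Blackwellized weight can be sketched for a linear-Gaussian case: given a sampled pose, if the shape s is Gaussian and enters the observation linearly, it is integrated out in closed form. The model and names below are our own illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def rb_particle_weight(y, H, mu_s, P_s, R):
    """Marginal likelihood of observation y for one pose particle, with
    shape s ~ N(mu_s, P_s) and y = H(pose) @ s + v, v ~ N(0, R).
    Particles are sampled over pose only; shape is never sampled."""
    S = H @ P_s @ H.T + R  # innovation covariance after integrating out s
    return multivariate_normal.pdf(y, mean=H @ mu_s, cov=S)
```

A per-particle Kalman update of (mu_s, P_s) with the same quantities completes such a filter.
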
Bayesian Tracking with Auxiliary Discrete Processes. Application to Detection and Tracking of Objects with Occlusions
Abstract
A number of Bayesian tracking models involve auxiliary discrete variables besides the main hidden state of interest. These discrete variables usually follow a Markovian process and interact with the hidden state either via its evolution model or via the observation process, or both. We consider here a general model that encompasses all these situations, and show how Bayesian filtering can be rigorously conducted with it. The resulting approach facilitates easy re-use of existing tracking algorithms designed in the absence of the auxiliary process. In particular, we show how particle filters can be obtained based on sampling only in the original state space instead of sampling in the augmented space, as is usually done. We finally demonstrate how this framework facilitates solutions to the critical problem of appearance and disappearance of targets, whether upon entering and exiting the scene, or due to temporary occlusions. This is illustrated in the context of color-based tracking with particle filters.
Patrick Pérez, Jaco Vermaak
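
One way to realize sampling only in the original state space is to filter the auxiliary discrete variable exactly, per particle, with an HMM forward step; the sketch below assumes a K-state auxiliary variable (e.g. visible/occluded) and is our own minimal rendering of the idea.

```python
import numpy as np

def particle_step(belief, weight, liks, T):
    """belief: p(a_{t-1} | y_{1:t-1}) for this particle (length K);
    liks[k] = p(y_t | x_t, a_t = k); T[i, j] = p(a_t = j | a_{t-1} = i).
    Returns the updated belief and the reweighted particle weight."""
    predicted = T.T @ belief          # predict the discrete variable
    joint = liks * predicted          # p(y_t, a_t | y_{1:t-1}, x_t)
    marginal = joint.sum()            # p(y_t | y_{1:t-1}, x_t)
    return joint / marginal, weight * marginal
```
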
Tracking of Multiple Objects Using Optical Flow Based Multiscale Elastic Matching
Abstract
A novel hybrid region-based and contour-based multiple object tracking model using optical flow based elastic matching is proposed. The proposed elastic matching model is general in two significant ways. First, it is suitable for tracking both rigid and deformable objects. Second, it is suitable for tracking with both fixed and moving cameras, since the model does not rely on background subtraction. The elastic matching algorithm exploits both the spectral features and the contour-based features of the tracked objects, making it more robust and general in the context of object tracking. The proposed elastic matching algorithm uses a multiscale optical flow technique to compute the velocity field. This prevents the multiscale elastic matching algorithm from being trapped in a local optimum, unlike conventional elastic matching algorithms that use a heuristic search procedure in the matching process. The proposed elastic matching based tracking framework is combined with a Kalman filter in our current experiments. The multiscale elastic matching algorithm is used to compute the velocity field, which is then approximated using B-spline surfaces. The control points of the B-spline surfaces are used directly as the tracking variables in a Kalman filtering model. The B-spline approximation of the velocity field is used to update the spectral features of the tracked objects in the Kalman filter model. The dynamic nature of these spectral features is subsequently used to reason about occlusion. Experimental results on tracking of multiple objects in real-time video are presented.
Xingzhi Luo, Suchendra M. Bhandarkar
Real-Time Tracking with Classifiers
Abstract
Two basic facts motivate this paper: (1) particle filter based trackers have become increasingly powerful in recent years, and (2) object detectors using statistical learning algorithms often work at a near real-time rate.
We present the use of classifiers as the likelihood observation function of a particle filter. The resulting method is able to simultaneously recognize and track an object using only a statistical model learnt from a generic database.
Our main contribution is the definition of a likelihood function which is produced directly from the outputs of a classifier. This function is an estimate of calibrated probabilities P(class|data). The parameters of the function are estimated by minimizing the negative log-likelihood of the training data, which is a cross-entropy error function (a sketch of such a calibration follows this abstract).
Since a generic statistical model is used, the tracking does not need any image-based model learnt online. Moreover, the tracking is robust to appearance variation because the statistical learning is trained with many poses, illumination conditions and instances of the object.
We have implemented the method for two recent popular classifiers: (1) Support Vector Machines and (2) Adaboost. An experimental evaluation shows that the approach can be used for popular applications like pedestrian or vehicle detection and tracking.
Finally, we demonstrate that an efficient implementation provides a real-time system in which only a fraction of the CPU time is required to track at frame rate.
Thierry Chateau, Vincent Gay-Bellile, Frederic Chausse, Jean-Thierry Lapresté
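
The calibration step described above is in the spirit of Platt scaling; a minimal sketch, with our own choice of parameterization and optimizer, is:

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid_calibration(scores, labels):
    """Fit P(class | score) = 1 / (1 + exp(A*score + B)) by minimizing
    the negative log-likelihood (cross-entropy); labels in {0, 1}."""
    def nll(params):
        A, B = params
        z = np.clip(A * scores + B, -500, 500)
        p = np.clip(1.0 / (1.0 + np.exp(z)), 1e-12, 1 - 1e-12)
        return -np.sum(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return minimize(nll, x0=[-1.0, 0.0], method="Nelder-Mead").x  # A, B
```

The calibrated probability can then serve directly as the particle filter's likelihood observation function.
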

Rigid and Non-rigid Motion Analysis

A Probabilistic Framework for Correspondence and Egomotion
Abstract
This paper is an argument for two assertions: First, that by representing correspondence probabilistically, drastically more correspondence information can be extracted from images. Second, that by increasing the amount of correspondence information used, more accurate egomotion estimation is possible. We present a novel approach illustrating these principles.
We first present a framework for using Gabor filters to generate such correspondence probability distributions. Essentially, different filters ‘vote’ on the correct correspondence in a way that reflects their relative likelihoods. Next, we use the epipolar constraint to generate a probability distribution over the possible motions. As the amount of correspondence information is increased, the set of motions yielding significant probabilities is shown to ‘shrink’ to the correct motion.
Justin Domke, Yiannis Aloimonos
Estimating the Pose of a 3D Sensor in a Non-rigid Environment
Abstract
Estimating the pose of an imaging sensor is a central research problem. Many solutions have been proposed for the case of a rigid environment. In contrast, we tackle the case of a non-rigid environment observed by a 3D sensor, which has been neglected in the literature. We represent the environment as sets of time-varying 3D points explained by a low-rank shape model, which we derive in its implicit and explicit forms. The parameters of this model are learnt from data gathered by the 3D sensor. We propose a learning algorithm based on minimal 3D non-rigid tensors, which we introduce. This is followed by a Maximum Likelihood nonlinear refinement performed in a bundle adjustment manner. Given the learnt environment model, we compute the pose of the 3D sensor, as well as the deformations of the environment, that is, the non-rigid counterpart of pose, from new sets of 3D points. We validate our environment learning and pose estimation modules on simulated and real data.
Adrien Bartoli
A Batch Algorithm for Implicit Non-rigid Shape and Motion Recovery
Abstract
The recovery of 3D shape and camera motion for non-rigid scenes from single-camera video footage is a very important problem in computer vision. The low-rank shape model regards the deformations as linear combinations of basis shapes. Most algorithms for reconstructing the parameters of this model along with camera motion are based on three main steps. Given point tracks and the rank, or equivalently the number of basis shapes, they factorize a measurement matrix containing all point tracks, from which the camera motion and basis shapes are extracted and refined in a bundle adjustment manner. Several issues have not been addressed yet, among them choosing the rank automatically and dealing with erroneous point tracks and missing data.
We introduce theoretical and practical contributions that address these issues. We propose an implicit imaging model for non-rigid scenes from which we derive non-rigid matching tensors and closure constraints. We give a non-rigid Structure-from-Motion algorithm based on computing matching tensors over subsequences, from which the implicit cameras are extracted. Each non-rigid matching tensor is computed, along with the rank of the subsequence, using a robust estimator incorporating a model selection criterion that detects erroneous image points.
Preliminary experimental results on real and simulated data show that our algorithm deals with challenging video sequences.
Adrien Bartoli, Søren I. Olsen
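
A bare-bones factorization of the measurement matrix with a crude rank-selection rule may help fix ideas; the paper instead computes matching tensors over subsequences with a robust model selection criterion, so the sketch below is only a schematic analogue.

```python
import numpy as np

def factorize_measurements(W, energy_tol=0.99, max_rank=15):
    """Factor a measurement matrix W (2F x N point tracks) into motion
    and shape factors, picking the rank as the smallest r explaining
    energy_tol of the singular-value energy."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(energy, energy_tol)) + 1, max_rank)
    motion = U[:, :r] * s[:r]   # 2F x r ("implicit cameras")
    shape = Vt[:r]              # r x N (basis-shape coordinates)
    return motion, shape, r
```
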

Motion Filtering and Vision-Based Control

Using a Connected Filter for Structure Estimation in Perspective Systems
Abstract
Three-dimensional structure information can be estimated from two-dimensional perspective images using recursive estimation methods. This paper investigates ways to improve structure filter performance for a certain class of stochastic perspective systems by utilizing mutual information, in particular when each observed point on a rigid object is affected by the same process noise. After presenting the dynamic system of interest, the method is applied, using an extended Kalman filter for the estimation, to a simulated time-varying multiple-point vision system. The performance of a connected filter is compared, using Monte Carlo methods, to that of a set of independent filters. The idea is then further illustrated and analyzed by means of a simple linear system. Finally, more formal stochastic differential equation aspects, especially the impact of transformations in the Itô sense, are discussed and related to physically realistic noise models in vision systems.
Fredrik Nyberg, Ola Dahl, Jan Holst, Anders Heyden
Recursive Structure from Motion Using Hybrid Matching Constraints with Error Feedback
Abstract
We propose an algorithm for recursive estimation of structure and motion in rigid body perspective dynamic systems, based on the novel concept of continuous-differential matching constraints for the estimation of the velocity parameters. The parameter estimation procedure is fused with a continuous-discrete extended Kalman filter for the state estimation. Also, the structure and motion estimation processes are connected by a reprojection error constraint, where feedback of the structure estimates is used to recursively obtain corrections to the motion parameters, leading to more accurate estimates and a more robust performance of the method. The main advantages of the presented algorithm are that after initialization, only three observed object point correspondences between consecutive pairs of views are required for the sequential motion estimation, and that both the parameter update and the correction step are performed using linear constraints only. Simulated experiments are provided to demonstrate the performance of the method.
Fredrik Nyberg, Anders Heyden
Force/Vision Based Active Damping Control of Contact Transition in Dynamic Environments
Abstract
When a manipulator interacts with objects with poorly damped oscillatory modes, undesired oscillations and bouncing may result. In this paper, we present a method for observer-based control of a rigid manipulator interacting with an environment with linear dynamics. The controller injects a desired damping into the environment dynamics, using both visual and force sensing for stable control of the contact transition. Stability of the system is shown using an observer-based backstepping design method, and simulations and experiments are performed in order to validate the chosen approach.
Tomas Olsson, Rolf Johansson, Anders Robertsson
Segmentation and Guidance of Multiple Rigid Objects for Intra-operative Endoscopic Vision
Abstract
This paper presents an endoscopic vision framework for model-based 3D guidance of surgical instruments used in robotized laparoscopic surgery. In order to develop such a system, a variety of challenging segmentation, tracking and reconstruction problems must be solved. With this minimally invasive surgical technique, every instrument has to pass through an insertion point in the abdominal wall and is mounted on the end-effector of a surgical robot which can be controlled by automatic visual feedback. The motion of any laparoscopic instrument is then constrained, and the goal of the automated task is to safely bring instruments to desired locations while avoiding undesirable contact with internal organs. For this “eye-to-hands” configuration with a stationary camera, most control strategies require knowledge of the location of the out-of-field-of-view insertion points, and we demonstrate that this can be obtained in vivo through a sequence of (instrument) motions, without markers and without the need for an external measurement device. To this end, we first present a real-time region-based color segmentation which integrates this motion constraint to initiate the search for region seeds. Second, a novel pose algorithm for the wide class of cylindrical-shaped instruments is developed which can handle partial occlusions, as is often the case in the abdominal cavity. The foreseen application is a good training ground for evaluating the robustness of segmentation algorithms and positioning techniques, since the main difficulties come from the scene understanding and its dynamical variations. Experiments in the lab and in real surgical conditions have been conducted. The experimental validation is demonstrated through the 3D positioning of the instruments' axes (4 DOFs), which must lead to motionless insertion points disturbed by the breathing motion.
C. Doignon, F. Nageotte, M. de Mathelin
Backmatter
Metadata
Title
Dynamical Vision
Edited by
René Vidal
Anders Heyden
Yi Ma
Copyright Year
2007
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-70932-9
Print ISBN
978-3-540-70931-2
DOI
https://doi.org/10.1007/978-3-540-70932-9