
2002 | Book

Computer Vision — ECCV 2002

7th European Conference on Computer Vision, Copenhagen, Denmark, May 28–31, 2002, Proceedings, Part I

Edited by: Anders Heyden, Gunnar Sparr, Mads Nielsen, Peter Johansen

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

Premiering in 1990 in Antibes, France, the European Conference on Computer Vision, ECCV, has been held biennially at venues all around Europe. These conferences have been very successful, making ECCV a major event for the computer vision community. ECCV 2002 was the seventh in the series. The privilege of organizing it was shared by three universities: The IT University of Copenhagen, the University of Copenhagen, and Lund University, with the conference venue in Copenhagen. These universities lie geographically close in the vibrant Öresund region, which lies partly in Denmark and partly in Sweden, with the newly built bridge (opened summer 2000) crossing the sound that formerly divided the countries. We are very happy to report that this year's conference attracted more papers than ever before, with around 600 submissions. Still, together with the conference board, we decided to keep the tradition of holding ECCV as a single-track conference. Each paper was anonymously refereed by three different reviewers. For the final selection, for the first time for ECCV, a system with area chairs was used. These met with the program chairs in Lund for two days in February 2002 to select what became 45 oral presentations and 181 posters. Also at this meeting the selection was made without knowledge of the authors' identity.

Table of Contents

Frontmatter

Active and Real-Time Vision

Tracking with the EM Contour Algorithm

A novel active-contour method is presented and applied to pose refinement and tracking. The main innovation is that no "features" are detected at any stage: contours are simply assumed to remove statistical dependencies between pixels on opposite sides of the contour. This assumption, together with a simple model of shape variability of the geometric models, leads to the application of an EM method for maximizing the likelihood of pose parameters. In addition, a dynamical model of the system leads to the application of a Kalman filter. The method is demonstrated by tracking motor vehicles with 3-D models.

Arthur E. C. Pece, Anthony D. Worrall
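The dynamical-model component mentioned in the abstract can be illustrated with a minimal constant-velocity Kalman filter. This is a generic sketch, not the authors' vehicle tracker; all parameter values (process and measurement noise, the state layout) are assumptions for illustration.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict with the dynamical model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity model: state = [position, velocity]
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])          # we observe position only
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])

x, P = np.zeros(2), np.eye(2)
for t in range(1, 21):              # object moving at unit speed
    x, P = kalman_step(x, P, np.array([float(t)]), F, H, Q, R)
```

After twenty steps the filter has locked on to both the position and the (unobserved) velocity of the target.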
M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-Based Stereo

We present a system that is capable of segmenting, detecting and tracking multiple people in a cluttered scene using multiple synchronized cameras located far from each other. The system improves upon existing systems in many ways, including: (1) We do not assume that a foreground connected component belongs to only one object; rather, we segment the views taking into account color models for the objects and the background. This helps us not only to separate foreground regions belonging to different objects, but also to obtain better background regions than traditional background subtraction methods (as it uses foreground color models in the algorithm). (2) It is fully automatic and does not require any manual input or initializations of any kind. (3) Instead of making decisions about object detection and tracking from a single view or camera pair, we collect evidence from each pair and combine it to obtain a decision in the end. This helps us to obtain much better detection and tracking than traditional systems. Several innovations help us tackle the problem. The first is the introduction of a region-based stereo algorithm that is capable of finding 3D points inside an object if we know the regions belonging to the object in two views. No exact point matching is required. This is especially useful in wide baseline camera systems where exact point matching is very difficult due to self-occlusion and a substantial change in viewpoint. The second contribution is the development of a scheme for setting priors for use in segmentation of a view using Bayesian classification. The scheme, which assumes knowledge of the approximate shape and location of objects, dynamically assigns priors for different objects at each pixel so that occlusion information is encoded in the priors. The third contribution is a scheme for combining evidence gathered from different camera pairs using occlusion analysis so as to obtain globally optimal detection and tracking of objects. The system has been tested with different densities of people in the scene, which helps us determine the number of cameras required for a particular density of people.

Anurag Mittal, Larry S. Davis

Image Features

Analytical Image Models and Their Applications

In this paper, we study a family of analytical probability models for images within the spectral representation framework. First the input image is decomposed using a bank of filters, and probability models are imposed on the filter outputs (or spectral components). A two-parameter analytical form, called a Bessel K form, derived based on a generator model, is used to model the marginal probabilities of these spectral components. The Bessel K parameters can be estimated efficiently from the filtered images and extensive simulations using video, infrared, and range images have demonstrated Bessel K form’s fit to the observed histograms. The effectiveness of Bessel K forms is also demonstrated through texture modeling and synthesis. In contrast to numeric-based dimension reduction representations, which are derived purely based on numerical methods, the Bessel K representations are derived based on object representations and this enables us to establish relationships between the Bessel parameters and certain characteristics of the imaged objects. We have derived a pseudometric on the image space to quantify image similarities/differences using an analytical expression for L2-metric on the set of Bessel K forms. We have applied the Bessel K representation to texture modeling and synthesis, clutter classification, pruning of hypotheses for object recognition, and object classification. Results show that Bessel K representation captures important image features, suggesting its role in building efficient image understanding paradigms and systems.

Anuj Srivastava, Xiuwen Liu, Ulf Grenander
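The moment-based parameter estimation mentioned in the abstract can be sketched as follows. This is a simplified illustration using the relations variance = pc and kurtosis = 3 + 3/p that hold for the Bessel K family; the filter bank that produces the spectral components is omitted, and the synthetic Laplacian input is only a stand-in for a filtered image.

```python
import numpy as np

def bessel_k_params(v):
    """Estimate Bessel K form parameters (p, c) from filtered-image
    values by the method of moments: for this family,
    variance = p*c and excess kurtosis = 3/p."""
    v = np.asarray(v, float) - np.mean(v)
    var = v.var()
    kurt = np.mean(v**4) / var**2
    p = 3.0 / max(kurt - 3.0, 1e-9)   # guard near-Gaussian input
    c = var / p
    return p, c

# A Laplacian sample has kurtosis 6, so p should come out near 1
rng = np.random.default_rng(0)
v = rng.laplace(0.0, 2.0, 200_000)
p, c = bessel_k_params(v)
```

The heavier the tails of the filter-response histogram, the smaller the estimated shape parameter p.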
Time-Recursive Velocity-Adapted Spatio-Temporal Scale-Space Filters

This paper presents a theory for constructing and computing velocity-adapted scale-space filters for spatio-temporal image data. Starting from basic criteria in terms of time-causality, time-recursivity, locality and adaptivity with respect to motion estimates, a family of spatio-temporal recursive filters is proposed and analysed. An important property of the proposed family of smoothing kernels is that the spatio-temporal covariance matrices of the discrete kernels obey similar transformation properties under Galilean transformations as for continuous smoothing kernels on continuous domains. Moreover, the proposed theory provides an efficient way to compute and generate non-separable scale-space representations without need for explicit external warping mechanisms or keeping extended temporal buffers of the past. The approach can thus be seen as a natural extension of recursive scale-space filters from pure temporal data to spatio-temporal domains.

Tony Lindeberg
Combining Appearance and Topology for Wide Baseline Matching

The problem of establishing image-to-image correspondences is fundamental in computer vision. Recently, several wide baseline matching algorithms capable of handling large changes of viewpoint have appeared. By computing feature values from image data, these algorithms mainly use appearance as a cue for matching. Topological information, i.e. spatial relations between features, has also been used, but not nearly to the same extent as appearance. In this paper, we incorporate topological constraints into an existing matching algorithm [1] which matches image intensity profiles between interest points. We show that the algorithm can be improved by exploiting the constraint that the intensity profiles around each interest point should be cyclically ordered. String matching techniques allow for an efficient implementation of the ordering constraint. Experiments with real data indicate that the modified algorithm indeed gives superior results to the original one. The method of enforcing the spatial constraints is not limited to the presented case, but can be used on any algorithm where interest point correspondences are sought.

Dennis Tell, Stefan Carlsson
Guided Sampling and Consensus for Motion Estimation

We present techniques for improving the speed of robust motion estimation based on random sampling of image features. Starting from Torr and Zisserman’s MLESAC algorithm, we address some of the problems posed from both practical and theoretical standpoints and in doing so allow the random search to be replaced by a guided search. Guidance of the search is based on readily-available information which is usually discarded, but can significantly reduce the search time. This guided-sampling algorithm is further specialised for tracking of multiple motions, for which results are presented.

Ben Tordoff, David W Murray
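The core idea of guided sampling, replacing uniform random selection of minimal sets with selection weighted by per-match quality scores, can be sketched on a toy line-fitting problem. This is a minimal illustration, not Tordoff and Murray's full algorithm; the scores, thresholds, and data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# 80 inliers on the line y = 2x + 1, plus 20 gross outliers
x = rng.uniform(0, 10, 100)
y = 2 * x + 1
y[80:] = rng.uniform(-20, 20, 20)

# Per-point quality scores (e.g. from the matching stage): here we
# pretend the matcher scored true inliers higher than outliers
scores = np.where(np.arange(100) < 80, 1.0, 0.1)
weights = scores / scores.sum()

best_inliers = 0
for _ in range(50):
    # Guided sampling: draw the minimal set with probability
    # proportional to match quality instead of uniformly
    i, j = rng.choice(100, size=2, replace=False, p=weights)
    if x[i] == x[j]:
        continue
    slope = (y[j] - y[i]) / (x[j] - x[i])
    icept = y[i] - slope * x[i]
    inliers = np.sum(np.abs(y - (slope * x + icept)) < 0.1)
    best_inliers = max(best_inliers, int(inliers))
```

Because minimal sets are drawn mostly from high-scoring matches, far fewer iterations are needed to hit an all-inlier sample than with uniform RANSAC-style sampling.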

Image Features / Visual Motion

Fast Anisotropic Gauss Filtering

We derive the decomposition of the anisotropic Gaussian into a one-dimensional Gauss filter in the x-direction followed by a one-dimensional filter in a non-orthogonal direction ϕ. Thus the anisotropic Gaussian, too, can be decomposed by dimension. This turns out to be extremely efficient from a computing perspective. An implementation scheme for normal convolution and for recursive filtering is proposed. Directed derivative filters are demonstrated as well. For the recursive implementation, filtering a 512 × 512 image is performed within 65 msec, independent of the standard deviations and orientation of the filter. Accuracy of the filters is still reasonable when compared to the truncation error or the recursive approximation error. The anisotropic Gaussian filtering method allows fast calculation of edge and ridge maps, with high spatial and angular accuracy. For tracking applications, the normal anisotropic convolution scheme is more advantageous, with applications in the detection of dashed lines in engineering drawings. The recursive implementation is more attractive in feature detection applications, for instance in affine invariant edge and ridge detection in computer vision. The proposed computational filtering method enables the practical applicability of orientation scale-space analysis.

Jan-Mark Geusebroek, Arnold W. M. Smeulders, Joost van de Weijer
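The full non-orthogonal decomposition in the paper requires filtering along the direction ϕ; as a minimal sketch of the underlying idea, that a 2-D Gaussian convolution can be decomposed into successive 1-D passes, here is the axis-aligned (isotropic) special case, checked against a direct 2-D convolution.

```python
import numpy as np

def gauss1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def conv2d_same(img, ker):
    """Direct 2-D convolution with zero padding (reference result;
    the kernel is symmetric, so no flip is needed)."""
    kh, kw = ker.shape
    ph, pw = kh // 2, kw // 2
    pad = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i+kh, j:j+kw] * ker)
    return out

rng = np.random.default_rng(2)
img = rng.random((12, 12))
g = gauss1d(1.5, 4)

# Two 1-D passes: along rows, then along columns
tmp = np.apply_along_axis(np.convolve, 1, img, g, mode='same')
sep = np.apply_along_axis(np.convolve, 0, tmp, g, mode='same')

# Equivalent single 2-D convolution with the outer-product kernel
direct = conv2d_same(img, np.outer(g, g))
```

The separable form costs O(k) per pixel instead of O(k²), which is the source of the speed-up the paper extends to arbitrary orientations.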
Adaptive Rest Condition Potentials: Second Order Edge-Preserving Regularization

The purpose of this paper is to introduce a new regularization formulation for inverse problems in computer vision and image processing that allows one to reconstruct second-order piecewise smooth images, that is, images consisting of an assembly of regions with almost constant value, almost constant slope, or almost constant curvature. This formulation is based on the idea of using potential functions that correspond to springs or thin plates with an adaptive rest condition. Efficient algorithms for computing the solution, and examples illustrating the performance of this scheme compared with other known regularization schemes, are presented as well.

Mariano Rivera, Jose L. Marroquin
An Affine Invariant Interest Point Detector

This paper presents a novel approach for detecting affine invariant interest points. Our method can deal with significant affine transformations including large scale changes. Such transformations introduce significant changes in the point location as well as in the scale and the shape of the neighbourhood of an interest point. Our approach allows us to solve these problems simultaneously. It is based on three key ideas: 1) The second moment matrix computed at a point can be used to normalize a region in an affine invariant way (skew and stretch). 2) The scale of the local structure is indicated by local extrema of normalized derivatives over scale. 3) An affine-adapted Harris detector determines the location of interest points. A multi-scale version of this detector is used for initialization. An iterative algorithm then modifies location, scale and neighbourhood of each point and converges to affine invariant points. For matching and recognition, the image is characterized by a set of affine invariant points; the affine transformation associated with each point allows the computation of an affine invariant descriptor which is also invariant to affine illumination changes. A quantitative comparison of our detector with existing ones shows a significant improvement in the presence of large affine deformations. Experimental results for wide baseline matching show an excellent performance in the presence of large perspective transformations including significant scale changes. Results for recognition are very good for a database with more than 5000 images.

Krystian Mikolajczyk, Cordelia Schmid
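The second moment (autocorrelation) matrix underlying the detector can be sketched with a plain Harris corner response on a synthetic corner. The affine adaptation loop and multi-scale initialization are omitted; the window size and the constant k are conventional assumptions, not values from the paper.

```python
import numpy as np

def harris_response(img, k=0.04, win=2):
    """Cornerness from the second moment matrix, summed over a
    (2*win+1)^2 window: R = det(M) - k * trace(M)^2."""
    iy, ix = np.gradient(img.astype(float))
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy
    R = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(win, h - win):
        for j in range(win, w - win):
            sl = (slice(i - win, i + win + 1),
                  slice(j - win, j + win + 1))
            a, b, c = ixx[sl].sum(), iyy[sl].sum(), ixy[sl].sum()
            det, tr = a * b - c * c, a + b
            R[i, j] = det - k * tr * tr
    return R

# A bright quadrant produces a single corner near (8, 8)
img = np.zeros((16, 16))
img[8:, 8:] = 1.0
R = harris_response(img)
ci, cj = np.unravel_index(np.argmax(R), R.shape)
```

Along straight edges one eigenvalue of M vanishes, so det(M) is zero and R is negative; only at the corner, where both gradient orientations occur in the window, does R peak.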
Understanding and Modeling the Evolution of Critical Points under Gaussian Blurring

In order to investigate the deep structure of Gaussian scale space images, one needs to understand the behaviour of critical points under the influence of parameter-driven blurring. During this evolution two different types of special points are encountered, the so-called scale space saddles and the catastrophe points, the latter describing the pairwise annihilation and creation of critical points. The mathematical framework of catastrophe theory is used to model non-generic events that might occur due to e.g. local symmetries in the image. It is shown how this knowledge can be exploited in conjunction with the scale space saddle points, yielding a scale space hierarchy tree that can be used for segmentation. Furthermore the relevance of creations of pairs of critical points with respect to the hierarchy is discussed. We clarify the theory with an artificial image and a simulated MR image.

Arjan Kuijper, Luc Florack
Image Processing Done Right

A large part of “image processing” involves the computation of significant points, curves and areas (“features”). These can be defined as loci where absolute differential invariants of the image assume fiducial values, taking spatial scale and intensity (in a generic sense) scale into account. “Differential invariance” implies a group of “similarities” or “congruences”. These “motions” define the geometrical structure of image space. Classical Euclidean invariants don’t apply to images because image space is non-Euclidean. We analyze image structure from first principles and construct the fundamental group of image space motions. Image space is a Cayley-Klein geometry with one isotropic dimension. The analysis leads to a principled definition of “features” and the operators that define them.

Jan J. Koenderink, Andrea J. van Doorn
Multimodal Data Representations with Parameterized Local Structures

In many vision problems, the observed data lies in a nonlinear manifold in a high-dimensional space. This paper presents a generic modelling scheme to characterize the nonlinear structure of the manifold and to learn its multimodal distribution. Our approach represents the data as a linear combination of parameterized local components, where the statistics of the component parameterization describe the nonlinear structure of the manifold. The components are adaptively selected from the training data through a progressive density approximation procedure, which leads to the maximum likelihood estimate of the underlying density. We show results on both synthetic and real training sets, and demonstrate that the proposed scheme has the ability to reveal important structures of the data.

Ying Zhu, Dorin Comaniciu, Stuart Schwartz, Visvanathan Ramesh
The Relevance of Non-generic Events in Scale Space Models

In order to investigate the deep structure of Gaussian scale space images, one needs to understand the behaviour of spatial critical points under the influence of blurring. We show how the mathematical framework of catastrophe theory can be used to describe the behaviour of critical point trajectories when various different types of generic events, viz. annihilations and creations of pairs of spatial critical points, (almost) coincide. Although such events are non-generic in a mathematical sense, they are not unlikely to be encountered in practice. Furthermore, the behaviour leads to the observation that fine-to-coarse tracking of critical points doesn't suffice. We apply the theory to an artificial image and a simulated MR image and show the occurrence of the described behaviour.

Arjan Kuijper, Luc Florack
The Localized Consistency Principle for Image Matching under Non-uniform Illumination Variation and Affine Distortion

This paper proposes an image matching method that is robust to illumination variation and affine distortion. Our idea is to perform image matching by establishing an imaging function that describes the functional relationship relating intensity values between two images. Similar methodology has been proposed by Viola [11] and Lai & Fang [6]. Viola proposed to perform image matching through the establishment of an imaging function based on a consistency principle. Lai & Fang proposed a parametric form of the imaging function. In cases where the illumination variation is not globally uniform and the parametric form of the imaging function is not obvious, one needs a more robust method. Our method aims to handle spatially non-uniform illumination variation and affine distortion. Central to our method is the proposal of a localized consistency principle, implemented through a non-parametric way of estimating the imaging function. The estimation is effected through optimizing a similarity measure that is robust under spatially non-uniform illumination variation and affine distortion. Experimental results are presented from both synthetic and real data. Encouraging results were obtained.

Bing Wang, Kah Kay Sung, Teck Khim Ng
Resolution Selection Using Generalized Entropies of Multiresolution Histograms

The performances of many image analysis tasks depend on the image resolution at which they are applied. Traditionally, resolution selection methods rely on spatial derivatives of image intensities. Differential measurements, however, are sensitive to noise and are local. They cannot characterize patterns, such as textures, which are defined over extensive image regions. In this work, we present a novel tool for resolution selection that considers sufficiently large image regions and is robust to noise. It is based on the generalized entropies of the histograms of an image at multiple resolutions. We first examine, in general, the variation of histogram entropies with image resolution. Then, we examine the sensitivity of this variation for shapes and textures in an image. Finally, we discuss the significance of resolutions of maximum histogram entropy. It is shown that computing features at these resolutions increases the discriminability between images. It is also shown that maximum histogram entropy values can be used to improve optical flow estimates for block based algorithms in image sequences with a changing zoom factor.

Efstathios Hadjidemetriou, Michael D. Grossberg, Shree K. Nayar
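The measurement at the heart of the method, generalized (Rényi) entropies of histograms computed across a pyramid of resolutions, can be sketched as follows. The bin count, entropy order, and block-averaging pyramid are hypothetical choices for illustration, not the authors' full resolution-selection criterion.

```python
import numpy as np

def renyi_entropy(p, q):
    """Generalized (Renyi) entropy of order q of a histogram p."""
    p = p[p > 0] / p[p > 0].sum()
    if q == 1.0:                      # Shannon limit
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def multires_entropies(img, levels=4, bins=16, q=2.0):
    """Entropy of the intensity histogram at successively halved
    resolutions (2x2 block averaging)."""
    out = []
    for _ in range(levels):
        hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
        out.append(renyi_entropy(hist.astype(float), q))
        h, w = img.shape
        img = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return out

rng = np.random.default_rng(3)
ent = multires_entropies(rng.random((64, 64)))
```

For a noise image, averaging concentrates intensities around the mean, so the histogram entropy drops monotonically with resolution; structured patterns produce characteristic entropy profiles instead.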
Robust Computer Vision through Kernel Density Estimation

Two new techniques based on nonparametric estimation of probability densities are introduced which improve on the performance of equivalent robust methods currently employed in computer vision. The first technique draws from the projection pursuit paradigm in statistics, and carries out regression M-estimation with a weak dependence on the accuracy of the scale estimate. The second technique exploits the properties of the multivariate adaptive mean shift, and accomplishes the fusion of uncertain measurements arising from an unknown number of sources. As an example, the two techniques are extensively used in an algorithm for the recovery of multiple structures from heavily corrupted data.

Haifeng Chen, Peter Meer
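The mean shift building block can be sketched in one dimension with a Gaussian kernel. This is a generic mode seeker, not the paper's adaptive multivariate version; the bandwidth and data are assumptions for illustration.

```python
import numpy as np

def mean_shift_mode(points, start, h=0.5, iters=50):
    """Iterate the kernel-weighted mean update until it settles on
    a mode of the kernel density estimate."""
    m = float(start)
    for _ in range(iters):
        w = np.exp(-((points - m) ** 2) / (2.0 * h * h))
        m_new = float(np.sum(w * points) / np.sum(w))
        if abs(m_new - m) < 1e-8:
            break
        m = m_new
    return m

# Two clusters; starting near the first converges to its mode,
# ignoring the second cluster entirely
pts = np.array([4.8, 4.9, 5.0, 5.1, 5.2, 10.0, 10.1])
mode = mean_shift_mode(pts, start=4.5)
```

Because distant points receive exponentially small weights, the iteration fuses only the measurements belonging to one source, which is what makes the procedure robust to an unknown number of sources.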
Constrained Flows of Matrix-Valued Functions: Application to Diffusion Tensor Regularization

Nonlinear partial differential equations (PDEs) are now widely used to regularize images. They make it possible to eliminate noise and artifacts while preserving large global features, such as object contours. In this context, we propose a geometric framework to design PDE flows acting on constrained datasets. We focus our interest on flows of matrix-valued functions undergoing orthogonal and spectral constraints. The corresponding evolution PDEs are found by minimization of cost functionals, and depend on the natural metrics of the underlying constrained manifolds (viewed as Lie groups or homogeneous spaces). Suitable numerical schemes that fit the constraints are also presented. We illustrate this theoretical framework through a recent and challenging problem in medical imaging: the regularization of diffusion tensor volumes (DTMRI).

C. Chefd’hotel, D. Tschumperlé, R. Deriche, O. Faugeras
A Hierarchical Framework for Spectral Correspondence

The modal correspondence method of Shapiro and Brady aims to match point-sets by comparing the eigenvectors of a pairwise point proximity matrix. Although elegant in its matrix representation, the method is notoriously susceptible to differences in the relational structure of the point-sets under consideration. In this paper we demonstrate how the method can be rendered robust to structural differences by adopting a hierarchical approach. We place the modal matching problem in a probabilistic setting in which the correspondences between pairwise clusters can be used to constrain the individual point correspondences. To meet this goal we commence by describing an iterative method which can be applied to the point proximity matrix to identify the locations of pairwise modal clusters. Once we have assigned points to clusters, we compute within-cluster and between-cluster proximity matrices. The modal coefficients for these two sets of proximity matrices are used to compute cluster correspondence and cluster-conditional point correspondence probabilities. A sensitivity study on synthetic point-sets reveals that the method is considerably more robust than the conventional method to clutter or point-set contamination.

Marco Carcassoni, Edwin R. Hancock
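The underlying Shapiro-Brady step that the paper builds on can be sketched as follows. The eigenvector sign ambiguity is handled crudely by taking absolute values, and the paper's hierarchical clustering machinery is omitted; the point sets are invented for the example.

```python
import numpy as np

def modal_correspondence(pts1, pts2, sigma=1.0):
    """Match point-sets by comparing rows of the eigenvector
    matrices of their Gaussian proximity matrices."""
    def modes(pts):
        d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
        P = np.exp(-d2 / (2.0 * sigma ** 2))
        _, V = np.linalg.eigh(P)
        return np.abs(V)              # crude fix for sign ambiguity
    V1, V2 = modes(pts1), modes(pts2)
    # For each point in set 1, pick the closest modal row in set 2
    return np.array([int(np.argmin(np.sum((V2 - v) ** 2, axis=1)))
                     for v in V1])

pts1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.5]])
perm = np.array([2, 0, 3, 1])
pts2 = pts1[perm]                     # same shape, shuffled labels
match = modal_correspondence(pts1, pts2)
```

For identical geometry the modal rows agree exactly and the permutation is recovered; the paper's contribution is making this comparison survive structural differences between the two sets.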
Phase-Based Local Features

We introduce a new type of local feature based on the phase and amplitude responses of complex-valued steerable filters. The design of this local feature is motivated by a desire to obtain feature vectors which are semi-invariant under common image deformations, yet distinctive enough to provide useful identity information. A recent proposal for such local features involves combining differential invariants to particular image deformations, such as rotation. Our approach differs in that we consider a wider class of image deformations, including the addition of noise, along with both global and local brightness variations. We use steerable filters to make the feature robust to rotation. And we exploit the fact that phase data is often locally stable with respect to scale changes, noise, and common brightness changes. We provide empirical results comparing our local feature with one based on differential invariants. The results show that our phase-based local feature leads to better performance when dealing with common illumination changes and 2-D rotation, while giving comparable performance under scale changes.

Gustavo Carneiro, Allan D. Jepson
What Is the Role of Independence for Visual Recognition?

Independent representations have recently attracted significant attention from the biological vision and cognitive science communities. It has been 1) argued that properties such as sparseness and independence play a major role in visual perception, and 2) shown that imposing such properties on visual representations originates receptive fields similar to those found in human vision. We present a study of the impact of feature independence in the performance of visual recognition architectures. The contributions of this study are of both theoretical and empirical natures, and support two main conclusions. The first is that the intrinsic complexity of the recognition problem (Bayes error) is higher for independent representations. The increase can be significant, close to 10% in the databases we considered. The second is that criteria commonly used in independent component analysis are not sufficient to eliminate all the dependencies that impact recognition. In fact, “independent components” can be less independent than previous representations, such as principal components or wavelet bases.

Nuno Vasconcelos, Gustavo Carneiro
A Probabilistic Multi-scale Model for Contour Completion Based on Image Statistics

We derive a probabilistic multi-scale model for contour completion based on image statistics. The boundaries of human segmented images are used as “ground truth”. A probabilistic formulation of contours demands a prior model and a measurement model. From the image statistics of boundary contours, we derive both the prior model of contour shape and the local likelihood model of image measurements. We observe multi-scale phenomena in the data, and accordingly propose a higher-order Markov model over scales for the contour continuity prior. Various image cues derived from orientation energy are evaluated and incorporated into the measurement model. Based on these models, we have designed a multi-scale algorithm for contour completion, which exploits both contour continuity and texture. Experimental results are shown on a wide range of images.

Xiaofeng Ren, Jitendra Malik
Toward a Full Probability Model of Edges in Natural Images

We investigate the statistics of local geometric structures in natural images. Previous studies [13,14] of high-contrast 3×3 natural image patches have shown that, in the state space of these patches, we have a concentration of data points along a low-dimensional non-linear manifold that corresponds to edge structures. In this paper we extend our analysis to a filter-based multiscale image representation, namely the local 3-jet of Gaussian scale-space representations. A new picture of natural image statistics seems to emerge, where primitives (such as edges, blobs, and bars) generate low-dimensional non-linear structures in the state space of image data.

Kim S. Pedersen, Ann B. Lee
Fast Difference Schemes for Edge Enhancing Beltrami Flow

The Beltrami flow [13,14] is one of the most effective denoising algorithms in image processing. For gray-level images, we show that the Beltrami flow equation can be arranged in a reaction-diffusion form. This reveals the edge-enhancing properties of the equation and suggests the application of additive operator split (AOS) methods [4,5] for faster convergence. As we show with numerical simulations, the AOS method results in an unconditionally stable semi-implicit linearized difference scheme in 2D and 3D. The values of the edge indicator function are used from the previous step in scale, while the pixel values of the next step are used to approximate the flow. The optimum ratio between the reaction and diffusion counterparts of the governing PDE is studied, in order to achieve a better quality of segmentation. The computational time decreases by a factor of ten, as compared to the explicit scheme. For 2D color images, the Beltrami flow equations are coupled, and do not yield readily to the AOS technique. However, in the proximity of an edge, the cross-products of color gradients nearly vanish, and the coupling becomes weak. The principal directions of the edge indicator matrix are normal and tangent to the edge. Replacing the action of the matrix on the gradient vector by an action of its eigenvalue, we reduce the color problem to the gray level case with a reasonable accuracy. The scalar edge indicator function for the color case becomes essentially the same as that for the gray level image, and the fast implicit technique is implemented.

R. Malladi, I. Ravve
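The building block of an AOS scheme, a semi-implicit 1-D diffusion step solved with the Thomas tridiagonal algorithm, can be sketched as follows. Linear diffusion is used for simplicity; in the paper's setting, the edge indicator from the previous scale step would modulate the off-diagonal weights.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-
    diagonal, d = right-hand side."""
    n = len(d)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_diffusion_step(u, tau):
    """One unconditionally stable step of (I - tau*L) u_new = u
    with Neumann (reflecting) boundaries."""
    n = len(u)
    a = np.full(n, -tau)
    c = np.full(n, -tau)
    b = np.full(n, 1.0 + 2.0 * tau)
    b[0] = b[-1] = 1.0 + tau          # boundary rows: one neighbour
    a[0] = 0.0
    c[-1] = 0.0
    return thomas(a, b, c, u)

u = np.zeros(32)
u[16] = 1.0                           # unit impulse
v = implicit_diffusion_step(u, tau=5.0)
```

The solve is O(n), and the time step tau can be made arbitrarily large without instability, which is exactly the advantage over the explicit scheme cited in the abstract.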
A Fast Radial Symmetry Transform for Detecting Points of Interest

A new feature detection technique is presented that utilises local radial symmetry to identify regions of interest within a scene. This transform is significantly faster than existing techniques using radial symmetry and offers the possibility of real-time implementation on a standard processor. The new transform is shown to perform well on a wide variety of images and its performance is tested against leading techniques from the literature. Both as a facial feature detector and as a generic region of interest detector the new transform is seen to offer equal or superior performance to contemporary techniques whilst requiring drastically less computational effort.

Gareth Loy, Alexander Zelinsky
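The voting idea at the heart of the transform can be sketched as follows. This is a stripped-down, single-radius orientation projection; the full transform also weights votes by gradient magnitude, handles negatively-affected pixels, and smooths the result, and the disk image and radius are invented for the example.

```python
import numpy as np

def radial_symmetry_votes(img, radius):
    """Each strong-gradient pixel votes at the point its gradient
    direction indicates, `radius` pixels away; radially symmetric
    bright regions pile votes up at their centres."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    votes = np.zeros_like(mag)
    h, w = img.shape
    thresh = 0.1 * mag.max()
    for i in range(h):
        for j in range(w):
            if mag[i, j] <= thresh:
                continue
            vi = i + int(round(radius * gy[i, j] / mag[i, j]))
            vj = j + int(round(radius * gx[i, j] / mag[i, j]))
            if 0 <= vi < h and 0 <= vj < w:
                votes[vi, vj] += 1
    return votes

# Bright disk of radius 3 centred at (8, 8): votes peak there
yy, xx = np.mgrid[0:17, 0:17]
img = ((yy - 8) ** 2 + (xx - 8) ** 2 <= 9).astype(float)
votes = radial_symmetry_votes(img, radius=3)
ci, cj = np.unravel_index(np.argmax(votes), votes.shape)
```

The cost is one pass over the gradient image per radius, which is what makes the transform fast compared to earlier radial-symmetry detectors.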
Image Features Based on a New Approach to 2D Rotation Invariant Quadrature Filters

Quadrature filters are a well-known method in low-level computer vision for estimating certain local properties of a signal, such as local amplitude and local phase. However, 2D quadrature filters suffer from not being rotation invariant. Furthermore, they do not allow the detection of truly 2D features such as corners and junctions unless they are combined to form the structure tensor. The present paper deals with a new 2D generalization of quadrature filters which is rotation invariant and allows the analysis of intrinsically 2D signals. Hence, the new approach can be considered as the union of the properties of quadrature filters and of the structure tensor. The proposed method first estimates the local orientation of the signal, which is then used for steering some basis filter responses. Certain linear combinations of these filter responses are derived which allow estimation of the local isotropy and two perpendicular phases of the signal. The phase model is based on the assumption of an angular band-limitation in the signal. As an application, a simple and efficient point-of-interest operator is presented and compared to the Plessey detector.

Michael Felsberg, Gerald Sommer
Representing Edge Models via Local Principal Component Analysis

Edge detection depends not only upon the assumed model of what an edge is, but also on how this model is represented. The problem of how to represent the edge model is typically neglected, despite the fact that the representation is a bottleneck for both computational cost and accuracy. We propose to represent edge models by a partition of the edge manifold corresponding to the edge model, where each local element of the partition is described by its principal components. We describe the construction of this representation and demonstrate its benefits for various edge models.

Patrick S. Huggins, Steven W. Zucker
Regularized Shock Filters and Complex Diffusion

We address the issue of regularizing Osher and Rudin’s shock filter, used for image deblurring, in order to allow processes that are more robust against noise. Previous solutions to the problem suggested adding some sort of diffusion term to the shock equation. We analyze and prove some properties of coupled shock and diffusion processes. Finally we propose an original solution of adding a complex diffusion term to the shock equation. This new term is used to smooth out noise and indicate inflection points simultaneously. The imaginary value, which is an approximated smoothed second derivative scaled by time, is used to control the process. This results in a robust deblurring process that performs well also on noisy signals.

Guy Gilboa, Nir A. Sochen, Yehoshua Y. Zeevi
Multi-view Matching for Unordered Image Sets, or “How Do I Organize My Holiday Snaps?”

There has been considerable success in automated reconstruction for image sequences where small-baseline algorithms can be used to establish matches across a number of images. In contrast, in the case of widely separated views, methods have generally been restricted to two or three views. In this paper we investigate the problem of establishing relative viewpoints given a large number of images where no ordering information is provided. A typical application would be where images are obtained from different sources or at different times: both the viewpoint (position, orientation, scale) and lighting conditions may vary significantly over the data set. Such a problem is not fundamentally amenable to exhaustive pairwise and triplet wide-baseline matching, because this would be prohibitively expensive as the number of views increases. Instead, we investigate how a combination of image invariants, covariants, and multiple-view relations can be used in concert to enable efficient multiple-view matching. The result is a matching algorithm which is linear in the number of views. The methods are illustrated on several real image data sets. The output enables an image-based technique for navigating in a 3D scene, moving from one image to whichever image is the next most appropriate.

F. Schaffalitzky, A. Zisserman
Parameter Estimates for a Pencil of Lines: Bounds and Estimators

Estimating the parameters of a pencil of lines is addressed. A statistical model for the measurements is developed, from which the Cramér-Rao lower bound is determined. An estimator is derived, and its performance is simulated and compared to the bound. The estimator is shown to be asymptotically efficient, and superior to the classical least squares algorithm.

Gavriel Speyer, Michael Werman
Multilinear Analysis of Image Ensembles: TensorFaces

Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes. Our multilinear modeling technique employs a tensor extension of the conventional matrix singular value decomposition (SVD), known as the N-mode SVD. As a concrete example, we consider the multilinear analysis of ensembles of facial images that combine several modes, including different facial geometries (people), expressions, head poses, and lighting conditions. Our resulting “TensorFaces” representation has several advantages over conventional eigenfaces. More generally, multilinear analysis shows promise as a unifying framework for a variety of computer vision problems.

M. Alex O. Vasilescu, Demetri Terzopoulos
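The N-mode SVD at the heart of the TensorFaces construction can be sketched with ordinary matrix SVDs of the mode-n unfoldings. This minimal NumPy version (function names and interface are our own assumptions) computes one orthonormal factor matrix per mode and the core tensor:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def n_mode_svd(tensor):
    """N-mode SVD (also known as HOSVD): the left singular vectors of each
    mode-n unfolding give one orthonormal factor matrix U_m per mode, and
    projecting the data onto all of them yields the core tensor Z, so that
    D = Z x_1 U_1 x_2 U_2 ... x_N U_N."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, U in enumerate(factors):
        # mode-m product of the current core with U_m^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors
```

Multiplying the core back by every U_m reconstructs the data tensor exactly when full mode ranks are kept; truncating columns of the U_m gives the multilinear compression that a TensorFaces-style representation exploits.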
‘Dynamism of a Dog on a Leash’ or Behavior Classification by Eigen-Decomposition of Periodic Motions

Following Futurism, we show how periodic motions can be represented by a small number of eigen-shapes that capture the whole dynamic mechanism of periodic motions. Spectral decomposition of the silhouette of an object in motion serves as a basis for behavior classification by principal component analysis. The boundary contour of a walking dog, for example, is first computed efficiently and accurately. After normalization, the implicit representation of a sequence of silhouette contours, given by their corresponding binary images, is used to generate eigen-shapes for the given motion. Singular value decomposition produces these eigen-shapes, which are then used to analyze the sequence. We show examples of object as well as behavior classification based on the eigen-decomposition of the binary silhouette sequence.

Roman Goldenberg, Ron Kimmel, Ehud Rivlin, Michael Rudzsky
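The eigen-shape idea, PCA over a sequence of vectorized binary silhouettes, reduces to a single SVD. The following minimal sketch (function name and interface are our own assumptions, and it presumes silhouettes are already normalized) extracts eigen-shapes and per-frame coefficients:

```python
import numpy as np

def eigen_shapes(silhouettes, k=3):
    """Eigen-shape decomposition of a silhouette sequence.
    `silhouettes` has shape (T, H, W) with binary masks; each frame is
    vectorized, the mean shape is subtracted, and the SVD of the centered
    T x (H*W) matrix yields eigen-shapes (rows of Vt) plus the coefficients
    describing each frame in the eigen-shape basis."""
    T = silhouettes.shape[0]
    X = silhouettes.reshape(T, -1).astype(float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    coeffs = U[:, :k] * S[:k]   # per-frame projection onto the eigen-shapes
    return mean, Vt[:k], coeffs
```

The trajectory of `coeffs` over time is the low-dimensional signature that can then feed behavior classification.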
Automatic Detection and Tracking of Human Motion with a View-Based Representation

This paper proposes a solution for the automatic detection and tracking of human motion in image sequences. Due to the complexity of the human body and its motion, automatic detection of 3D human motion remains an open, and important, problem. Existing approaches for automatic detection and tracking focus on 2D cues and typically exploit object appearance (color distribution, shape) or knowledge of a static background. In contrast, we exploit 2D optical flow information which provides rich descriptive cues, while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, we develop a novel representation of human motion using low-dimensional spatio-temporal models that are learned using motion capture data of human subjects. In addition to human motion (the foreground) we probabilistically model the motion of generic scenes (the background); these statistical models are defined as Gibbsian fields specified from the first-order derivatives of motion observations. Detection and tracking are posed in a principled Bayesian framework which involves the computation of a posterior probability distribution over the model parameters (i.e., the location and the type of the human motion) given a sequence of optical flow observations. Particle filtering is used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution are related to the pose parameters of a 3D articulated model (e.g. the approximate joint angles and movement direction). Thus the approach proves suitable for initializing more complex probabilistic models of human motion. As shown by experiments on real image sequences, our method is able to detect and track people under different viewpoints with complex backgrounds.

Ronan Fablet, Michael J. Black
Using Robust Estimation Algorithms for Tracking Explicit Curves

The context of this work is lateral vehicle control using a camera as a sensor. A natural tool for controlling a vehicle is recursive filtering. The well-known Kalman filtering theory relies on Gaussian assumptions on both the state and measure random variables. However, image processing algorithms yield measurements that, most of the time, are far from Gaussian, as experimentally shown on real data in our application. It is therefore necessary to make the approach more robust, leading to the so-called robust Kalman filtering. In this paper, we review this approach from a very global point of view, adopting a constrained least squares approach, which is very similar to the half-quadratic theory, and justifies the use of iterative reweighted least squares algorithms. A key issue in robust Kalman filtering is the choice of the prediction error covariance matrix. Unlike in the Gaussian case, its computation is not straightforward in the robust case, due to the nonlinearity of the involved expectation. We review the classical alternatives and propose new ones. A theoretical study of these approximations is out of the scope of this paper, however we do provide an experimental comparison on synthetic data perturbed with Cauchy-distributed noise.

Jean-Philippe Tarel, Sio-Song Ieng, Pierre Charbonnier
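The iteratively reweighted least squares scheme that the half-quadratic view justifies can be illustrated on a toy line-fitting problem. The Huber weight function, tuning constant, and MAD scale estimate below are standard choices of ours, not necessarily those of the paper:

```python
import numpy as np

def irls_line(x, y, iters=50, c=1.345):
    """Robust line fit y ~ a*x + b via iteratively reweighted least squares.
    Huber weights (tuning constant c) with a median-absolute-deviation scale
    estimate make the fit resistant to heavy-tailed measurement noise."""
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones_like(y, dtype=float)
    for _ in range(iters):
        Aw = A * w[:, None]                        # weighted design matrix
        theta = np.linalg.solve(A.T @ Aw, Aw.T @ y)
        r = y - A @ theta                          # residuals of current fit
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        w = np.where(u <= c, 1.0, c / u)           # Huber weight function
    return theta
```

With a few gross outliers the robust fit stays close to the true line while ordinary least squares is pulled away, which is the behavior the abstract reports under Cauchy-distributed noise.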
On the Motion and Appearance of Specularities in Image Sequences

Real scenes are full of specularities (highlights and reflections), and yet most vision algorithms ignore them. In order to capture the appearance of realistic scenes, we need to model specularities as separate layers. In this paper, we study the behavior of specularities in static scenes as the camera moves, and describe their dependence on varying surface geometry, orientation, and scene point and camera locations. For a rectilinear camera motion with constant velocity, we study how the specular motion deviates from a straight trajectory (disparity deviation) and how much it violates the epipolar constraint (epipolar deviation). Surprisingly, for surfaces that are convex or not highly undulating, these deviations are usually quite small. We also study the appearance of specularities, i.e., how they interact with the body reflection, and with the usual occlusion ordering constraints applicable to diffuse opaque layers. We present a taxonomy of specularities based on their photometric properties as a guide for designing separation techniques. Finally, we propose a technique to extract specularities as a separate layer, and demonstrate it using an image sequence of a complex scene.

Rahul Swaminathan, Sing Bing Kang, Richard Szeliski, Antonio Criminisi, Shree K. Nayar
Multiple Hypothesis Tracking for Automatic Optical Motion Capture

We present a technique for performing the tracking stage of optical motion capture which retains, at each time frame, multiple marker association hypotheses and estimates of the subject’s position. Central to this technique are the equations for calculating the likelihood of a sequence of association hypotheses, which we develop using a Bayesian approach. The system is able to perform motion capture using fewer cameras and a lower frame rate than has been used previously, and does not require the assistance of a human operator. We conclude by demonstrating the tracker on real data and provide an example in which our technique is able to correctly determine all marker associations and standard tracking techniques fail.

Maurice Ringer, Joan Lasenby
Single Axis Geometry by Fitting Conics

In this paper, we describe a new approach for recovering 3D geometry from an uncalibrated image sequence of single-axis (turntable) motion. Unlike previous methods, the computation of multiple views encoded by the fundamental matrix or trifocal tensor is not required. Instead, the new approach is based on fitting a conic locus to corresponding image points over multiple views. It is then shown that the geometry of single-axis motion can be recovered given at least two such conics. In the case of two conics the reconstruction may have a twofold ambiguity, but this ambiguity is removed if three conics are used. The approach enables the geometry of the single-axis motion (the 3D rotation axis and the Euclidean geometry in planes perpendicular to this axis) to be estimated using the minimal number of parameters. It is demonstrated that maximum likelihood estimation results in measurements that are as good as or superior to those obtained by previous methods, with a far simpler algorithm. Examples are given on various real sequences, which show the accuracy and robustness of the new algorithm.

Guang Jiang, Hung-tat Tsui, Long Quan, Andrew Zisserman
Computing the Physical Parameters of Rigid-Body Motion from Video

This paper presents an optimization framework for estimating the motion and underlying physical parameters of a rigid body in free flight from video. The algorithm takes a video clip of a tumbling rigid body of known shape and generates a physical simulation of the object observed in the video clip. This solution is found by optimizing the simulation parameters to best match the motion observed in the video sequence. These simulation parameters include initial positions and velocities, environment parameters like gravity direction and parameters of the camera. A global objective function computes the sum squared difference between the silhouette of the object in simulation and the silhouette obtained from video at each frame. Applications include creating interesting rigid body animations, tracking complex rigid body motions in video and estimating camera parameters from video.

Kiran S. Bhat, Steven M. Seitz, Jovan Popović, Pradeep K. Khosla
Building Roadmaps of Local Minima of Visual Models

Getting trapped in suboptimal local minima is a perennial problem in model based vision, especially in applications like monocular human body tracking where complex nonlinear parametric models are repeatedly fitted to ambiguous image data. We show that the trapping problem can be attacked by building ‘roadmaps’ of nearby minima linked by transition pathways — paths leading over low ‘cols’ or ‘passes’ in the cost surface, found by locating the transition state (codimension-1 saddle point) at the top of the pass and then sliding downhill to the next minimum. We know of no previous vision or optimization work on numerical methods for locating transition states, but such methods do exist in computational chemistry, where transitions are critical for predicting reaction parameters. We present two families of methods, originally derived in chemistry, but here generalized, clarified and adapted to the needs of model based vision: eigenvector tracking is a modified form of damped Newton minimization, while hypersurface sweeping sweeps a moving hypersurface through the space, tracking minima within it. Experiments on the challenging problem of estimating 3D human pose from monocular images show that our algorithms find nearby transition states and minima very efficiently, but also underline the disturbingly large number of minima that exist in this and similar model based vision problems.

Cristian Sminchisescu, Bill Triggs
A Generative Method for Textured Motion: Analysis and Synthesis

Natural scenes contain rich stochastic motion patterns which are characterized by the movement of a large number of small elements, such as falling snow, rain, flying birds, fireworks, and waterfalls. In this paper, we call these motion patterns textured motion and present a generative method that combines statistical models and algorithms from both texture and motion analysis. The generative method includes the following three aspects. 1) Photometrically, an image is represented as a superposition of linear bases in an atomic decomposition using an over-complete dictionary, such as Gabor or Laplacian. Such a basis representation is known to be generic for natural images, and it is low dimensional, as the number of bases is often 100 times smaller than the number of pixels. 2) Geometrically, each moving element (called a moveton), such as an individual snowflake or bird, is represented by a deformable template which is a group of several spatially adjacent bases. Such templates are learned through clustering. 3) Dynamically, the movetons are tracked through the image sequence by a stochastic algorithm maximizing a posterior probability. A classic second-order Markov chain model is adopted for the motion dynamics. The sources and sinks of the movetons are modeled by birth and death maps. We adopt an EM-like stochastic gradient algorithm for inference of the hidden variables: bases, movetons, birth/death maps, and the parameters of the dynamics. The learned models are also verified by synthesizing random textured motion sequences which bear a visual appearance similar to that of the observed sequences.

Yizhou Wang, Song-Chun Zhu
Is Super-Resolution with Optical Flow Feasible?

Reconstruction-based super-resolution from motion video has been an active area of study in computer vision and video analysis. Image alignment is a key component of super-resolution algorithms. Almost all previous super-resolution algorithms have assumed that standard methods of image alignment can provide accurate enough alignment for creating super-resolution images. However, a systematic study of the demands on accuracy of multi-image alignment and its effects on super-resolution has been lacking. Furthermore, implicitly or explicitly most algorithms have assumed that the multiple video frames or specific regions of interest are related through global parametric transformations. From previous works, it is not at all clear how super-resolution performs under alignment with piecewise parametric or local optical flow based methods. This paper is an attempt at understanding the influence of image alignment and warping errors on super-resolution. Requirements on the consistency of optical flow across multiple images are studied and it is shown that errors resulting from traditional flow algorithms may render super-resolution infeasible.

WenYi Zhao, Harpreet S. Sawhney
New View Generation with a Bi-centric Camera

We propose a novel method for new view generation from a rectified sequence of images. Our new images correspond to a new camera model, which we call a bi-centric camera; in this model the centers of horizontal and vertical projections lie in different locations on the camera’s optical axis. This model reduces to the regular pinhole camera when the two projection centers coincide, and the pushbroom camera when one projection center lies at infinity. We first analyze the properties of this camera model. We then show how to generate new bi-centric views from vertical cuts in the epipolar volume of a rectified sequence. Every vertical cut generates a new bi-centric view, where the specific parameters of the cut determine the location of the projection centers. We discuss and demonstrate applications, including the generation of images where the virtual camera lies behind occluding surfaces (e.g., behind the back wall of a room), and in unreachable positions (e.g., in front of a glass window). Our final application is the generation of movies taken by a simulated forward moving camera, using as input a movie taken by a sideways moving camera.

Daphna Weinshall, Mi-Suen Lee, Tomas Brodsky, Miroslav Trajkovic, Doron Feldman
Recognizing and Tracking Human Action

Human activity can be described as a sequence of 3D body postures. The traditional approach to recognition and 3D reconstruction of human activity has been to track motion in 3D, mainly using advanced geometric and dynamic models. In this paper we reverse this process. View based activity recognition serves as an input to a human body location tracker with the ultimate goal of 3D reanimation in mind. We demonstrate that specific human actions can be detected from single frame postures in a video sequence. By recognizing the image of a person’s posture as corresponding to a particular key frame from a set of stored key frames, it is possible to map body locations from the key frames to actual frames. This is achieved using a shape matching algorithm based on qualitative similarity that computes point to point correspondence between shapes, together with information about appearance. As the mapping is from fixed key frames, our tracking does not suffer from the problem of having to reinitialise when it gets lost. It is effectively a closed loop. We present experimental results both for recognition and tracking for a sequence of a tennis player.

Josephine Sullivan, Stefan Carlsson
Towards Improved Observation Models for Visual Tracking: Selective Adaptation

An important issue in tracking is how to incorporate an appropriate degree of adaptivity into the observation model. Without any adaptivity, tracking fails when object properties change, for example when illumination changes affect surface colour. Conversely, if an observation model adapts too readily then, during some transient failure of tracking, it is liable to adapt erroneously to some part of the background. The approach proposed here is to adapt selectively, allowing adaptation only during periods when two particular conditions are met: that the object should be both present and in motion. The proposed mechanism for adaptivity is tested here with a foreground colour and motion model. The experimental setting itself is novel in that it uses combined colour and motion observations from a fixed filter bank, with motion used also for initialisation via a Monte Carlo proposal distribution. Adaptation is performed using a stochastic EM algorithm, during periods that meet the conditions above. Tests verify the value of such adaptivity, in that immunity to distraction from clutter of similar colour to the object is considerably enhanced.

Jaco Vermaak, Patrick Pérez, Michel Gangnet, Andrew Blake
Color-Based Probabilistic Tracking

Color-based trackers recently proposed in [3,4,5] have proved robust and versatile for a modest computational cost. They are especially appealing for tracking tasks where the spatial structure of the tracked objects exhibits such dramatic variability that trackers based on a space-dependent appearance reference would break down very fast. Trackers in [3,4,5] rely on the deterministic search of a window whose color content matches a reference histogram color model. Relying on the same principle of color histogram distance, but within a probabilistic framework, we introduce a new Monte Carlo tracking technique. The use of a particle filter allows us to better handle color clutter in the background, as well as complete occlusion of the tracked entities over a few frames. This probabilistic approach is very flexible and can be extended in a number of useful ways. In particular, we introduce the following ingredients: multi-part color modeling to capture a rough spatial layout ignored by global histograms, incorporation of a background color model when relevant, and extension to multiple objects.

P. Pérez, C. Hue, J. Vermaak, M. Gangnet
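The color-histogram likelihood underlying such particle filters is easy to sketch: a reference histogram is compared to a candidate's histogram via the Bhattacharyya coefficient, and the distance is turned into a particle weight. The binning, sigma, and function names below are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized color histogram of an (N, 3) array of RGB pixels in [0, 1]."""
    idx = np.clip((patch * bins).astype(int), 0, bins - 1)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    h = np.bincount(flat, minlength=bins ** 3).astype(float)
    return h / h.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two histograms; 1 means identical."""
    return np.sum(np.sqrt(p * q))

def particle_weight(p, q, sigma=0.1):
    """Particle likelihood: exp(-d^2 / (2 sigma^2)) with d^2 = 1 - Bhattacharyya
    coefficient, so candidates matching the reference color model get high weight."""
    d2 = 1.0 - bhattacharyya(p, q)
    return np.exp(-d2 / (2 * sigma ** 2))
```

In a full tracker each particle proposes a candidate window, its histogram is scored this way, and the weighted particle set approximates the posterior over object location.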
Dense Motion Analysis in Fluid Imagery

Analyzing fluid motion is essential in a number of domains and can rarely be handled using generic computer vision techniques. In this particular application context, we address two distinct problems. First, we describe a dedicated dense motion estimator. The approach relies on constraints derived from fluid motion properties and allows us to recover dense motion fields of good quality. Second, we address the problem of analyzing such velocity fields. We present a form of motion-based segmentation relying on an analytic representation of the motion field that permits the extraction of important quantities such as singularities, stream functions, and velocity potentials. The proposed method has the advantage of being robust, simple, and fast.

T. Corpetti, É. Mémin, P. Pérez
A Layered Motion Representation with Occlusion and Compact Spatial Support

We describe a 2.5D layered representation for visual motion analysis. The representation provides a global interpretation of image motion in terms of several spatially localized foreground regions along with a background region. Each of these regions comprises a parametric shape model and a parametric motion model. The representation also contains depth ordering, so that visibility and occlusion are correctly included in the estimation of the model parameters. Finally, because the number of objects, their positions, shapes and sizes, and their relative depths are all unknown, initial models are drawn from a proposal distribution and then compared using a penalized likelihood criterion. This allows us to automatically initialize new models and to compare different depth orderings.

Allan D. Jepson, David J. Fleet, Michael J. Black
Incremental Singular Value Decomposition of Uncertain Data with Missing Values

We introduce an incremental singular value decomposition (svd) of incomplete data. The svd is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete data does not uniquely specify an svd, the procedure selects one having minimal rank. For a dense p × q matrix of low rank r, the incremental method has time complexity O(pqr) and space complexity O((p + q)r)—better than highly optimized batch algorithms such as matlab’s svd(). In cases of missing data, it produces factorings of lower rank and residual than batch svd algorithms applied to standard missing-data imputations. We show applications in computer vision and audio feature extraction. In computer vision, we use the incremental svd to develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle occlusions/missing points in structure-from-motion factorizations.

Matthew Brand
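A single column-append step of an incremental SVD in the spirit of Brand's method (without the missing-data and uncertainty handling the paper develops) can be sketched as follows; the function name and the crude handling of the degenerate in-subspace case are our own simplifications:

```python
import numpy as np

def svd_append_column(U, s, Vt, c, tol=1e-10):
    """One update step of an incremental SVD: given A = U diag(s) Vt
    (p x q, rank r), refresh the factorization when a new column c arrives,
    at cost O(pr + r^3) instead of a full refactoring of [A, c]."""
    m = U.T @ c                 # part of c inside the current column space
    res = c - U @ m             # part orthogonal to it
    rho = np.linalg.norm(res)
    r = s.size
    # small (r+1) x (r+1) core matrix that absorbs the new column
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(s)
    K[:r, r] = m
    K[r, r] = rho
    Uk, s_new, Vkt = np.linalg.svd(K)
    j = res / rho if rho > tol else np.zeros_like(c)   # new basis direction
    U_new = np.column_stack([U, j]) @ Uk
    # extend V with a row for the appended column, then rotate by Vk
    V_ext = np.zeros((Vt.shape[1] + 1, r + 1))
    V_ext[:-1, :r] = Vt.T
    V_ext[-1, r] = 1.0
    return U_new, s_new, Vkt @ V_ext.T
```

Repeating this step as columns stream in gives the O(pqr) incremental factorization the abstract describes; truncating small singular values after each step keeps the rank minimal.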
Symmetrical Dense Optical Flow Estimation with Occlusions Detection

Traditional techniques of dense optical flow estimation do not generally yield symmetrical solutions: the results will differ if they are applied between images I1 and I2 or between images I2 and I1. In this work, we present a method to recover a dense optical flow field map from two images, while explicitly taking into account the symmetry across the images as well as possible occlusions and discontinuities in the flow field. The idea is to consider both displacement vectors, from I1 to I2 and from I2 to I1, and to minimise an energy functional that explicitly encodes all of those properties. This variational problem is then solved using the gradient flow defined by the Euler-Lagrange equations associated with the energy. In order to reduce the risk of being trapped in an irrelevant minimum, a focusing strategy based on a multi-resolution technique is used to converge toward the solution. Promising experimental results on both synthetic and real images are presented to illustrate the capability of this symmetrical variational approach to recover accurate optical flow.

Luis Alvarez, Rachid Deriche, Théo Papadopoulo, Javier Sánchez
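A common building block for occlusion detection with paired flows, checking that the forward displacement followed by the backward displacement returns near the starting pixel, can be sketched as follows. The nearest-neighbor sampling, threshold, and function name are simplifying assumptions of ours, not the paper's variational formulation:

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, thresh=0.5):
    """Forward-backward consistency check. `flow_fw` and `flow_bw` are
    (H, W, 2) arrays of (dx, dy) displacements between I1->I2 and I2->I1.
    The backward flow is sampled (nearest neighbor) at each pixel's forward
    target; where the round trip does not cancel out, the pixel is flagged
    as likely occluded."""
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    bw_at_target = flow_bw[yt, xt]     # backward flow sampled at the target
    diff = flow_fw + bw_at_target      # cancels where the flows are symmetric
    return np.linalg.norm(diff, axis=-1) > thresh
```

In the symmetric variational setting the same cancellation is enforced softly inside the energy rather than thresholded after the fact.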
Audio-Video Sensor Fusion with Probabilistic Graphical Models

We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.

Matthew J. Beal, Hagai Attias, Nebojsa Jojic

Visual Motion

Increasing Space-Time Resolution in Video

We propose a method for constructing a video sequence of high space-time resolution by combining information from multiple low-resolution video sequences of the same dynamic scene. Super-resolution is performed simultaneously in time and in space. By “temporal super-resolution” we mean recovering rapid dynamic events that occur faster than the regular frame rate. Such dynamic events are not visible (or else observed incorrectly) in any of the input sequences, even if these are played in “slow motion”. The spatial and temporal dimensions are very different in nature, yet are inter-related. This leads to interesting visual tradeoffs in time and space, and to new video applications. These include: (i) treatment of spatial artifacts (e.g., motion blur) by increasing the temporal resolution, and (ii) combination of input sequences of different space-time resolutions (e.g., NTSC, PAL, and even high-quality still images) to generate a high-quality video sequence.

Eli Shechtman, Yaron Caspi, Michal Irani
Hyperdynamics Importance Sampling

Sequential random sampling (‘Markov Chain Monte-Carlo’) is a popular strategy for many vision problems involving multimodal distributions over high-dimensional parameter spaces. It applies both to importance sampling (where one wants to sample points according to their ‘importance’ for some calculation, but otherwise fairly) and to global optimization (where one wants to find good minima, or at least good starting points for local minimization, regardless of fairness). Unfortunately, most sequential samplers are very prone to becoming ‘trapped’ for long periods in unrepresentative local minima, which leads to biased or highly variable estimates. We present a general strategy for reducing MCMC trapping that generalizes Voter’s ‘hyperdynamic sampling’ from computational chemistry. The local gradient and curvature of the input distribution are used to construct an adaptive importance sampler that focuses samples on low cost negative curvature regions likely to contain ‘transition states’ — codimension-1 saddle points representing ‘mountain passes’ connecting adjacent cost basins. This substantially accelerates inter-basin transition rates while still preserving correct relative transition probabilities. Experimental tests on the difficult problem of 3D articulated human pose estimation from monocular images show significantly enhanced minimum exploration.

Cristian Sminchisescu, Bill Triggs
Implicit Probabilistic Models of Human Motion for Synthesis and Tracking

This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set; efficiency is particularly important for tracking. Towards that end, we learn a low dimensional linear model of human motion that is used to structure the example motion database into a binary tree. An approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time. This probabilistic tree search returns a particular sample human motion with probability approximating the true distribution of human motions in the database. This sampling method is suitable for use with particle filtering techniques and is applied to articulated 3D tracking of humans within a Bayesian framework. Successful tracking results are presented, along with examples of synthesizing human motion using the model.

Hedvig Sidenbladh, Michael J. Black, Leonid Sigal
Space-Time Tracking

We propose a new tracking technique that is able to capture non-rigid motion by exploiting a space-time rank constraint. Most tracking methods use a prior model in order to deal with challenging local features. The model usually has to be trained on carefully hand-labeled example data before the tracking algorithm can be used. Our new model-free tracking technique can overcome such limitations. This is achieved by redefining the problem: instead of first training a model and then tracking the model parameters, we derive trajectory constraints first, and then estimate the model. This reduces the search space significantly and allows for a better feature disambiguation that would not be possible with traditional trackers. We demonstrate that sampling in the trajectory space, instead of in the space of shape configurations, allows us to track challenging footage without the use of prior models.

Lorenzo Torresani, Christoph Bregler
Backmatter
Metadata
Title
Computer Vision — ECCV 2002
Edited by
Anders Heyden
Gunnar Sparr
Mads Nielsen
Peter Johansen
Copyright Year
2002
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-47969-7
Print ISBN
978-3-540-43745-1
DOI
https://doi.org/10.1007/3-540-47969-4