
2004 | Book

Computer Vision - ECCV 2004

8th European Conference on Computer Vision, Prague, Czech Republic, May 11-14, 2004. Proceedings, Part IV

Edited by: Tomás Pajdla, Jiří Matas

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

Welcome to the proceedings of the 8th European Conference on Computer Vision! Following a very successful ECCV 2002, the response to our call for papers was almost equally strong: 555 papers were submitted. We accepted 41 papers for oral and 149 papers for poster presentation. Several innovations were introduced into the review process. First, the number of program committee members was increased to reduce their review load. We managed to assign no more than 12 papers to each program committee member. Second, we adopted a paper ranking system. Program committee members were asked to rank all the papers assigned to them, even those that were reviewed by additional reviewers. Third, we allowed authors to respond to the reviews, which were consolidated in a discussion involving the area chair and the reviewers. Fourth, the reports, the reviews, and the responses were made available to the authors as well as to the program committee members. Our aim was to provide the authors with maximal feedback and to let the program committee members know how authors reacted to their reviews and how their reviews were or were not reflected in the final decision. Finally, we reduced the length of reviewed papers from 15 to 12 pages.

The preparation of ECCV 2004 went smoothly thanks to the efforts of the organizing committee, the area chairs, the program committee, and the reviewers. We are indebted to Anders Heyden, Mads Nielsen, and Henrik J. Nielsen for passing on ECCV traditions and to Dominique Asselineau from ENST/TSI who kindly provided his GestRFIA conference software. We thank Jan-Olof Eklundh and Andrew Zisserman for encouraging us to organize ECCV 2004 in Prague.

Table of contents

Frontmatter

Scale Space, Flow, Restoration

A l1-Unified Variational Framework for Image Restoration

In the image restoration literature, there are mainly two kinds of approaches. One is based on processing image wavelet coefficients, such as wavelet shrinkage for denoising. The other is based on processing the image gradient. To obtain an edge-preserving regularization, one usually assumes that the image belongs to the space of functions of Bounded Variation (BV). An energy is minimized, composed of an observation term and the Total Variation (TV) of the image. Recent contributions try to mix both types of method. In this spirit, the goal of this paper is to define a unified framework that includes both wavelet methods and energy minimization such as TV. For denoising, it has already been shown that wavelet soft-thresholding is equivalent to choosing the norm of the Besov space B^1_{1,1} as the regularization term. In the present work, this equivalence result is extended to the deconvolution problem. We propose a general functional to minimize, which includes TV minimization, wavelet coefficient regularization, mixed (TV+wavelet) regularization, or more general terms. Moreover, we give a projection-based algorithm to compute the solution and state its convergence. We show that the decomposition of an image over a dictionary of elementary shapes (atoms) is also included in the proposed framework. We thus give a new algorithm to solve this difficult problem, known as Basis Pursuit. We also show numerical results of image deconvolution using TV, wavelet, or TV+wavelet regularization terms.

Julien Bect, Laure Blanc-Féraud, Gilles Aubert, Antonin Chambolle
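
The soft-thresholding operation mentioned in the abstract above can be sketched in a few lines. Soft thresholding of a coefficient y with threshold t, sign(y)·max(|y| − t, 0), is the minimizer over x of ½(x − y)² + t|x|, so applying it to every wavelet coefficient corresponds to an l1-type penalty on the coefficients. This is a minimal illustration only, assuming a single level of the orthonormal Haar transform on a 1-D signal of even length and a hand-picked threshold t; the helper names `haar_forward`, `haar_inverse` and `denoise` are hypothetical, and the authors' projection-based algorithm is not reproduced.

```python
import math


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t: sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def haar_forward(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def denoise(signal, t):
    """Soft-threshold the detail coefficients only, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])
```

With t = 0 the signal is reconstructed unchanged; larger values of t remove more of the fine-scale detail, e.g. `denoise([1.0, 3.0], 10.0)` flattens the pair to its mean, 2.0.
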
Support Blob Machines: The Sparsification of Linear Scale Space

A novel generalization of linear scale space is presented. The generalization allows for a sparse approximation of the function at a certain scale. To start with, we first consider the Tikhonov regularization viewpoint on scale space theory [15]. The sparsification is then obtained using ideas from support vector machines [22] and based on the link between sparse approximation and support vector regression as described in [4] and [19]. In regularization theory, an ill-posed problem is solved by searching for a solution having a certain differentiability while, in some precise sense, the final solution is close to the initial signal. To obtain scale space, a quadratic loss function is used to measure the closeness of the initial function to its scale-σ image. We propose to alter this loss function, thus obtaining our generalization of linear scale space. Comparable to the linear ε-insensitive loss function introduced in support vector regression [22], we use a quadratic ε-insensitive loss function instead of the original quadratic measure. The ε-insensitive loss allows errors in the approximating function without an actual increase in loss. It penalizes errors only when they become larger than the a priori specified constant ε. The quadratic form is mainly maintained for consistency with linear scale space. Although the main concern of the article is the theoretical connection between the foregoing theories, the proposed approach is tested and exemplified in a small experiment on a single image.

Marco Loog
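
The difference between the quadratic loss of linear scale space and the quadratic ε-insensitive loss described above can be made concrete. The sketch assumes the standard form max(|r| − ε, 0)² for a residual r (the abstract does not spell out the formula), and the helper `total_loss` is a hypothetical name for summing it over a sampled signal.

```python
def quadratic_eps_insensitive(residual: float, eps: float) -> float:
    """Zero inside the tube |r| <= eps, quadratic in the excess outside it."""
    excess = max(abs(residual) - eps, 0.0)
    return excess * excess


def total_loss(f, g, eps):
    """Sum the loss of an approximation g against a signal f, sample by sample."""
    return sum(quadratic_eps_insensitive(a - b, eps) for a, b in zip(f, g))
```

With eps = 0 the function reduces to the ordinary squared error; with eps > 0 small deviations cost nothing, which is what leaves room for a sparse approximation.
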
High Accuracy Optical Flow Estimation Based on a Theory for Warping

We study an energy functional for computing optical flow that combines three assumptions: a brightness constancy assumption, a gradient constancy assumption, and a discontinuity-preserving spatio-temporal smoothness constraint. In order to allow for large displacements, linearisations in the two data terms are strictly avoided. We present a consistent numerical scheme based on two nested fixed point iterations. By proving that this scheme implements a coarse-to-fine warping strategy, we give a theoretical foundation for warping which has been used on a mainly experimental basis so far. Our evaluation demonstrates that the novel method gives significantly smaller angular errors than previous techniques for optical flow estimation. We show that it is fairly insensitive to parameter variations, and we demonstrate its excellent robustness under noise.

Thomas Brox, Andrés Bruhn, Nils Papenberg, Joachim Weickert
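
The combination of brightness constancy and gradient constancy without linearisation can be illustrated on a 1-D signal. The sketch evaluates both data terms for a single constant displacement u, assuming the robust penalty Ψ(s²) = √(s² + ε²) (a common differentiable approximation of the L1 norm), and picks u by brute-force search. The signals, ε, γ, the margin and the exhaustive search are hypothetical choices; the smoothness term is omitted and the paper's nested fixed-point (coarse-to-fine warping) scheme is not reproduced.

```python
import math

EPS = 1e-3    # small constant in the robust penalty (assumed value)
GAMMA = 1.0   # weight of the gradient constancy term (assumed value)


def psi(s2: float) -> float:
    """Robust penalty Psi(s^2) = sqrt(s^2 + EPS^2)."""
    return math.sqrt(s2 + EPS * EPS)


def sample(signal, x):
    """Linearly interpolate a 1-D signal at a real-valued position (clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def derivative(signal, x, h=0.5):
    """Central-difference derivative of the interpolated signal."""
    return (sample(signal, x + h) - sample(signal, x - h)) / (2.0 * h)


def data_energy(frame1, frame2, u, margin=3):
    """Non-linearised brightness + gradient constancy terms for displacement u."""
    total = 0.0
    for x in range(margin, len(frame1) - margin):
        d_bright = sample(frame2, x + u) - frame1[x]             # warped, not linearised
        d_grad = derivative(frame2, x + u) - derivative(frame1, x)
        total += psi(d_bright ** 2 + GAMMA * d_grad ** 2)
    return total


frame1 = [float(x * x) for x in range(12)]
frame2 = [float((x - 2) ** 2) for x in range(12)]  # frame1 shifted right by 2 samples
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
best_u = min(candidates, key=lambda u: data_energy(frame1, frame2, u))
```

Here `best_u` is 2.0, where both constancy terms vanish and the energy falls to its floor of one ε per sample.
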
Model-Based Approach to Tomographic Reconstruction Including Projection Deblurring. Sensitivity of Parameter Model to Noise on Data

Classical techniques for the reconstruction of axisymmetrical objects all create artefacts (smooth or unstable solutions). Moreover, the extraction of very precise features related to large density transitions remains quite delicate. In this paper, we develop a new approach, in one dimension for the moment, that allows us both to reconstruct and to extract characteristics: a priori knowledge is provided by a density model. We show the value of this method for quantifying the effects of noise; we also explain how to take into account some physical perturbations occurring with real data acquisition.

Jean Michel Lagrange, Isabelle Abraham

2D Shape Detection and Recognition

Unlevel-Sets: Geometry and Prior-Based Segmentation

We present a novel variational approach to top-down image segmentation, which accounts for significant projective transformations between a single prior image and the image to be segmented. The proposed segmentation process is coupled with reliable estimation of the transformation parameters, without using point correspondences. The prior shape is represented by a generalized cone that is based on the contour of the reference object. Its unlevel sections correspond to possible instances of the visible contour under perspective distortion and scaling. We extend the Chan-Vese energy functional by adding a shape term. This term measures the distance between the currently estimated section of the generalized cone and the region bounded by the zero-crossing of the evolving level set function. Promising segmentation results are obtained for images of rotated, translated, corrupted and partly occluded objects. The recovered transformation parameters are compatible with the ground truth.

Tammy Riklin-Raviv, Nahum Kiryati, Nir Sochen
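
The Chan-Vese region terms that the abstract extends can be sketched on a pixel grid. This minimal illustration evaluates the two region fitting terms for a binary region (instead of a level set function) and adds a hypothetical shape penalty, the area of the symmetric difference between the region and a prior mask, as a stand-in for the paper's shape term. The contour length term, the generalized-cone prior and the transformation estimation are not modeled; `weight` and `square_mask` are hypothetical.

```python
def region_means(image, mask):
    """Mean intensity inside (c1) and outside (c2) the region marked True."""
    inside = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
    outside = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0
    return c1, c2


def fitting_energy(image, mask, lam1=1.0, lam2=1.0):
    """Chan-Vese region fitting terms (length term omitted)."""
    c1, c2 = region_means(image, mask)
    energy = 0.0
    for row, mrow in zip(image, mask):
        for v, m in zip(row, mrow):
            if m:
                energy += lam1 * (v - c1) ** 2
            else:
                energy += lam2 * (v - c2) ** 2
    return energy


def shape_term(mask, prior_mask):
    """Hypothetical shape penalty: number of pixels where region and prior disagree."""
    return sum(m != p for mrow, prow in zip(mask, prior_mask) for m, p in zip(mrow, prow))


def total_energy(image, mask, prior_mask, weight=0.5):
    return fitting_energy(image, mask) + weight * shape_term(mask, prior_mask)


def square_mask(size, top, left, side):
    """Boolean size x size mask containing one axis-aligned square."""
    return [[top <= r < top + side and left <= c < left + side for c in range(size)]
            for r in range(size)]
```

A region that matches both the bright object and the prior has zero energy, while a shifted region pays in both terms.
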
Learning and Bayesian Shape Extraction for Object Recognition

We present a novel algorithm for extracting shapes of contours of (possibly partially occluded) objects from noisy or low-contrast images. The approach taken is Bayesian: we adopt a region-based model that incorporates prior knowledge of specific shapes of interest. To quantify this prior knowledge, we address the problem of learning probability models for collections of observed shapes. Our method is based on the geometric representation and algorithmic analysis of planar shapes introduced and developed in [15]. In contrast with the commonly used approach to active contours using partial differential equation methods [12,20,1], we model the dynamics of contours on vector fields on shape manifolds.

Washington Mio, Anuj Srivastava, Xiuwen Liu
Multiphase Dynamic Labeling for Variational Recognition-Driven Image Segmentation

We propose a variational framework for the integration of multiple competing shape priors into level set based segmentation schemes. By optimizing an appropriate cost functional with respect to both a level set function and a (vector-valued) labeling function, we jointly generate a segmentation (by the level set function) and a recognition-driven partition of the image domain (by the labeling function) which indicates where to enforce certain shape priors. Our framework fundamentally extends previous work on shape priors in level set segmentation by directly addressing the central question of where to apply which prior. It allows for the seamless integration of numerous shape priors such that, while segmenting both multiple known and unknown objects, the level set process may selectively use specific shape knowledge for simultaneously enhancing segmentation and recognizing shape.

Daniel Cremers, Nir Sochen, Christoph Schnörr

Posters IV

Integral Invariant Signatures

For shapes represented as closed planar contours, we introduce a class of functionals that are invariant with respect to the Euclidean and similarity group, obtained by performing integral operations. While such integral invariants enjoy some of the desirable properties of their differential cousins, such as locality of computation (which allows matching under occlusions) and uniqueness of representation (in the limit), they are not as sensitive to noise in the data. We exploit the integral invariants to define a unique signature, from which the original shape can be reconstructed uniquely up to the symmetry group, and a notion of scale-space that allows analysis at multiple levels of resolution. The invariant signature can be used as a basis to define various notions of distance between shapes, and we illustrate the potential of the integral invariant representation for shape matching on real and synthetic data.

Siddharth Manay, Byung-Woo Hong, Anthony J. Yezzi, Stefano Soatto
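
One simple integral invariant for a closed contour is a local area invariant: for a contour point p, the fraction of a disk of radius r centered at p that lies inside the shape. Because it integrates over a region instead of differentiating the boundary, it is invariant to rotation and translation of the shape. The sketch estimates it by grid sampling, assuming a polygonal contour, an even-odd ray-casting inside test and a hand-picked sampling step; it illustrates the idea and is not necessarily the exact invariant or discretization used in the paper.

```python
def inside_polygon(px, py, poly):
    """Even-odd ray casting test for a point against a closed polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):                     # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def area_invariant(point, poly, radius, step=0.01):
    """Fraction of the disk of given radius at point that falls inside poly."""
    cx, cy = point
    n = int(round(radius / step))
    count = total = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            if dx * dx + dy * dy <= radius * radius:   # grid sample lies in the disk
                total += 1
                if inside_polygon(cx + dx, cy + dy, poly):
                    count += 1
    return count / total
```

For a square, the value is about 1/4 at a corner and about 1/2 in the middle of a side, the integral counterpart of the curvature difference between those points.
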
Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes

Local feature approaches to vision geometry and object recognition are based on selecting and matching sparse sets of visually salient image points, known as ‘keypoints’ or ‘points of interest’. Their performance depends critically on the accuracy and reliability with which corresponding keypoints can be found in subsequent images. Among the many existing keypoint selection criteria, the popular Förstner-Harris approach explicitly targets geometric stability, defining keypoints to be points that have locally maximal self-matching precision under translational least squares template matching. However, many applications require stability in orientation and scale as well as in position. Detecting translational keypoints and verifying orientation/scale behaviour post hoc is suboptimal, and can be misleading when different motion variables interact. We give a more principled formulation, based on extending the Förstner-Harris approach to general motion models and robust template matching. We also incorporate a simple local appearance model to ensure good resistance to the most common illumination variations. We illustrate the resulting methods and quantify their performance on test images.

Bill Triggs
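
For the translational case the abstract starts from, the local least-squares matching precision is governed by the structure tensor M, the sum of gradient outer products over a window. The sketch computes M and two classical corner measures derived from it: Förstner's det(M)/trace(M) and Harris's det(M) − k·trace(M)². The 3×3 unweighted window, central-difference gradients and k = 0.04 are assumptions, and the paper's extension to orientation/scale motion models and its illumination model are not included.

```python
def structure_tensor(img, r, c, half=1):
    """Sum of gradient outer products over a (2*half+1)^2 window at (r, c).
    (r, c) must lie at least half + 1 pixels away from the image border."""
    sxx = sxy = syy = 0.0
    for y in range(r - half, r + half + 1):
        for x in range(c - half, c + half + 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0   # central differences
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    return sxx, sxy, syy


def harris_response(img, r, c, k=0.04):
    sxx, sxy, syy = structure_tensor(img, r, c)
    return (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2


def forstner_precision(img, r, c):
    """det(M) / trace(M): large only when the gradient varies in two directions."""
    sxx, sxy, syy = structure_tensor(img, r, c)
    trace = sxx + syy
    return 0.0 if trace == 0.0 else (sxx * syy - sxy * sxy) / trace


# 10x10 test image with a bright quadrant whose corner is at (row 5, column 5).
image = [[1.0 if (r >= 5 and c >= 5) else 0.0 for c in range(10)] for r in range(10)]
```

On this image the Harris response is positive at the corner (5, 5), zero in the flat region and negative on the straight edge at (7, 5), where only one gradient direction is present.
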
Spectral Simplification of Graphs

Although inexact graph-matching is a problem of potentially exponential complexity, the problem may be simplified by decomposing the graphs to be matched into smaller subgraphs. If this is done, then the process may be cast into a hierarchical framework and hence rendered suitable for parallel computation. In this paper we describe a spectral method which can be used to partition graphs into non-overlapping subgraphs. In particular, we demonstrate how the Fiedler vector of the Laplacian matrix can be used to decompose graphs into non-overlapping neighbourhoods that can be used for the purposes of both matching and clustering.

Huaijun Qiu, Edwin R. Hancock
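
A single bipartition step by the sign of the Fiedler vector (the eigenvector of the graph Laplacian L = D − A for its second-smallest eigenvalue) can be sketched in plain Python. To avoid a linear algebra dependency, the sketch uses power iteration on cI − L with c = 2·(maximum degree), an upper bound on the largest Laplacian eigenvalue, and removes the constant eigenvector at every step. The iteration count and the test graph are arbitrary choices, and the paper's construction of neighbourhoods for matching and clustering is not reproduced.

```python
import math


def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted edge list."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def fiedler_vector(n, edges, iterations=500):
    lap = laplacian(n, edges)
    c = 2.0 * max(lap[i][i] for i in range(n))   # >= largest eigenvalue of L
    v = [i - (n - 1) / 2.0 for i in range(n)]    # zero-mean start vector
    for _ in range(iterations):
        w = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]                # project out the constant eigenvector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(n, edges):
    """Split the vertices by the sign of the Fiedler vector."""
    v = fiedler_vector(n, edges)
    return [i for i in range(n) if v[i] < 0], [i for i in range(n) if v[i] >= 0]


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = spectral_bipartition(6, edges)
```

The split separates the two triangles at the bridge edge; applying the same step recursively to each part yields a hierarchy of smaller subgraphs.
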
Inferring White Matter Geometry from Diffusion Tensor MRI: Application to Connectivity Mapping

We introduce a novel approach to cerebral white matter connectivity mapping from diffusion tensor MRI. DT-MRI is the only non-invasive technique capable of probing and quantifying the anisotropic diffusion of water molecules in biological tissues. We address the problem of consistently reconstructing neural fibers in areas of complex diffusion profiles with potentially multiple fiber orientations. Our method relies on a global modeling of the acquired MRI volume as a Riemannian manifold M and proceeds in four major steps. First, we establish the link between Brownian motion and diffusion MRI by using the Laplace-Beltrami operator on M. We then show how the sole knowledge of the diffusion properties of water molecules on M is sufficient to infer its geometry: there exists a direct mapping between the diffusion tensor and the metric of M. Next, having access to that metric, we propose a novel level set formulation to approximate the distance function related to a radial Brownian motion on M. Finally, a rigorous numerical scheme using the exponential map is derived to estimate the geodesics of M, seen as the diffusion paths of water molecules. Numerical experiments conducted on synthetic and real diffusion MRI datasets illustrate the potential of this global approach.

Christophe Lenglet, Rachid Deriche, Olivier Faugeras
Unifying Approaches and Removing Unrealistic Assumptions in Shape from Shading: Mathematics Can Help

This article proposes a solution of the Lambertian Shape From Shading (SFS) problem by designing a new mathematical framework based on the notion of viscosity solutions. The power of our approach is twofold. 1) It defines a notion of weak solutions (in the viscosity sense) which does not necessarily require boundary data. Note that, in the previous SFS work of Rouy et al. [23,15], Falcone et al. [8] and Prados et al. [22,20], the characterization of a viscosity solution and its computation require the knowledge of its values on the boundary of the image. This was quite unrealistic because in practice such values are not known. 2) It unifies the work of Rouy et al. [23,15], Falcone et al. [8] and Prados et al. [22,20], based on the notion of viscosity solutions, and the work of Dupuis and Oliensis [6], dealing with classical (C^1) solutions. We also generalize their work to the “perspective SFS” problem recently introduced by Prados and Faugeras [20]. Moreover, this article introduces a “generic” formulation of the SFS problem. This “generic” formulation summarizes various (classical) formulations of the Lambertian SFS problem; in particular, it unifies the orthographic and the perspective SFS problems. It significantly simplifies the formalism of the problem, and thanks to it, a single algorithm can be used to compute numerical solutions of all these previous SFS formulations. Finally, we propose two algorithms which provide numerical approximations of the new weak solutions of the “generic SFS” problem. These provably convergent algorithms are quite robust and do not necessarily require boundary data.

Emmanuel Prados, Olivier Faugeras
Morphological Operations on Matrix-Valued Images

The output of modern imaging techniques such as diffusion tensor MRI or the physical measurement of anisotropic behaviour in materials such as the stress tensor consists of tensor-valued data. Hence adequate image processing methods for shape analysis, skeletonisation, denoising and segmentation are in demand. The goal of this paper is to extend the morphological operations of dilation, erosion, opening and closing to the matrix-valued setting. We show that naive approaches such as componentwise application of scalar morphological operations are unsatisfactory, since they violate elementary requirements such as invariance under rotation. This led us to study an analytic and a geometric alternative which are rotation invariant. Both methods introduce novel non-component-wise definitions of a supremum and an infimum of a finite set of matrices. The resulting morphological operations incorporate information from all matrix channels simultaneously and preserve positive definiteness of the matrix field. Their properties and their performance are illustrated by experiments on diffusion tensor MRI data.

Bernhard Burgeth, Martin Welk, Christian Feddern, Joachim Weickert
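
The claim that componentwise morphology violates rotation invariance can be checked on a two-matrix example. The sketch takes the componentwise maximum (the supremum behind a naive dilation) of two symmetric 2×2 matrices and compares "rotate, then take the maximum" with "take the maximum, then rotate", using a rotation of π/4. The matrices and angle are arbitrary choices; the authors' analytic and geometric supremum definitions are not implemented here.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def rotate(m, theta):
    """Return R m R^T for the 2x2 rotation R by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    r = [[c, -s], [s, c]]
    return matmul(matmul(r, m), transpose(r))


def componentwise_max(a, b):
    return [[max(a[i][j], b[i][j]) for j in range(2)] for i in range(2)]


A = [[1.0, 0.0], [0.0, 0.0]]
B = [[0.0, 0.0], [0.0, 1.0]]
theta = math.pi / 4
rotate_then_max = componentwise_max(rotate(A, theta), rotate(B, theta))
max_then_rotate = rotate(componentwise_max(A, B), theta)
```

Rotating first gives approximately [[0.5, 0.5], [0.5, 0.5]], whereas the rotated componentwise maximum is the identity matrix, so the componentwise operation does not commute with rotation.
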
Constraints on Coplanar Moving Points

Configurations of dynamic points viewed by one or more cameras have not been studied much. In this paper, we present several view- and time-independent constraints on different configurations of points moving on a plane. We show that 4 points with constant independent velocities or accelerations under affine projection can be characterized in a view-independent manner using 2 views. Under perspective projection, 5 coplanar points under uniform linear velocity observed for 3 time instants in a single view have a view-independent characterization. The best known constraint for this case involves 6 points observed for 35 frames. Under uniform acceleration, 5 points in 5 time instants have a view-independent characterization. We also present constraints on a point undergoing arbitrary planar motion under affine projection in the Fourier domain. The constraints introduced in this paper involve fewer points or views than similar results reported in the literature and are simpler to compute in most cases. The constraints developed can be applied to many aspects of computer vision. Recognition constraints for several planar point configurations of moving points can result from them. We also show how time-alignment of views captured independently can follow from the constraints on moving point configurations.

Sujit Kuthirummal, C. V. Jawahar, P. J. Narayanan
A PDE Solution of Brownian Warping

A Brownian motion model in the group of diffeomorphisms has been introduced as creating a least committed prior on warps. This prior is source-destination symmetric, fulfills a natural semi-group property for warps, and with probability 1 creates invertible warps. In this paper, we formulate a Partial Differential Equation for obtaining the maximum likelihood warp given matching constraints derived from the images. We solve for the free boundary conditions and the bias toward smaller areas in the finite domain setting. Furthermore, we demonstrate the technique on 2D images and show that the obtained warps are also source-destination symmetric in practice.

Mads Nielsen, P. Johansen
Stereovision-Based Head Tracking Using Color and Ellipse Fitting in a Particle Filter

This paper proposes the use of a particle filter combined with color, depth information, gradient and shape features as an efficient and effective way of tracking a head in an image stream coming from a mobile stereovision camera. The head is modeled in the 2D image domain by an ellipse. A weighting function is used to include spatial information in the color histogram representing the interior of the ellipse. The length of the ellipse’s minor axis is determined on the basis of depth information. The dissimilarity between the current model of the tracked object and target candidates is indicated by a metric based on the Bhattacharyya coefficient. Variations of the color representation as a consequence of changes in the ellipse’s size are handled by taking advantage of the scale invariance of the similarity measure. The color histogram and the parameters of the ellipse are dynamically updated over time to discriminate in the next iteration between the candidate and actual head representation. This makes it possible to track not only the face profile captured during initialization of the tracker but also other profiles of the face, as well as the head. Experimental results obtained on long image sequences in a typical office environment show the feasibility of our approach for tracking a head undergoing complex changes of shape and appearance against a varying background. The resulting system runs in real time on a standard laptop computer installed on a real mobile agent.

Bogdan Kwolek
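
The Bhattacharyya coefficient used to compare color histograms can be sketched together with a spatially weighted histogram of the kind described in the abstract. The weighting profile 1 − r² (with r the pixel's normalized distance from the ellipse center), the reduction of color to a single scalar channel in [0, 1), and the mapping from coefficient to particle weight with width σ are all assumptions for illustration, not details taken from the paper.

```python
import math


def weighted_histogram(pixels, bins):
    """Histogram of (value, r) pairs; pixels nearer the ellipse border
    (r close to 1) get less weight through the assumed profile 1 - r^2."""
    hist = [0.0] * bins
    for value, r in pixels:
        index = min(int(value * bins), bins - 1)
        hist[index] += max(0.0, 1.0 - r * r)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms: 1 if identical, 0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def particle_weight(p, q, sigma=0.2):
    """Hypothetical observation likelihood for one particle."""
    d2 = max(0.0, 1.0 - bhattacharyya_coefficient(p, q))  # squared Bhattacharyya distance
    return math.exp(-d2 / (2.0 * sigma * sigma))
```

Because both histograms are normalized, the coefficient does not depend on how many pixels the ellipse covers, which is the scale invariance the abstract relies on when the ellipse changes size.
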
Parallel Variational Motion Estimation by Domain Decomposition and Cluster Computing

We present an approach to parallel variational optical flow computation on standard hardware by domain decomposition. Using an arbitrary partition of the image plane into rectangular subdomains, the global solution to the variational approach is obtained by iteratively combining local solutions which can be efficiently computed in parallel by separate multi-grid iterations for each subdomain. The approach is particularly suited for implementations on PC-clusters because inter-process communication between subdomains (i.e. processors) is minimized by restricting the exchange of data to a lower-dimensional interface. By applying a dedicated interface preconditioner, the necessary number of iterations between subdomains to achieve a fixed error is bounded independently of the number of subdomains. Our approach provides a major step towards real-time 2D image processing using off-the-shelf PC-hardware and facilitates the efficient application of variational approaches to large-scale image processing problems.

Timo Kohlberger, Christoph Schnörr, Andrés Bruhn, Joachim Weickert
Whitening for Photometric Comparison of Smooth Surfaces under Varying Illumination

We consider the problem of image comparison in order to match smooth surfaces under varying illumination. In a smooth surface, nearby surface normals are highly correlated. We model such surfaces as Gaussian processes and derive the resulting statistical characterization of the corresponding images. Supported by this model, we treat the difference between two images, associated with the same surface and different lighting, as colored Gaussian noise, and use the whitening tool from signal detection theory to construct a measure of difference between such images. This also improves comparisons by accentuating the differences between images of different surfaces. At the same time, we prove that no linear filter, including ours, can produce lighting-insensitive image comparisons. While our Gaussian assumption is a simplification, the resulting measure functions well for both synthetic and real smooth objects. Thus we improve upon methods for matching images of smooth objects, while providing insight into the performance of such methods. Much prior work has focused on image comparison methods appropriate for highly curved surfaces. We combine our method with one of these, and demonstrate high performance on rough and smooth objects.

Margarita Osadchy, Michael Lindenbaum, David Jacobs
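
The whitening step can be illustrated for a small, known covariance Σ of the image difference: factor Σ = L·Lᵀ by Cholesky decomposition, solve L·w = d by forward substitution, and use ‖w‖² = dᵀΣ⁻¹d as the comparison measure. In the paper, Σ follows from the Gaussian-process surface model; here it is simply given by hand (an assumption), and images are flattened to short vectors.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whiten(d, cov):
    """Solve L w = d by forward substitution, so w has identity covariance."""
    low = cholesky(cov)
    w = []
    for i in range(len(d)):
        w.append((d[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    return w


def whitened_distance(img1, img2, cov):
    """d^T cov^{-1} d for the difference d of two flattened images."""
    d = [a - b for a, b in zip(img1, img2)]
    return sum(x * x for x in whiten(d, cov))


cov = [[1.0, 0.9], [0.9, 1.0]]  # hand-picked covariance of two neighbouring pixels
```

With strongly correlated pixels, a difference that varies smoothly, d = (1, 1), scores about 1.05, while a difference of the same magnitude that breaks the correlation, d = (1, −1), scores 20: the measure discounts differences consistent with a lighting change on a smooth surface and accentuates the rest.
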
Structure from Motion of Parallel Lines

We investigate the camera geometry of lines parallel in the world. In particular, we formalize the known rotational constraints and add new linear constraints on camera position. The constraints on camera position do not require the cameras to be viewing the same lines, thus providing applications for occluded scenes and calibration of cameras for which fields of view do not intersect. The constraints can also be viewed as constraints of camera geometry with planar patch coordinate systems, and provide a way to investigate texture in a deeper way than has been done to date.

Patrick Baker, Yiannis Aloimonos
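
The known rotational constraints from parallel lines rest on vanishing points: images of lines that are parallel in the world meet at a vanishing point v, and for a calibrated camera K the direction d of those lines in the camera frame satisfies d ∝ K⁻¹v, so directions seen from two cameras constrain their relative rotation. The sketch shows this standard projective geometry, not the paper's new constraints on camera position. It assumes homogeneous coordinates, a pinhole model with square pixels and zero skew, and hypothetical helper names.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous image line through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg1, seg2, tol=1e-12):
    """Intersection of two image lines; None if they are parallel in the image."""
    v = cross(line_through(*seg1), line_through(*seg2))
    if abs(v[2]) < tol:
        return None  # vanishing point at infinity
    return (v[0] / v[2], v[1] / v[2])


def direction_from_vp(v, focal, principal):
    """Unit 3-D direction (camera frame) of the parallel lines, d ~ K^-1 v."""
    x = (v[0] - principal[0]) / focal
    y = (v[1] - principal[1]) / focal
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)
```

For example, the image segments from (0, 0) to (2, 2) and from (0, 2) to (2, 3) meet at the vanishing point (4, 4).
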
A Bayesian Framework for Multi-cue 3D Object Tracking

This paper presents a Bayesian framework for multi-cue 3D object tracking of deformable objects. The proposed spatio-temporal object representation involves a set of distinct linear subspace models or Dynamic Point Distribution Models (DPDMs), which can deal with both continuous and discontinuous appearance changes; the representation is learned fully automatically from training data. The representation is enriched with texture information by means of intensity histograms, which are compared using the Bhattacharyya coefficient. Direct 3D measurement is furthermore provided by a stereo system. State propagation is achieved by a particle filter which combines the three cues (shape, texture and depth) in its observation density function. The tracking framework integrates an independently operating object detection system by means of importance sampling. We illustrate the benefit of our integrated multi-cue tracking approach on pedestrian tracking from a moving vehicle.

J. Giebel, D. M. Gavrila, C. Schnörr
On the Significance of Real-World Conditions for Material Classification

Classifying materials from their appearance is a challenging problem, especially if illumination and pose conditions are permitted to change: highlights and shadows caused by 3D structure can radically alter a sample’s visual texture. Despite these difficulties, researchers have demonstrated impressive results on the CUReT database, which contains many images of 61 materials under different conditions. A first contribution of this paper is to further advance the state of the art by applying Support Vector Machines to this problem. To our knowledge, we record the best results to date on the CUReT database. In our work we additionally investigate the effect of scale, since robustness to viewing distance and zoom settings is crucial in many real-world situations. Indeed, a material’s appearance can vary considerably as fine-level detail becomes visible or disappears as the camera moves towards or away from the subject. We handle scale variations using a pure-learning approach, incorporating samples imaged at different distances into the training set. An empirical investigation is conducted to show how the classification accuracy decreases as less scale information is made available during training. Since the CUReT database contains little scale variation, we introduce a new database which images ten CUReT materials at different distances, while also maintaining some change in pose and illumination. The first aim of the database is thus to provide scale variations, but a second and equally important objective is to attempt to recognise different samples of the CUReT materials. For instance, does training on the CUReT database enable recognition of another piece of sandpaper? The results clearly demonstrate that it is not possible to do so with any acceptable degree of accuracy. Thus we conclude that impressive results, even on a well-designed database such as CUReT, do not imply that material classification is close to being a solved problem under real-world conditions.

Eric Hayman, Barbara Caputo, Mario Fritz, Jan-Olof Eklundh
Toward Accurate Segmentation of the LV Myocardium and Chamber for Volumes Estimation in Gated SPECT Sequences

Segmentation of the left ventricle myocardium and chamber in gated SPECT images is a challenging problem. Segmentation is, however, the first step toward geometry reconstruction and the quantitative measurements needed to extract clinical parameters from the images. New algorithms for segmenting the myocardium and chamber of the heart’s left ventricle are proposed. The accuracy of the volumes measured from the geometrical models used for segmentation is evaluated using simulated images. The error on the computed ejection fraction is low enough for diagnosis assistance. Experiments on real images are shown.

Diane Lingrand, Arnaud Charnoz, Pierre Malick Koulibaly, Jacques Darcourt, Johan Montagnat
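
The ejection fraction evaluated in the abstract is conventionally defined as (EDV − ESV)/EDV, where EDV and ESV are the end-diastolic and end-systolic chamber volumes. Given segmented chamber regions at the two cardiac phases, the volumes can be approximated by counting voxels, as sketched below under the assumption of a binary voxel mask with known voxel spacing in millimetres. The paper measures volumes from the geometrical models used for segmentation, so voxel counting here is a simplified stand-in, and the function names are hypothetical.

```python
def volume_ml(voxel_count: int, voxel_size_mm=(1.0, 1.0, 1.0)) -> float:
    """Volume of a segmented region from its voxel count (1 ml = 1000 mm^3)."""
    dx, dy, dz = voxel_size_mm
    return voxel_count * dx * dy * dz / 1000.0


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0.0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0
```

For example, an end-diastolic volume of 120 ml and an end-systolic volume of 50 ml give an ejection fraction of about 58.3 %. Since the EF is a ratio of two volume estimates, errors in the segmentation propagate into it directly, which is why volume accuracy is evaluated first.
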
An MCMC-Based Particle Filter for Tracking Multiple Interacting Targets

We describe a Markov chain Monte Carlo based particle filter that effectively deals with interacting targets, i.e., targets that are influenced by the proximity and/or behavior of other targets. Such interactions cause problems for traditional approaches to the data association problem. In response, we developed a joint tracker that includes a more sophisticated motion model to maintain the identity of targets throughout an interaction, drastically reducing tracker failures. The paper presents two main contributions: (1) we show how a Markov random field (MRF) motion prior, built on the fly at each time step, can substantially improve tracking when targets interact, and (2) we show how this can be done efficiently using Markov chain Monte Carlo (MCMC) sampling. We prove that incorporating an MRF to model interactions is equivalent to adding an additional interaction factor to the importance weights in a joint particle filter. Since a joint particle filter suffers from exponential complexity in the number of tracked targets, we replace the traditional importance sampling step in the particle filter with an MCMC sampling step. The resulting filter deals efficiently and effectively with complicated interactions when targets approach each other. We present both qualitative and quantitative results to substantiate the claims made in the paper, including a large scale experiment on a video-sequence of over 10,000 frames in length.

Zia Khan, Tucker Balch, Frank Dellaert
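
The core idea, a joint posterior with a pairwise MRF interaction factor that is sampled by Markov chain Monte Carlo instead of joint importance sampling, can be sketched for 2-D point targets. Everything concrete here is an assumption for illustration: Gaussian observation likelihoods, a hard-threshold pairwise penalty exp(−penalty) for targets closer than `radius`, and a symmetric random-walk Metropolis-Hastings proposal that moves one randomly chosen target per step (the paper's proposal and motion model differ). With a symmetric proposal, the acceptance ratio reduces to the ratio of joint densities.

```python
import math
import random


def interaction_factor(positions, penalty=5.0, radius=1.0):
    """Product of pairwise MRF potentials: each pair closer than radius costs exp(-penalty)."""
    factor = 1.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < radius:
                factor *= math.exp(-penalty)
    return factor


def likelihood(position, observation, sigma=0.5):
    """Assumed Gaussian observation model for one target."""
    return math.exp(-math.dist(position, observation) ** 2 / (2.0 * sigma * sigma))


def joint_density(positions, observations):
    density = interaction_factor(positions)
    for pos, obs in zip(positions, observations):
        density *= likelihood(pos, obs)
    return density


def mh_samples(init, observations, n_samples=2000, step=0.3, seed=0):
    """Metropolis-Hastings over the joint state, moving one target per proposal."""
    rng = random.Random(seed)
    current = [tuple(p) for p in init]
    current_density = joint_density(current, observations)
    samples = []
    for _ in range(n_samples):
        k = rng.randrange(len(current))
        proposal = list(current)
        x, y = current[k]
        proposal[k] = (x + rng.gauss(0.0, step), y + rng.gauss(0.0, step))
        proposal_density = joint_density(proposal, observations)
        # Symmetric proposal: accept with probability min(1, density ratio).
        if current_density == 0.0 or rng.random() < proposal_density / current_density:
            current, current_density = proposal, proposal_density
        samples.append(list(current))
    return samples
```

Because each proposal changes only one target, the cost per step stays linear in the number of targets rather than growing exponentially as in a joint importance sampler; averaging the stored samples after a burn-in period gives a point estimate per target.
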
Human Pose Estimation Using Learnt Probabilistic Region Similarities and Partial Configurations

A model of human appearance is presented for efficient pose estimation from real-world images. In common with related approaches, a high-level model defines a space of configurations which can be associated with image measurements and thus scored. A search is performed to identify good configuration(s). Such an approach is challenging because the configuration space is high dimensional, the search is global, and the appearance of humans in images is complex due to background clutter, shape uncertainty and texture. The system presented here is novel in several respects. The formulation allows differing numbers of parts to be parameterised and allows poses of differing dimensionality to be compared in a principled manner based upon learnt likelihood ratios. In contrast with current approaches, this allows a part-based search in the presence of self-occlusion. Furthermore, it provides a principled automatic approach to occlusion by other objects. View-based probabilistic models of body part shapes are learnt that represent intra- and inter-person variability (in contrast to rigid geometric primitives). The probabilistic region for each part is transformed into the image using the configuration hypothesis and used to collect two appearance distributions for the part’s foreground and adjacent background. Likelihood ratios for single parts are learnt from the dissimilarity of the foreground and adjacent background appearance distributions. It is important to note the distinction between this technique and restrictive foreground/background specific modelling. It is demonstrated that this likelihood allows better discrimination of body parts in real-world images than contour-to-edge matching techniques. Furthermore, the likelihood is less sparse and noisy, making coarse sampling and local search more effective. A likelihood ratio for body part pairs with similar appearances is also learnt. Together with a model of inter-part distances, this better describes correct higher dimensional configurations. Results from applying an optimization scheme to the likelihood model for challenging real-world images are presented.

Timothy J. Roberts, Stephen J. McKenna, Ian W. Ricketts
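The single-part likelihood is driven by how different the foreground and adjacent-background appearance distributions are. The sketch below is a minimal illustration under stated assumptions, not the authors' learnt likelihood. It assumes grey-level histograms weighted by soft region membership and uses the Bhattacharyya coefficient as the dissimilarity measure. `fg_prob` and `bg_prob` are hypothetical per-pixel membership maps.

```python
import numpy as np

def weighted_histogram(values, weights, bins=16):
    # values: intensities in [0, 1); weights: soft region membership in [0, 1]
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0), weights=weights)
    return hist / hist.sum()

def bhattacharyya(p, q):
    # 1.0 for identical distributions, 0.0 for non-overlapping ones
    return float(np.sum(np.sqrt(p * q)))

def part_dissimilarity(image, fg_prob, bg_prob, bins=16):
    # fg_prob / bg_prob (hypothetical): probability that each pixel lies in the
    # part's foreground region or in its adjacent background band
    p = weighted_histogram(image.ravel(), fg_prob.ravel(), bins)
    q = weighted_histogram(image.ravel(), bg_prob.ravel(), bins)
    return 1.0 - bhattacharyya(p, q)


# A part hypothesis that separates a dark limb from a bright background
img = np.full((20, 20), 0.8)
img[:, :10] = 0.2
fg = np.zeros((20, 20))
fg[:, :10] = 1.0
bg = 1.0 - fg
good_fit = part_dissimilarity(img, fg, bg)                     # distinct appearance
bad_fit = part_dissimilarity(np.full((20, 20), 0.5), fg, bg)   # no contrast at all
```

A high dissimilarity means the hypothesised region separates the part from its surroundings. In the paper a likelihood ratio is then learnt from such scores. That learning step is not shown here.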
Tensor Field Segmentation Using Region Based Active Contour Model

Tensor fields (matrix-valued data sets) have recently attracted increased attention in the fields of image processing, computer vision, visualization and medical imaging. Tensor field segmentation is an important problem in tensor field analysis and has not been addressed adequately in the past. In this paper, we present an effective region-based active contour model for tensor field segmentation and show its application to diffusion tensor magnetic resonance images (MRI) as well as to the texture segmentation problem in computer vision. Specifically, we present a variational principle for an active contour using the Euclidean difference of tensors as a discriminant. The variational formulation is valid for piecewise smooth regions; however, for simplicity of exposition, we present the piecewise constant region model in detail. This variational principle is a generalization of the region-based active contour to matrix-valued functions. It naturally leads to a curve evolution equation for tensor field segmentation, which is subsequently expressed in a level set framework and solved numerically. Synthetic and real data experiments involving the segmentation of diffusion tensor MRI as well as structure tensors obtained from real texture data demonstrate the performance of the proposed model.

Zhizhou Wang, Baba C. Vemuri
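The piecewise-constant region model can be sketched through its data-fitting term. The sketch makes these assumptions:

- The "Euclidean difference of tensors" is taken to be the Frobenius norm.
- The curve-length term is omitted.
- The level-set evolution is omitted.

Under this norm, the best constant tensor for a region is the arithmetic mean of its tensors. The code only scores a given partition; it does not evolve a contour.

```python
import numpy as np

def region_means(T, mask):
    # T: (H, W, 3, 3) tensor field; mask: boolean (H, W), True inside the contour.
    # Under the Frobenius distance the optimal constant tensor is the region mean.
    return T[mask].mean(axis=0), T[~mask].mean(axis=0)

def fitting_energy(T, mask):
    # Sum of squared Frobenius distances to the inside/outside mean tensors
    c_in, c_out = region_means(T, mask)
    e_in = ((T[mask] - c_in) ** 2).sum()
    e_out = ((T[~mask] - c_out) ** 2).sum()
    return float(e_in + e_out)


rng = np.random.default_rng(1)
H = W = 20
T = np.tile(np.eye(3), (H, W, 1, 1))            # isotropic background tensors
true_mask = np.zeros((H, W), dtype=bool)
true_mask[5:15, 5:15] = True
T[true_mask] = np.diag([3.0, 1.0, 1.0])         # anisotropic square region
noise = 0.05 * rng.standard_normal((H, W, 3, 3))
T = T + (noise + noise.transpose(0, 1, 3, 2)) / 2   # keep tensors symmetric
shifted_mask = np.roll(true_mask, 3, axis=1)
```

A region-based active contour would move the curve so as to decrease this energy, plus a smoothness term. The correct partition therefore scores lower than a shifted one.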
Groupwise Diffeomorphic Non-rigid Registration for Automatic Model Building

We describe a framework for registering a group of images together using a set of non-linear diffeomorphic warps. The result of the groupwise registration is an implicit definition of dense correspondences between all of the images in a set, which can be used to construct statistical models of shape change across the set, avoiding the need for manual annotation of training images. We give examples on two datasets (brains and faces) and show the resulting models of shape and appearance variation. We show results of experiments demonstrating that the groupwise approach gives a more reliable correspondence than pairwise matching alone.

T. F. Cootes, S. Marsland, C. J. Twining, K. Smith, C. J. Taylor
Separating Transparent Layers through Layer Information Exchange

In this paper we present an approach for separating two transparent layers in images and video sequences. Given two initial unknown physical mixtures, I1 and I2, of real scene layers, L1 and L2, we seek a layer separation which minimizes the structural correlations across the two layers, at every image point. Such a separation is achieved by transferring local grayscale structure from one image to the other wherever it is highly correlated with the underlying local grayscale structure in the other image, and vice versa. This bi-directional transfer operation, which we call the “layer information exchange”, is performed on diminishing window sizes, from global image windows (i.e., the entire image) down to local image windows, thus detecting similar grayscale structures at varying scales across pixels. We show the applicability of this approach to various real-world scenarios, including image and video transparency separation. In particular, we show that this approach can be used for separating transparent layers in images obtained under different polarizations, as well as for separating complex non-rigid transparent motions in video sequences. This can be done without prior knowledge of the layer mixing model (simple additive, alpha-matted composition with an unknown alpha map, or other), and under unknown complex temporal changes (e.g., unknown varying lighting conditions).

Bernard Sarel, Michal Irani
Multiple Classifier System Approach to Model Pruning in Object Recognition

We propose a multiple classifier system approach to object recognition in computer vision. The aim of the approach is to use multiple experts successively to prune the list of candidate hypotheses that have to be considered for object interpretation. The experts are organised in a serial architecture, with the later stages of the system dealing with a monotonically decreasing number of models. We develop a theoretical model which underpins this approach to object recognition and show how it relates to various heuristic design strategies advocated in the literature. The merits of the advocated approach are then demonstrated experimentally using the SOIL database. We show that the overall performance of a two-stage object recognition system, designed using the proposed methodology, improves. The improvement is achieved in spite of using a weak recogniser for the first (pruning) stage. The effects of different pruning strategies are demonstrated.

Josef Kittler, Ali R. Ahmadyfard
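The serial pruning architecture can be illustrated as a two-stage cascade. This is a minimal sketch, not the authors' system. `weak_score` and `strong_score` are hypothetical stand-ins for the first-stage (pruning) expert and the second-stage recogniser. The fixed shortlist size `keep` is an assumed pruning strategy.

```python
def prune_then_recognise(candidates, weak_score, strong_score, keep):
    """Two-stage serial classifier: prune with a cheap expert, decide with a strong one."""
    # Stage 1: the weak expert ranks every model hypothesis and keeps the top `keep`
    shortlist = sorted(candidates, key=weak_score, reverse=True)[:keep]
    # Stage 2: the strong expert only has to consider the surviving hypotheses
    best = max(shortlist, key=strong_score)
    return best, shortlist


# Hypothetical scores for five object models; "c" is the correct interpretation
weak = {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.2, "e": 0.1}
strong = {"a": 0.3, "b": 0.4, "c": 0.95, "d": 0.5, "e": 0.6}

best, shortlist = prune_then_recognise(list(weak), weak.get, strong.get, keep=3)
```

The weak expert ranks "c" only third. Keeping three hypotheses is still enough for the strong expert to recover it, while the second stage scores three models instead of five. With `keep=2`, the correct model is pruned, which shows why the choice of pruning strategy matters.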
Coaxial Omnidirectional Stereopsis

Catadioptric omnidirectional sensors, consisting of a camera and a mirror, can track objects even when their bearings change suddenly, usually due to the observer making a significant turn. There has been much debate concerning the relative merits of several possible mirror shapes for such sensors. This paper suggests that the conical mirror has some advantages over other mirror shapes. In particular, the projection beam from the central region of the image is reflected and distributed towards the horizon rather than back at the camera. Therefore, a significant portion of the image resolution is not wasted. A perspective projection unwarping of the conical mirror images is developed and demonstrated. This has hitherto been considered possible only with mirrors that possess single viewpoint geometry. The cone is viewed by a camera placed some distance away from the tip. Such an arrangement does not have single viewpoint geometry. However, its multiple viewpoints are shown to be dimensionally separable. Once stereopsis has been solved, it is possible to project the points of interest to a new image through a (virtual) single viewpoint. Successful reconstruction of a single viewpoint image from a pair of images obtained via multiple viewpoints appears to validate the use of multiple viewpoint projections. The omnidirectional stereo uses two catadioptric sensors. Each sensor consists of one conical mirror and one perspective camera. The sensors are in a coaxial arrangement along the vertical axis, facing up or down. This stereoscopic arrangement leads to very simple matching, since the epipolar lines are the radial lines of identical orientations in both omnidirectional images. The stereopsis results on artificially generated scenes with known ground truth show that the error in computed distance is proportional to the distance of the object (as usual) plus the distance of the camera from the mirror. The error is also inversely proportional to the image radius coordinate, i.e., the results are more accurate for points imaged nearer the rim of the circular mirror.

Libor Spacek
Classifying Materials from Their Reflectance Properties

We explore the possibility of recognizing the surface material from a single image with unknown illumination, given the shape of the surface. Model-based PCA is used to create a low-dimensional basis to represent the images. Variations in the illumination create manifolds in the space spanned by this basis. These manifolds are learnt using captured illumination maps and the CUReT database. The material is classified by finding the manifold closest to the point representing the image of the material. Testing on synthetic data shows that the problem is hard. The materials form groups in which a material is often misclassified as one of the other materials in the same group. With a grouping algorithm we find a grouping of the materials in the CUReT database. Tests on images of real materials in natural illumination settings show promising results.

Peter Nillius, Jan-Olof Eklundh
Seamless Image Stitching in the Gradient Domain

Image stitching is used to combine several individual images having some overlap into a composite image. The quality of image stitching is measured by the similarity of the stitched image to each of the input images, and by the visibility of the seam between the stitched images. In order to define and get the best possible stitching, we introduce several formal cost functions for the evaluation of the quality of stitching. In these cost functions, the similarity to the input images and the visibility of the seam are defined in the gradient domain, minimizing the disturbing edges along the seam. A good image stitching will optimize these cost functions, overcoming both photometric inconsistencies and geometric misalignments between the stitched images. This approach is demonstrated in the generation of panoramic images and in object blending. Comparisons with existing methods show the benefits of optimizing the measures in the gradient domain.

Anat Levin, Assaf Zomet, Shmuel Peleg, Yair Weiss
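The advantage of working in the gradient domain can be seen in a toy 1-D case. This is a simplified sketch, not the paper's cost functions. The two inputs are assumed already aligned, and the seam position is fixed rather than optimized. The stitched signal takes its gradients from one image on each side of the seam and is then reintegrated. In 2-D, that reintegration requires solving a Poisson-type least-squares problem, which is omitted here.

```python
import numpy as np

def stitch_gradient_1d(a, b, seam):
    """Toy 1-D gradient-domain stitch of two aligned signals sampled on the same grid."""
    # Take forward differences from `a` left of the seam and from `b` to the right
    g = np.concatenate([np.diff(a)[:seam], np.diff(b)[seam:]])
    # In 1-D the least-squares integration of a gradient field is a cumulative sum,
    # anchored here at the first sample of `a`
    return np.concatenate([[a[0]], a[0] + np.cumsum(g)])


x = np.linspace(0.0, 1.0, 101)
scene = np.sin(2 * np.pi * x)
a = scene            # first exposure
b = scene + 0.5      # second exposure with a photometric offset
seam = 50

naive = np.concatenate([a[:seam + 1], b[seam + 1:]])  # intensity-domain cut-and-paste
stitched = stitch_gradient_1d(a, b, seam)
```

The naive cut-and-paste introduces a visible step of 0.5 at the seam. The gradient-domain result reproduces the underlying scene, because the photometric offset has no gradient.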
Spectral Clustering for Robust Motion Segmentation

In this paper, we propose a robust motion segmentation method using the techniques of matrix factorization and subspace separation. We first show that the shape interaction matrix can be derived using QR decomposition rather than singular value decomposition (SVD), which also leads to a simple proof of the shape subspace separation theorem. Using the shape interaction matrix, we solve the motion segmentation problem with spectral clustering techniques. We exploit the multi-way Min-Max cut clustering method and provide a novel approach for cluster membership assignment. We further show that a cluster refinement method based on subspace separation can be combined with the graph clustering method to improve its robustness in the presence of noise. The proposed method performs very well on both synthetic and real image sequences.

JinHyeong Park, Hongyuan Zha, Rangachar Kasturi
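The equivalence between the SVD and QR routes to the shape interaction matrix can be checked numerically. That matrix is the orthogonal projector onto the row space of the measurement matrix W. The sketch rests on these assumptions:

- The rank r is known.
- The first r rows of W are linearly independent, which holds for generic noise-free data.
- The data are synthetic: two independently moving objects, each contributing a rank-4 block.

It is not the authors' implementation.

```python
import numpy as np

def shape_interaction_svd(W, r):
    # Q = V_r V_r^T from the top-r right singular vectors of W
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    Vr = Vt[:r].T
    return Vr @ Vr.T

def shape_interaction_qr(W, r):
    # Orthonormal basis of the row space of W from a QR factorisation of W^T.
    # Assumes the first r rows of W are linearly independent (generic data).
    Q, _ = np.linalg.qr(W.T)
    Qr = Q[:, :r]
    return Qr @ Qr.T


rng = np.random.default_rng(0)
F, P1, P2 = 10, 15, 12                      # frames, points on object 1, points on object 2
M1, M2 = rng.standard_normal((2 * F, 4)), rng.standard_normal((2 * F, 4))
S1, S2 = rng.standard_normal((4, P1)), rng.standard_normal((4, P2))
W = np.hstack([M1 @ S1, M2 @ S2])           # 2F x (P1 + P2) measurement matrix, rank 8

Q_svd = shape_interaction_svd(W, 8)
Q_qr = shape_interaction_qr(W, 8)
```

For independent motions, the entries linking points of different objects vanish. This block structure is what spectral clustering then exploits to separate the motions.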
Learning Outdoor Color Classification from Just One Training Image

We present an algorithm for color classification with explicit illuminant estimation and compensation. A Gaussian classifier is trained with color samples from just one training image. Then, using a simple diagonal illumination model, the illuminants in a new scene that contains some of the same surface classes are estimated in a maximum likelihood framework using the expectation maximization (EM) algorithm. We also show how to impose priors on the illuminants, effectively computing a maximum a posteriori (MAP) estimate. Experimental results show the excellent performance of our classification algorithm on outdoor images.

Roberto Manduchi
A Polynomial-Time Metric for Attributed Trees

We address the problem of comparing attributed trees and propose a novel distance measure centered around the notion of a maximal similarity common subtree. The proposed measure is general and defined on trees endowed with either symbolic or continuous-valued attributes, and can be equally applied to ordered and unordered, rooted and unrooted trees. We prove that our measure satisfies the metric constraints and provide a polynomial-time algorithm to compute it. This is a remarkable and attractive property since the computation of traditional edit-distance-based metrics is NP-complete, except for ordered structures. We experimentally validate the usefulness of our metric on shape matching tasks, and compare it with edit-distance measures.

Andrea Torsello, Džena Hidović, Marcello Pelillo
Probabilistic Multi-view Correspondence in a Distributed Setting with No Central Server

We present a probabilistic algorithm for finding correspondences across multiple images. The algorithm runs in a distributed setting, where each camera is attached to a separate computing unit, and the cameras communicate over a network. No central computer is involved in the computation. The algorithm runs with low computational and communication cost. Our distributed algorithm assumes access to a standard pairwise wide-baseline stereo matching algorithm ($\mathcal{WBS}$) and our goal is to minimize the number of images transmitted over the network, as well as the number of times the $\mathcal{WBS}$ is computed. We employ the theory of random graphs to provide an efficient probabilistic algorithm that performs $\mathcal{WBS}$ on a small number of image pairs, followed by a correspondence propagation phase. The heart of the paper is a theoretical analysis of the number of times $\mathcal{WBS}$ must be performed to ensure that an overwhelming portion of the correspondence information is extracted. The analysis is extended to show how to combat computer and communication failures, which are expected to occur in such settings, as well as correspondence misses. This analysis yields an efficient distributed algorithm, but it can also be used to improve the performance of centralized algorithms for correspondence.

Shai Avidan, Yael Moses, Yoram Moses
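The random-graph idea can be illustrated with the classical Erdős-Rényi connectivity threshold. Each node is a camera, and each sampled edge is an image pair on which $\mathcal{WBS}$ is run. If the resulting graph is connected, correspondences can in principle be propagated along paths. This is a sketch under that assumption, not the paper's analysis, which also treats failures and correspondence misses. `edge_probability` uses the textbook threshold p = (ln n + c) / n.

```python
import math
import random

def edge_probability(n, c):
    # Erdos-Renyi G(n, p) with p = (ln n + c) / n is connected with probability
    # tending to exp(-exp(-c)) as n grows (classical result)
    return (math.log(n) + c) / n

def sample_pairs(n, p, seed=0):
    # Each unordered camera pair is selected for WBS independently with probability p
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def is_connected(n, edges):
    # Union-find over cameras; connectivity lets correspondences propagate
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(k) for k in range(n)}) == 1


n_cameras = 50
p = edge_probability(n_cameras, 3.0)        # about 0.138
pairs = sample_pairs(n_cameras, p)
```

With 50 cameras and c = 3, the expected number of $\mathcal{WBS}$ runs is about 169 rather than all 1225 pairs. In the limit, the graph is connected with probability near exp(-exp(-3)), roughly 0.95.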
Monocular 3D Reconstruction of Human Motion in Long Action Sequences

A novel algorithm is presented for the 3D reconstruction of human action in long (over 30 seconds) monocular image sequences. A sequence is represented by a small set of automatically found representative keyframes. The skeletal joint positions are manually located in each keyframe and mapped to all other frames in the sequence. For each keyframe a 3D key pose is created, and interpolation between these 3D body poses, together with the incorporation of limb length and symmetry constraints, provides a smooth initial approximation of the 3D motion. This is then fitted to the image data to generate a realistic 3D reconstruction. The degree of manual input required is controlled by the diversity of the sequence’s content. Sports footage is ideally suited to this approach as it frequently contains a limited number of repeated actions. Our method is demonstrated on a long (36-second) sequence of a woman playing tennis filmed with a non-stationary camera. This sequence required manual initialisation on <1.5% of the frames, and demonstrates that the system can deal with very rapid motion, severe self-occlusions, motion blur and clutter occurring over several consecutive frames. The monocular 3D reconstruction is verified by synthesising a view from the perspective of a ‘ground truth’ reference camera, and the result is seen to provide a qualitatively accurate 3D reconstruction of the motion.

Gareth Loy, Martin Eriksson, Josephine Sullivan, Stefan Carlsson
Fusion of Infrared and Visible Images for Face Recognition

A number of studies have demonstrated that infrared (IR) imagery offers a promising alternative to visible imagery due to its insensitivity to variations in face appearance caused by illumination changes. IR, however, has other limitations, including that it is opaque to glass. The emphasis in this study is on examining the sensitivity of IR imagery to facial occlusion caused by eyeglasses. Our experiments indicate that IR-based recognition performance degrades seriously when eyeglasses are present in the probe image but not in the gallery image and vice versa. To address this serious limitation of IR, we propose fusing the two modalities, exploiting the fact that visible-based recognition is less sensitive to the presence or absence of eyeglasses. Our fusion scheme is pixel-based, operates in the wavelet domain, and employs genetic algorithms (GAs) to decide how to combine IR with visible information. Although our fusion approach was not able to fully discount illumination effects present in the visible images, our experimental results show substantial overall improvements in recognition performance, and the approach deserves further consideration.

Aglika Gyaourova, George Bebis, Ioannis Pavlidis
Reliable Fiducial Detection in Natural Scenes

Reliable detection of fiducial targets in real-world images is addressed in this paper. We show that even the best existing schemes are fragile when exposed to imaging conditions other than those of the laboratory, and introduce an approach which delivers significant improvements in reliability at moderate computational cost. The key to these improvements is the use of machine learning techniques, which have recently shown impressive results for the general object detection problem, for example in face detection. Although fiducial detection is an apparently simple special case, this paper shows why robustness to lighting, scale and foreshortening can be addressed within the machine learning framework with greater reliability than in previous, more ad hoc, fiducial detection schemes.

David Claus, Andrew W. Fitzgibbon
Light Field Appearance Manifolds

Statistical shape and texture appearance models are powerful image representations, but have previously been restricted to 2D or 3D shapes with smooth surfaces and Lambertian reflectance. In this paper we present a novel 3D appearance model using image-based rendering techniques, which can represent complex lighting conditions, structures, and surfaces. We construct a light field manifold capturing the multi-view appearance of an object class and extend the direct search algorithm of Cootes and Taylor to match new light fields or 2D images of an object to a point on this manifold. When matching to a 2D image, the reconstructed light field can be used to render unseen views of the object. Our technique differs from previous view-based active appearance models in that model coefficients between views are explicitly linked, and in that we do not model any pose variation within the shape model at a single view. It overcomes the limitations of polygon-based appearance models and uses light fields that are acquired in real time.

Chris Mario Christoudias, Louis-Philippe Morency, Trevor Darrell
Galilean Differential Geometry of Moving Images

In this paper we develop a systematic theory of the local structure of moving images in terms of Galilean differential invariants. We argue that Galilean invariants are useful for studying moving images as they disregard constant motion, which typically depends on the motion of the observer or the observed object, and only describe relative motion that might capture surface shape and motion boundaries. The set of Galilean invariants for moving images also contains the Euclidean invariants for (still) images. Complete sets of Galilean invariants are derived for two main cases: when the spatio-temporal gradient cuts the image plane and when it is tangent to the image plane. The former case corresponds to isophote curve motion and the latter to the creation and disappearance of image structure, a case that is not well captured by the theory of optical flow. The derived invariants are shown to be describable in terms of acceleration, divergence, rotation and deformation of image structure. The described theory is based entirely on bottom-up computation from local spatio-temporal image information.

Daniel Fagerström
Tracking People with a Sparse Network of Bearing Sensors

Recent techniques for multi-camera tracking have relied on either overlap between the fields of view of the cameras or on a visible ground plane. We show that if information about the dynamics of the target is available, we can estimate the trajectory of the target without visible ground planes or overlapping cameras.

A. Rahimi, B. Dunagan, T. Darrell
Transformation-Invariant Embedding for Image Analysis

Dimensionality reduction is an essential aspect of visual processing. Traditionally, linear dimensionality reduction techniques such as principal component analysis have been used to find low-dimensional linear subspaces in visual data. However, sub-manifolds in natural data are rarely linear, and consequently many recent techniques have been developed for discovering non-linear manifolds. Prominent among these are Locally Linear Embedding and Isomap. Unfortunately, such techniques currently use a naive appearance model that judges image similarity based solely on Euclidean distance. In visual data, Euclidean distances rarely correspond to a meaningful perceptual difference between nearby images. In this paper, we attempt to improve the quality of manifold inference techniques for visual data by modeling local neighborhoods in terms of natural transformations between images, for example by allowing image operations that extend simple differences and linear combinations. We introduce the idea of modeling local tangent spaces of the manifold in terms of these richer transformations. Given a local tangent space representation, we then embed data in a lower-dimensional coordinate system while preserving reconstruction weights. This leads to improved manifold discovery in natural image sets.

Ali Ghodsi, Jiayuan Huang, Dale Schuurmans
The Least-Squares Error for Structure from Infinitesimal Motion

We analyze the least-squares error for structure from motion (SFM) with a single infinitesimal motion (“structure from optical flow”). We present approximations to the noiseless error over two complementary regions of motion estimates: roughly forward and non-forward translations. Experiments show that these capture the error’s detailed behavior over the entire motion range. They can be used to derive new error properties, including generalizations of the bas-relief ambiguity. As examples, we explain the error’s complexity for epipoles near the field of view; for planar scenes, we derive a new, double bas-relief ambiguity and prove the absence of local minima. For nonplanar scenes, our approximations simplify under reasonable assumptions. We show that our analysis applies even for large noise, and that the projective error has less information for estimating motion than the calibrated error. Our results make possible a comprehensive error analysis of SFM.

John Oliensis
Stereo Based 3D Tracking and Scene Learning, Employing Particle Filtering within EM

We present a generative probabilistic model for 3D scenes with stereo views. With this model, we track an object in three dimensions while simultaneously learning its appearance and the appearance of the background. By using a generative model for the scene, we are able to aggregate evidence over time. In addition, the probabilistic model naturally handles sources of variability. For inference and learning in the model, we formulate an Expectation Maximization (EM) algorithm in which Rao-Blackwellized particle filtering is used in the E step. The use of stereo views of the scene is a strong source of disambiguating evidence and allows rapid convergence of the algorithm. The update equations have an appealing form, and as a side result we give a generative probabilistic interpretation of the Sum of Squared Differences (SSD) metric known from the field of stereo vision.

Trausti Kristjansson, Hagai Attias, John Hershey
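The link between SSD and a generative model can be made concrete in the simplest case: i.i.d. Gaussian pixel noise over a fixed-size matching window. There, the log-likelihood equals -SSD/(2σ²) minus a term that does not depend on the disparity. Maximizing the likelihood is therefore the same as minimizing SSD. This 1-D sketch assumes that noise model and a known σ. The paper's model is richer, with learnt object and background appearance and Rao-Blackwellized particle filtering.

```python
import numpy as np

def residual(left, right, x0, w, d):
    # Fixed-size window in the left signal against the right signal shifted by d
    return left[x0:x0 + w] - right[x0 - d:x0 - d + w]

def ssd(left, right, x0, w, d):
    return float(np.sum(residual(left, right, x0, w, d) ** 2))

def gaussian_log_likelihood(left, right, x0, w, d, sigma):
    # log p(left window | right, d) with i.i.d. N(0, sigma^2) pixel noise:
    # -SSD / (2 sigma^2) minus a constant that does not depend on d
    r = residual(left, right, x0, w, d)
    return float(-0.5 * np.sum(r ** 2) / sigma ** 2
                 - r.size * np.log(np.sqrt(2 * np.pi) * sigma))


rng = np.random.default_rng(0)
right = rng.standard_normal(100)
left = np.roll(right, 7) + 0.01 * rng.standard_normal(100)   # true disparity is 7
disparities = range(16)
best_ssd = min(disparities, key=lambda d: ssd(left, right, 40, 20, d))
best_ml = max(disparities,
              key=lambda d: gaussian_log_likelihood(left, right, 40, 20, d, 0.01))
```

The window size is kept fixed so that the constant term is identical for every disparity. With variable-size overlaps, the two criteria could disagree.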

3D Shape Representation and Reconstruction

The Isophotic Metric and Its Application to Feature Sensitive Morphology on Surfaces

We introduce the isophotic metric, a new metric on surfaces, in which the length of a surface curve depends not only on the curve itself but also on the variation of the surface normals along it. A weak variation of the normals brings the isophotic length of a curve close to its Euclidean length, whereas a strong normal variation increases the isophotic length. We actually have a whole family of metrics, with a parameter that controls the amount by which the normals influence the metric. We are interested here in surfaces with features such as smoothed edges, which are characterized by a significant deviation of the two principal curvatures. The isophotic metric is sensitive to those features: paths along features are close to geodesics in the isophotic metric, while paths across features have high isophotic length. This shape effect makes the isophotic metric useful for a number of applications. We address feature-sensitive image processing with mathematical morphology on surfaces, feature-sensitive geometric design on surfaces, and feature-sensitive local neighborhood definition and region growing as an aid in the segmentation process for reverse engineering of geometric objects.

Helmut Pottmann, Tibor Steiner, Michael Hofer, Christoph Haider, Allan Hanbury
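A discrete version of a normal-weighted curve length shows the effect described in the abstract. This sketch assumes the line element ds² = dx² + w²·dn², where x is the surface point, n the unit normal and w ≥ 0 the weighting parameter. Under that assumption, a polyline's length is measured in the augmented space (x, w·n). This is an illustration under that assumption, not necessarily the paper's exact definition or discretization.

```python
import numpy as np

def isophotic_length(points, normals, w):
    # Polyline length in the augmented space (x, w * n); w = 0 gives Euclidean length
    aug = np.hstack([points, w * normals])
    return float(np.sum(np.linalg.norm(np.diff(aug, axis=0), axis=1)))


# Unit cylinder around the z-axis: outward normal at (cos t, sin t, z) is (cos t, sin t, 0)
z = np.linspace(0.0, 1.0, 50)
along_pts = np.column_stack([np.ones_like(z), np.zeros_like(z), z])   # along the ruling
along_nrm = np.tile([1.0, 0.0, 0.0], (z.size, 1))

t = np.linspace(0.0, np.pi / 2, 50)
around_pts = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])  # across curvature
around_nrm = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
```

Along the ruling the normal is constant, so the isophotic length equals the Euclidean length for any w. Around the cylinder the normal turns with the curve, so each segment is lengthened by a factor of sqrt(1 + w²).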
A Closed-Form Solution to Non-rigid Shape and Motion Recovery

Recovery of the three-dimensional (3D) shape and motion of non-static scenes from a monocular video sequence is important for applications such as robot navigation and human-computer interaction. If every point in the scene moves randomly, it is impossible to recover the non-rigid shapes. In practice, many non-rigid objects, e.g. the human face under various expressions, deform with certain structures. Their shapes can be regarded as a weighted combination of certain shape bases. Shape and motion recovery under such situations has attracted much interest. Previous work on this problem [6,4,13] used only orthonormality constraints on the camera rotations (rotation constraints). This paper proves that using only the rotation constraints results in ambiguous and invalid solutions. The ambiguity arises from the fact that the shape bases are not unique: any linear transformation of them is a new set of eligible bases. To eliminate the ambiguity, we propose a set of novel constraints, basis constraints, which uniquely determine the shape bases. We prove that, under the weak-perspective projection model, enforcing both the basis and the rotation constraints leads to a closed-form solution to the problem of non-rigid shape and motion recovery. The accuracy and robustness of our closed-form solution are evaluated quantitatively on synthetic data and qualitatively on real video sequences.

Jing Xiao, Jin-xiang Chai, Takeo Kanade
Stereo Using Monocular Cues within the Tensor Voting Framework

We address the fundamental problem of matching two static images. Significant progress has been made in this area, but the correspondence problem has not been solved. Most of the remaining difficulties are caused by occlusion and lack of texture. We propose an approach that addresses these difficulties within a perceptual organization framework, taking into account both binocular and monocular sources of information. Geometric and color information from the scene is used for grouping, complementing each other’s strengths. We begin by generating matching hypotheses for every pixel in such a way that a variety of matching techniques can be integrated, thus allowing us to combine their particular advantages. Correct matches are detected based on the support they receive from their neighboring candidate matches in 3-D, after tensor voting. They are grouped into smooth surfaces, the projections of which on the images serve as the reliable set of matches. The use of segmentation based on geometric cues to infer the color distributions of scene surfaces is arguably the most significant contribution of our research. The inferred reliable set of matches guides the generation of disparity hypotheses for the unmatched pixels. The match for an unmatched pixel is selected among a set of candidates as the one that is a good continuation of the surface, and also compatible with the observed color distribution of the surface in both images. Thus, information is propagated from more to less reliable pixels considering both geometric and color information. We present results on standard stereo pairs.

Philippos Mordohai, Gérard Medioni
Shape and View Independent Reflectance Map from Multiple Views

We consider the problem of estimating the 3D shape and reflectance properties of an object made of a single material from a calibrated set of multiple views. To model reflectance, we propose a View Independent Reflectance Map (VIRM) and derive it from the Torrance-Sparrow BRDF model. Reflectance estimation then amounts to estimating the VIRM parameters. We represent object shape using a surface triangulation. We pose the estimation problem as one of minimizing the cost of matching the input images to the images synthesized using the shape and reflectance estimates. We show that by enforcing a constant value of the VIRM as a global constraint, we can minimize the matching cost function by iterating between VIRM and shape estimation. Experimental results on both synthetic and real objects show that our algorithm is effective in recovering the 3D shape as well as non-Lambertian reflectance information. Our algorithm does not require that light sources be known or calibrated using special objects, making it more flexible than other photometric stereo or shape-from-shading methods. The estimated VIRM can be used to synthesize views of other objects.

Tianli Yu, Ning Xu, Narendra Ahuja
Backmatter
Metadata
Title
Computer Vision - ECCV 2004
Edited by
Tomás Pajdla
Jiří Matas
Copyright year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-24673-2
Print ISBN
978-3-540-21981-1
DOI
https://doi.org/10.1007/b97873