
2004 | Book

Computer Vision - ECCV 2004

8th European Conference on Computer Vision, Prague, Czech Republic, May 11-14, 2004. Proceedings, Part III

Edited by: Tomáš Pajdla, Jiří Matas

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

Welcome to the proceedings of the 8th European Conference on Computer Vision! Following a very successful ECCV 2002, the response to our call for papers was almost equally strong: 555 papers were submitted. We accepted 41 papers for oral and 149 papers for poster presentation. Several innovations were introduced into the review process. First, the number of program committee members was increased to reduce their review load. We managed to assign to program committee members no more than 12 papers. Second, we adopted a paper ranking system. Program committee members were asked to rank all the papers assigned to them, even those that were reviewed by additional reviewers. Third, we allowed authors to respond to the reviews consolidated in a discussion involving the area chair and the reviewers. Fourth, the reports, the reviews, and the responses were made available to the authors as well as to the program committee members. Our aim was to provide the authors with maximal feedback and to let the program committee members know how authors reacted to their reviews and how their reviews were or were not reflected in the final decision. Finally, we reduced the length of reviewed papers from 15 to 12 pages. The preparation of ECCV 2004 went smoothly thanks to the efforts of the organizing committee, the area chairs, the program committee, and the reviewers. We are indebted to Anders Heyden, Mads Nielsen, and Henrik J. Nielsen for passing on ECCV traditions and to Dominique Asselineau from ENST/TSI who kindly provided his GestRFIA conference software. We thank Jan-Olof Eklundh and Andrew Zisserman for encouraging us to organize ECCV 2004 in Prague.

Table of Contents

Frontmatter

Learning and Recognition

A Constrained Semi-supervised Learning Approach to Data Association

Data association (obtaining correspondences) is a ubiquitous problem in computer vision. It appears when matching image features across multiple images, matching image features to object recognition models and matching image features to semantic concepts. In this paper, we show how a wide class of data association tasks arising in computer vision can be interpreted as a constrained semi-supervised learning problem. This interpretation opens up room for the development of new, more efficient data association methods. In particular, it leads to the formulation of a new principled probabilistic model for constrained semi-supervised learning that accounts for uncertainty in the parameters and missing data. By adopting an ingenious data augmentation strategy, it becomes possible to develop an efficient MCMC algorithm where the high-dimensional variables in the model can be sampled efficiently and directly from their posterior distributions. We demonstrate the new model and algorithm on synthetic data and the complex problem of matching image features to words in the image captions.

Hendrik Kück, Peter Carbonetto, Nando de Freitas
Learning Mixtures of Weighted Tree-Unions by Minimizing Description Length

This paper focuses on how to perform the unsupervised clustering of tree structures in an information theoretic setting. We pose the problem of clustering as that of locating a series of archetypes that can be used to represent the variations in tree structure present in the training sample. The archetypes are tree-unions that are formed by merging sets of sample trees, and are attributed with probabilities that measure the node frequency or weight in the training sample. The approach is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We show how the tree merging process can be posed as the minimisation of an information theoretic minimum description length criterion. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation.

Andrea Torsello, Edwin R. Hancock
Decision Theoretic Modeling of Human Facial Displays

We present a vision based, adaptive, decision theoretic model of human facial displays in interactions. The model is a partially observable Markov decision process, or POMDP. A POMDP is a stochastic planner used by an agent to relate its actions and utility function to its observations and to other context. Video observations are integrated into the POMDP using a dynamic Bayesian network that creates spatial and temporal abstractions of the input sequences. The parameters of the model are learned from training data using an a-posteriori constrained optimization technique based on the expectation-maximization algorithm. The training does not require facial display labels on the training data. The learning process discovers clusters of facial display sequences and their relationship to the context automatically. This avoids the need for human intervention in training data collection, and allows the models to be used without modification for facial display learning in any context without prior knowledge of the type of behaviors to be used. We present an experimental paradigm in which we record two humans playing a game, and learn the POMDP model of their behaviours. The learned model correctly predicts human actions during a simple cooperative card game based, in part, on their facial displays.

Jesse Hoey, James J. Little
Kernel Feature Selection with Side Data Using a Spectral Approach

We address the problem of selecting a subset of the most relevant features from a set of sample data in cases where there are multiple (equally reasonable) solutions. In particular, this topic includes on one hand the introduction of hand-crafted kernels which emphasize certain desirable aspects of the data and, on the other hand, the suppression of one of the solutions given “side” data, i.e., when one is given information about undesired aspects of the data. Such situations often arise when there are several, even conflicting, dimensions to the data. For example, documents can be clustered based on topic, authorship or writing style; images of human faces can be clustered based on illumination conditions, facial expressions or by person identity, and so forth. Starting from a spectral method for feature selection, known as Q − α, we introduce first a kernel version of the approach thereby adding the power of non-linearity to the underlying representations and the choice to emphasize certain kernel-dependent aspects of the data. As an alternative to the use of a kernel we introduce a principled manner for making use of auxiliary data within a spectral approach for handling situations where multiple subsets of relevant features exist in the data. The algorithm we will introduce allows for inhibition of relevant features of the auxiliary dataset and allows for creating a topological model of all relevant feature subsets in the dataset. To evaluate the effectiveness of our approach we have conducted experiments both on real images of human faces under varying illumination, facial expressions and person identity and on general machine learning tasks taken from the UC Irvine repository. The performance of our algorithm for selecting features with side information is generally superior to current methods we tested (PCA, OPCA, CPCA and SDR-SI).

Amnon Shashua, Lior Wolf

Tracking II

Tracking Articulated Motion Using a Mixture of Autoregressive Models

We present a novel approach to modelling the non-linear and time-varying dynamics of human motion, using statistical methods to capture the characteristic motion patterns that exist in typical human activities. Our method is based on automatically clustering the body pose space into connected regions exhibiting similar dynamical characteristics, modelling the dynamics in each region as a Gaussian autoregressive process. Activities that would require large numbers of exemplars in example based methods are covered by comparatively few motion models. Different regions correspond roughly to different action-fragments and our class inference scheme allows for smooth transitions between these, thus making it useful for activity recognition tasks. The method is used to track activities including walking, running, etc., using a planar 2D body model. Its effectiveness is demonstrated by its success in tracking complicated motions like turns, without any key frames or 3D information.

Ankur Agarwal, Bill Triggs
Novel Skeletal Representation for Articulated Creatures

Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this paper, we examine vision-based modeling and the related representation of moving articulated creatures using spines. We define a spine as a branching axial structure representing the shape and topology of a 3D object’s limbs, and capturing the limbs’ correspondence and motion over time. Our spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and the significant protrusions. The algorithms for determining both 2D and 3D skeletons generally use an objective function tuned to balance stability against the responsiveness to detail. Our representation of a spine provides for enhancements over a 3D skeleton, afforded by temporal robustness and correspondence. We also introduce a probabilistic framework that is needed to compute the spine from a sequence of surface data. We present a practical implementation that approximates the spine’s joint probability function to reconstruct spines for synthetic and real subjects that move.

Gabriel J. Brostow, Irfan Essa, Drew Steedly, Vivek Kwatra
An Accuracy Certified Augmented Reality System for Therapy Guidance

Our purpose is to provide an augmented reality system for Radio-Frequency guidance that could superimpose a 3D model of the liver, its vessels and tumors (reconstructed from CT images) on external video images of the patient. In this paper, we point out that clinical usability not only needs the best affordable registration accuracy, but also a certification that the required accuracy is met, since clinical conditions change from one intervention to the other. Beginning by addressing accuracy, we show that a 3D/2D registration based on radio-opaque fiducials is better suited to our application constraints than other methods. Then, we point out a shortcoming in their statistical assumptions, which leads us to the derivation of a new extended 3D/2D criterion. Careful validation experiments on real data show that an accuracy of 2 mm can be achieved in clinically relevant conditions, and that our new criterion is up to 9% more accurate, while keeping a computation time compatible with real-time at 20 to 40 Hz. After the fulfillment of our statistical hypotheses, we turn to safety issues. Propagating the data noise through both our criterion and the classical one, we obtain an explicit formulation of the registration error. As the real conditions do not always fit the theory, it is critical to validate our prediction with real data. Thus, we perform a rigorous incremental validation of each assumption using successively: synthetic data, real video images of a precisely known object, and finally real CT and video images of a soft phantom. Results point out that our error prediction is fully valid in our application range. Finally, we provide an accurate Augmented Reality guidance system that allows the automatic detection of potentially inaccurate guidance.

Stéphane Nicolau, Xavier Pennec, Luc Soler, Nicholas Ayache

Posters III

3D Human Body Tracking Using Deterministic Temporal Motion Models

There has been much effort invested in increasing the robustness of human body tracking by incorporating motion models. Most approaches are probabilistic in nature and seek to avoid becoming trapped into local minima by considering multiple hypotheses, which typically requires exponentially large amounts of computation as the number of degrees of freedom increases. By contrast, in this paper, we use temporal motion models based on Principal Component Analysis to formulate the tracking problem as one of minimizing differentiable objective functions. The differential structure of these functions is rich enough to yield good convergence properties using a deterministic optimization scheme at a much reduced computational cost. Furthermore, by using a multi-activity database, we can partially overcome one of the major limitations of approaches that rely on motion models, namely the fact they are limited to one single type of motion. We will demonstrate the effectiveness of the proposed approach by using it to fit full-body models to stereo data of people walking and running and whose quality is too low to yield satisfactory results without motion models.

Raquel Urtasun, Pascal Fua
Robust Fitting by Adaptive-Scale Residual Consensus

Computer vision tasks often require the robust fit of a model to some data. In a robust fit, two major steps should be taken: i) robustly estimate the parameters of a model, and ii) differentiate inliers from outliers. We propose a new estimator called Adaptive-Scale Residual Consensus (ASRC). ASRC scores a model based on both the residuals of inliers and the corresponding scale estimate determined by those inliers. ASRC is very robust to multiple-structural data containing a high percentage of outliers. Compared with RANSAC, ASRC requires no pre-determined inlier threshold as it can simultaneously estimate the parameters of a model and the scale of inliers belonging to that model. Experiments show that ASRC has better robustness to heavily corrupted data than other robust methods. Our experiments address two important computer vision tasks: range image segmentation and fundamental matrix calculation. However, the range of potential applications is much broader than these.

Hanzi Wang, David Suter
Causal Camera Motion Estimation by Condensation and Robust Statistics Distance Measures

The problem of Simultaneous Localization And Mapping (SLAM) originally arose from the robotics community and is closely related to the problems of camera motion estimation and structure recovery in computer vision. Recent work in the vision community addressed the SLAM problem using either active stereo or a single passive camera. The precision of camera based SLAM was tested in indoor static environments. However the extended Kalman filters (EKF) as used in these tests are highly sensitive to outliers. For example, even a single mismatch of some feature point could lead to catastrophic collapse in both motion and structure estimates. In this paper we employ a robust-statistics-based condensation approach to the camera motion estimation problem. The condensation framework maintains multiple motion hypotheses when ambiguities exist. Employing robust distance functions in the condensation measurement stage enables the algorithm to discard a considerable fraction of outliers in the data. The experimental results demonstrate the accuracy and robustness of the proposed method.

Tal Nir, Alfred M. Bruckstein
An Adaptive Window Approach for Image Smoothing and Structures Preserving

A novel adaptive smoothing approach is proposed for image smoothing and discontinuities preservation. The method is based on a locally piecewise constant modeling of the image with an adaptive choice of a window around each pixel. The adaptive smoothing technique associates with each pixel the weighted sum of data points within the window. We describe a statistical method for choosing the optimal window size, in a manner that varies at each pixel, with an adaptive choice of weights for every pair of pixels in the window. We further investigate how the I-divergence could be used to stop the algorithm. It is worth noting the proposed technique is data-driven and fully adaptive. Simulation results show that our algorithm yields promising smoothing results on a variety of real images.

Charles Kervrann
Extraction of Semantic Dynamic Content from Videos with Probabilistic Motion Models

Exploiting video data requires extracting information at a fairly semantic level, and therefore methods able to infer “concepts” from low-level video features. We adopt a statistical approach and we focus on motion information. Because of the diversity of dynamic video content (even for a given type of events), we have to design appropriate motion models and learn them from videos. We have defined original and parsimonious probabilistic motion models, both for the dominant image motion (camera motion) and the residual image motion (scene motion). These models are learnt off-line. Motion measurements include affine motion models to capture the camera motion, and local motion features for scene motion. The two-step event detection scheme consists of pre-selecting the video segments of potential interest, and then recognizing the specified events among the pre-selected segments, the recognition being stated as a classification problem. We report accurate results on several sports videos.

Gwenaëlle Piriou, Patrick Bouthemy, Jian-Feng Yao
Are Iterations and Curvature Useful for Tensor Voting?

Tensor voting is an efficient algorithm for perceptual grouping and feature extraction, particularly for contour extraction. In this paper two studies on tensor voting are presented. First the use of iterations is investigated, and second, a new method for integrating curvature information is evaluated. In contrast to other grouping methods, tensor voting claims the advantage of being non-iterative. Although non-iterative tensor voting methods provide good results in many cases, the algorithm can be iterated to deal with more complex data configurations. The experiments conducted demonstrate that iterations substantially improve the process of feature extraction and help to overcome limitations of the original algorithm. As a further contribution we propose a curvature improvement for tensor voting. In contrast to the curvature-augmented tensor voting proposed by Tang and Medioni, our method takes advantage of the curvature calculation already performed by the classical tensor voting and evaluates the full curvature, sign and amplitude. Some new curvature-modified voting fields are also proposed. Results show a lower degree of artifacts, smoother curves, a high tolerance to scale parameter changes and also more noise-robustness.

Sylvain Fischer, Pierre Bayerl, Heiko Neumann, Gabriel Cristóbal, Rafael Redondo
A Feature-Based Approach for Determining Dense Long Range Correspondences

Planar motion models can provide gross motion estimation and good segmentation for image pairs with large inter-frame disparity. However, as the disparity becomes larger, the resulting dense correspondences will become increasingly inaccurate for everything but purely planar objects. Flexible motion models, on the other hand, tend to overfit and thus make partitioning difficult. For this reason, to achieve dense optical flow for image sequences with large inter-frame disparity, we propose a two stage process in which a planar model is used to get an approximation for the segmentation and the gross motion, and then a spline is used to refine the fit. We present experimental results for dense optical flow estimation on image pairs with large inter-frame disparity that are beyond the scope of existing approaches.

Josh Wills, Serge Belongie
Combining Geometric- and View-Based Approaches for Articulated Pose Estimation

In this paper we propose an efficient real-time approach that combines vision-based tracking and a view-based model to estimate the pose of a person. We introduce an appearance model that contains views of a person under various articulated poses. The appearance model is built and updated online. The main contribution consists of modeling, in each frame, the pose changes as a linear transformation of the view change. This linear model allows (i) for predicting the pose in a new image, and (ii) for obtaining a better estimate of the pose corresponding to a key frame. Articulated pose is computed by merging the estimation provided by the tracking-based algorithm and the linear prediction given by the view-based model.

David Demirdjian
Shape Matching and Recognition – Using Generative Models and Informative Features

We present an algorithm for shape matching and recognition based on a generative model for how one shape can be generated by the other. This generative model allows for a class of transformations, such as affine and non-rigid transformations, and induces a similarity measure between shapes. The matching process is formulated in the EM algorithm. To have a fast algorithm and avoid local minima, we show how the EM algorithm can be approximated by using informative features, which have two key properties: they are invariant and representative. They are also similar to the proposal probabilities used in DDMCMC [13]. The formulation allows us to know when and why approximations can be made and justifies the use of bottom-up features, which are used in a wide range of vision problems. This integrates generative models and feature-based approaches within the EM framework and helps clarify the relationships between different algorithms for this problem such as shape contexts [3] and softassign [5]. We test the algorithm on a variety of data sets including MPEG7 CE-Shape-1, Kimia silhouettes, and real images of street scenes. We demonstrate very effective performance and compare our results with existing algorithms. Finally, we briefly illustrate how our approach can be generalized to a wider range of problems including object detection.

Zhuowen Tu, Alan L. Yuille
Generalized Histogram: Empirical Optimization of Low Dimensional Features for Image Matching

We propose the Generalized Histogram as a low-dimensional representation of an image for efficient and precise image matching. Multiplicity detection of videos in broadcast video archives is becoming important for many video-based applications including commercial film identification, unsupervised video parsing and structuring, and robust highlight shot detection. This inherently requires efficient and precise image matching among an extremely large number of images. Histogram-based image similarity search and matching is known to be effective, and its enhancement techniques such as adaptive binning, subregion histogram, and adaptive weighting have been studied. We show that these techniques can be represented as linear conversion of high-dimensional primitive histograms and can be integrated into generalized histograms. A linear learning method to obtain generalized histograms from sample sets is presented with a sample expansion technique to circumvent the overfitting problem due to high-dimensionality and insufficient sample size. The generalized histogram takes advantage of these techniques, and achieves more than 90% precision and recall with a 16-D generalized histogram compared to the ground truth computed by normalized cross correlation. The practical importance of the work is revealed by successful matching performance with 20,000 frame images obtained from actual broadcast videos.

Shin’ichi Satoh
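
The abstract's claim that adaptive binning and adaptive weighting are linear conversions of a primitive histogram can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names (`binning_matrix`, `scale_rows`, `linear_convert`) are hypothetical, and the conversion matrix is built by hand here, whereas the paper learns it from sample sets.

```python
def binning_matrix(groups, n_bins):
    """Build a K x N matrix whose row k sums the primitive bins in groups[k].

    Hypothetical helper: merging fine bins into coarser ones (adaptive
    binning) is a linear map on the primitive histogram.
    """
    matrix = [[0.0] * n_bins for _ in groups]
    for k, group in enumerate(groups):
        for i in group:
            matrix[k][i] = 1.0
    return matrix


def scale_rows(matrix, weights):
    """Scale each output bin by a weight (adaptive weighting), still linear."""
    return [[w * value for value in row] for row, w in zip(matrix, weights)]


def linear_convert(matrix, histogram):
    """Apply the linear conversion: low-dimensional vector = matrix * histogram."""
    return [sum(m * h for m, h in zip(row, histogram)) for row in matrix]
```

For example, `linear_convert(binning_matrix([[0, 1], [2, 3, 4], [5]], 6), h)` reduces a 6-bin primitive histogram `h` to 3 bins; composing binning and weighting yields a single matrix, which is the kind of object a learning method could optimize directly.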
Recognizing Objects in Range Data Using Regional Point Descriptors

Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a challenging problem in 3D computer vision. One approach that has been successful in past research is the regional shape descriptor. In this paper, we introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. We evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. We compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes.

Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bülow, Jitendra Malik
Shape Reconstruction from 3D and 2D Data Using PDE-Based Deformable Surfaces

In this paper, we propose a new PDE-based methodology for deformable surfaces that is capable of automatically evolving its shape to capture the geometric boundary of the data and simultaneously discover its underlying topological structure. Our model can handle multiple types of data (such as volumetric data, 3D point clouds and 2D image data), using a common mathematical framework. The deformation behavior of the model is governed by partial differential equations (e.g. the weighted minimal surface flow). Unlike the level-set approach, our model always has an explicit representation of geometry and topology. The regularity of the model and the stability of the numerical integration process are ensured by a powerful Laplacian tangential smoothing operator. By allowing local adaptive refinement of the mesh, the model can accurately represent sharp features. We have applied our model for shape reconstruction from volumetric data, unorganized 3D point clouds and multiple view images. The versatility and robustness of our model allow its application to the challenging problem of multiple view reconstruction. Our approach is unique in its combination of simultaneous use of a high number of arbitrary camera views with an explicit mesh that is intuitive and easy-to-interact-with. Our model-based approach automatically selects the best views for reconstruction, allows for visibility checking and progressive refinement of the model as more images become available. The results of our extensive experiments on synthetic and real data demonstrate robustness, high reconstruction accuracy and visual quality.

Ye Duan, Liu Yang, Hong Qin, Dimitris Samaras
Structure and Motion Problems for Multiple Rigidly Moving Cameras

Vision (both using one-dimensional and two-dimensional retina) is useful for the autonomous navigation of vehicles. In this paper the case of a vehicle equipped with multiple cameras with non-overlapping views is considered. The geometry and algebra of such a moving platform of cameras are considered. In particular we formulate and solve structure and motion problems for a few novel cases of such moving platforms. For the case of two-dimensional retina cameras (ordinary cameras) there are two minimal cases of three points in two platform positions and two points in three platform positions. For the case of one-dimensional retina cameras there are three minimal structure and motion problems. In this paper we consider one of these (6 points in 3 platform positions). The theory has been tested on synthetic data.

Henrik Stewenius, Kalle Åström
Detection and Tracking Scheme for Line Scratch Removal in an Image Sequence

A detection and tracking approach is proposed for line scratch removal in a digital film restoration process. Unlike random impulsive distortions such as dirt spots, line scratch artifacts persist across several frames. Hence, motion compensated methods will fail, as well as single-frame methods if scratches are unsteady or fragmented. The proposed method uses as input projections of each image of the input sequence. First, a 1D-extrema detector provides candidates. Next, an MHT (Multiple Hypothesis Tracker) uses these candidates to create and keep multiple hypotheses. As the tracking goes further through the sequence, each hypothesis gains or loses evidence. To avoid a combinatorial explosion, the hypothesis tree is sequentially pruned, preserving a list of the best ones. An energy function (quality of the candidates, comparison to a model) is used for the path hypothesis sorting. As hypotheses are set up at each iteration, even if no information is available, a tracked path might cross gaps (missed detection or speckled scratches). Finally, the tracking stage feeds the correction process. Since this contribution focuses on the detection stage, only tracking results are given.

Bernard Besserer, Cedric Thiré
Color Constancy Using Local Color Shifts

The human visual system is able to correctly determine the color of objects in view irrespective of the illuminant. This ability to compute color constant descriptors is known as color constancy. We have developed a parallel algorithm for color constancy. This algorithm is based on the computation of local space average color using a grid of processing elements. We have one processing element per image pixel. Each processing element has access to the data stored in neighboring elements. Local space average color is used to shift the color of the input pixel in the direction of the gray vector. The computations are executed inside the unit color cube. The color of the input pixel as well as local space average color is simply a vector inside this Euclidean space. We compute the component of local space average color which is orthogonal to the gray vector. This component is subtracted from the color of the input pixel to compute a color corrected image. Before performing the color correction step we can also normalize both colors. In this case, the resulting color is rescaled to the original intensity of the input color such that the image brightness remains unchanged.

Marc Ebner
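
The correction step described in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's implementation: a plain box average stands in for the grid of processing elements, clamping to [0, 1] is an assumed way of staying inside the unit color cube, the optional normalization variant is omitted, and all function names are hypothetical.

```python
def local_space_average(image, row, col, radius=1):
    """Average color over a square window clipped to the image borders.

    Assumption: a box average replaces the paper's grid of processing
    elements that read data from neighboring elements.
    """
    height, width = len(image), len(image[0])
    total = [0.0, 0.0, 0.0]
    count = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            for k in range(3):
                total[k] += image[r][c][k]
            count += 1
    return [t / count for t in total]


def shift_toward_gray(pixel, local_avg):
    """Subtract the part of local_avg that is orthogonal to the gray vector."""
    # Projecting onto the gray axis (1, 1, 1) gives the channel mean repeated
    # in every channel, so the orthogonal part is local_avg minus that mean.
    gray_part = sum(local_avg) / 3.0
    orthogonal = [a - gray_part for a in local_avg]
    shifted = [p - o for p, o in zip(pixel, orthogonal)]
    # Assumption: clamp so the result stays inside the unit color cube.
    return [min(1.0, max(0.0, v)) for v in shifted]


def correct_image(image, radius=1):
    """Apply the local color shift to every pixel of an RGB image in [0, 1]."""
    return [
        [
            shift_toward_gray(image[r][c], local_space_average(image, r, c, radius))
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]
```

On a uniformly colored image the local average equals the pixel color, so every pixel is shifted onto the gray axis at its own mean intensity, which matches the intuition that a global color cast is removed.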
Image Anisotropic Diffusion Based on Gradient Vector Flow Fields

In this paper, gradient vector flow fields are introduced into image anisotropic diffusion, and the shock filter, mean curvature flow and Perona-Malik equation are each reformulated in the context of these flow fields. Many advantages over the original models are obtained, such as numerical stability, a large capture range, and computational simplification. In addition, a fairing process is introduced into the anisotropic diffusion, which contains the fourth-order derivative and is reformulated as the intrinsic Laplacian of curvature under the level set framework. Through this fairing process, shape boundaries become more pronounced. To overcome numerical errors, the intrinsic Laplacian of curvature is computed from the gradient vector flow fields rather than directly from the observed images.

Hongchuan Yu, Chin-Seng Chua
Optimal Importance Sampling for Tracking in Image Sequences: Application to Point Tracking

In this paper, we propose a particle filtering approach for tracking applications in image sequences. The system we propose combines a measurement equation and a dynamic equation which both depend on the image sequence. Taking into account several possible observations, the likelihood is modeled as a linear combination of Gaussian laws. Such a model allows inferring an analytic expression of the optimal importance function used in the diffusion process of the particle filter. It also enables building a relevant approximation of a validation gate. We demonstrate the significance of this model for a point tracking application.

Elise Arnaud, Etienne Mémin
Learning to Segment

We describe a new approach for learning to perform class-based segmentation using only unsegmented training examples. As in previous methods, we first use training images to extract fragments that contain common object parts. We then show how these parts can be segmented into their figure and ground regions in an automatic learning process. This is in contrast with previous approaches, which required complete manual segmentation of the objects in the training examples. The figure-ground learning combines top-down and bottom-up processes and proceeds in two stages, an initial approximation followed by iterative refinement. The initial approximation produces figure-ground labeling of individual image fragments using the unsegmented training images. It is based on the fact that on average, points inside the object are covered by more fragments than points outside it. The initial labeling is then improved by an iterative refinement process, which converges in up to three steps. At each step, the figure-ground labeling of individual fragments produces a segmentation of complete objects in the training images, which in turn induces a refined figure-ground labeling of the individual fragments. In this manner, we obtain a scheme that starts from unsegmented training images, learns the figure-ground labeling of image fragments, and then uses this labeling to segment novel images. Our experiments demonstrate that the learned segmentation achieves the same level of accuracy as methods using manual segmentation of training images, producing an automatic and robust top-down segmentation.

Eran Borenstein, Shimon Ullman
MCMC-Based Multiview Reconstruction of Piecewise Smooth Subdivision Curves with a Variable Number of Control Points

We investigate the automated reconstruction of piecewise smooth 3D curves, using subdivision curves as a simple but flexible curve representation. This representation allows tagging corners to model non-smooth features along otherwise smooth curves. We present a reversible jump Markov chain Monte Carlo approach which obtains an approximate posterior distribution over the number of control points and tags. In a Rao-Blackwellization scheme, we integrate out the control point locations, reducing the variance of the resulting sampler. We apply this general methodology to the reconstruction of piecewise smooth curves from multiple calibrated views, in which the object is segmented from the background using a Markov random field approach. Results are shown for multiple images of two pot shards as would be encountered in archaeological applications.

Michael Kaess, Rafal Zboinski, Frank Dellaert
Bayesian Correction of Image Intensity with Spatial Consideration

Under dimly lit conditions, it is difficult to take a satisfactory image with a long exposure time using a hand-held camera. Even with a tripod, moving objects in the scene still generate ghosting and blurring effects. In this paper, we propose a novel approach to recover a high-quality image by exploiting the tradeoff between exposure time and motion blur, which considers color statistics and spatial constraints simultaneously, by using only two defective input images. A Bayesian framework is adopted to incorporate the factors to generate an optimal color mapping function. No estimation of PSF is performed. Our new approach can be readily extended to handle high contrast scenes to reveal fine details in saturated or highlight regions. An image acquisition system deploying off-the-shelf digital cameras and camera control software was built. We present our results on a variety of defective images: global and local motion blur due to camera shake or object movement, and saturation due to high contrast scenes.

Jiaya Jia, Jian Sun, Chi-Keung Tang, Heung-Yeung Shum
Stretching Bayesian Learning in the Relevance Feedback of Image Retrieval

This paper addresses user relevance feedback in image retrieval. We treat this problem as a standard two-class pattern classification problem aiming at refining the retrieval precision by learning through the user relevance feedback data. However, we have investigated the problem by noting two important unique characteristics of the problem: small sample collection and asymmetric sample distributions between positive and negative samples. We have developed a novel approach to stretching Bayesian learning to solve this problem by explicitly exploiting the two unique characteristics, which is the methodology of BAyesian Learning in Asymmetric and Small sample collections, thus called BALAS. Different learning strategies are used for positive and negative sample collections in BALAS, respectively, based on the two unique characteristics. By defining the relevancy confidence as the relevant posterior probability, we have developed an integrated ranking scheme in BALAS which complementarily combines the subjective relevancy confidence and the objective feature-based distance measure to capture the overall retrieval semantics. The experimental evaluations have confirmed the rationale of the proposed ranking scheme, and have also demonstrated that BALAS is superior to an existing relevance feedback method in the current literature in capturing the overall retrieval semantics.

Ruofei Zhang, Zhongfei (Mark) Zhang
Real-Time Tracking of Multiple Skin-Colored Objects with a Possibly Moving Camera

This paper presents a method for tracking multiple skin-colored objects in images acquired by a possibly moving camera. The proposed method encompasses a collection of techniques that enable the modeling and detection of skin-colored objects as well as their temporal association in image sequences. Skin-colored objects are detected with a Bayesian classifier which is bootstrapped with a small set of training data. Then, an off-line iterative training procedure is employed to refine the classifier using additional training images. On-line adaptation of skin-color probabilities is used to enable the classifier to cope with illumination changes. Tracking over time is realized through a novel technique which can handle multiple skin-colored objects. Such objects may move in complex trajectories and occlude each other in the field of view of a possibly moving camera. Moreover, the number of tracked objects may vary in time. A prototype implementation of the developed system operates on 320x240 live video in real time (28Hz) on a conventional Pentium 4 processor. Representative experimental results from the application of this prototype to image sequences are also provided.

Antonis A. Argyros, Manolis I. A. Lourakis
Evaluation of Image Fusion Performance with Visible Differences

Multisensor signal-level image fusion has attracted considerable research attention recently. Whereas it is relatively straightforward to obtain a fused image, e.g. a simple but crude method is to average the input signals, assessing the performance of fusion algorithms is much harder in practice. This is particularly true in widespread “fusion for display” applications where multisensor images are fused and the resulting image is presented to a human operator. As recent studies have shown, the most direct and reliable image fusion evaluation method, subjective tests with a representative sample of potential users, is expensive in terms of both time/effort and equipment required. This paper presents an investigation into the application of Visible signal Differences Prediction modelling to the objective evaluation of the performance of fusion algorithms. Thus given a pair of input images and a resulting fused image, the Visual Difference Prediction process evaluates the probability that a signal difference between each of the inputs and the fused image can be detected by the human visual system. The resulting probability maps are used to form objective fusion performance metrics and are also integrated with more complex fusion performance measures. Experimental results indicate that the inclusion of visible differences information in fusion assessment yields metrics whose accuracy, with reference to subjective results, is superior to that obtained from the state of the art objective fusion performance measures.

Vladimir Petrović, Costas Xydeas
An Information-Based Measure for Grouping Quality

We propose a method for measuring the quality of a grouping result, based on the following observation: a better grouping result provides more information about the true, unknown grouping. The amount of information is evaluated using an automatic procedure, relying on the given hypothesized grouping, which generates (homogeneity) queries about the true grouping and answers them using an oracle. The process terminates once the queries suffice to specify the true grouping. The number of queries is a measure of the hypothesis non-informativeness. A relation between the query count and the (probabilistically characterized) uncertainty of the true grouping, is established and experimentally supported. The proposed information-based quality measure is free from arbitrary choices, uniformly treats different types of grouping errors, and does not favor any algorithm. We also found that it approximates human judgment better than other methods and gives better results when used to optimize a segmentation algorithm.

Erik A. Engbers, Michael Lindenbaum, Arnold W. M. Smeulders
Bias in Shape Estimation

This paper analyses the uncertainty in the estimation of shape from motion and stereo. It is shown that there are computational limitations of a statistical nature that previously have not been recognized. Because there is noise in all the input parameters, we cannot avoid bias. The analysis rests on a new constraint which relates image lines and rotation to shape. Because the human visual system has to cope with bias as well, it makes errors. This explains the underestimation of slant found in computational and psychophysical experiments, and demonstrated here for an illusory display. We discuss properties of the best known estimators with regard to the problem, as well as possible avenues for visual systems to deal with the bias.

Hui Ji, Cornelia Fermüller
Contrast Marginalised Gradient Template Matching

This paper addresses a key problem in the detection of shapes via template matching: the variation of accumulator-space response with object-background contrast. By formulating a probabilistic model for planar shape location within an image or video frame, a vector-field filtering operation may be derived which, in the limiting case of vanishing noise, leads to the Hough-transform filters reported by Kerbyson & Atherton [5]. By further incorporating a model for contrast uncertainty, a contrast invariant accumulator space is constructed, in which local maxima provide an indication of the most probable locations of a sought planar shape. Comparisons with correlation matching, and Hough transforms employing gradient magnitude, binary and vector templates are presented. A key result is that a posterior density function for locating a shape marginalised for contrast uncertainty is obtained by summing the functions of the outputs of a series of spatially invariant filters, thus providing a route to fast parallel implementations.

Saleh Basalamah, Anil Bharath, Donald McRobbie
The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The recognition accuracy of current discriminant architectures for visual recognition is hampered by the dependence on holistic image representations, where images are represented as vectors in a high-dimensional space. Such representations lead to complex classification problems due to the need to 1) restrict image resolution and 2) model complex manifolds due to variations in pose, lighting, and other imaging variables. Localized representations, where images are represented as bags of low-dimensional vectors, are significantly less affected by these problems but have traditionally been difficult to combine with discriminant classifiers such as the support vector machine (SVM). This limitation has recently been lifted by the introduction of probabilistic SVM kernels, such as the Kullback-Leibler (KL) kernel. In this work we investigate the advantages of using this kernel as a means to combine discriminant recognition with localized representations. We derive a taxonomy of kernels based on the combination of the KL-kernel with various probabilistic representations previously proposed in the recognition literature. Experimental evaluation shows that these kernels can significantly outperform traditional SVM solutions for recognition.

Nuno Vasconcelos, Purdy Ho, Pedro Moreno
Partial Object Matching with Shapeme Histograms

Histograms of shape signatures or prototypical shapes, called shapemes, have been used effectively in previous work for 2D/3D shape matching and recognition. We extend the idea of the shapeme histogram to recognize partially observed query objects from a database of complete model objects. We propose to represent each model object as a collection of shapeme histograms, and match the query histogram to this representation in two steps: (i) compute a constrained projection of the query histogram onto the subspace spanned by all the shapeme histograms of the model, and (ii) compute a match measure between the query histogram and the projection. The first step is formulated as a constrained optimization problem that is solved by a sampling algorithm. The second step is formulated under a Bayesian framework where an implicit feature selection process is conducted to improve the discrimination capability of shapeme histograms. Results of matching partially viewed range objects against a database of 243 models demonstrate better performance than the original shapeme histogram matching algorithm and other approaches.

Y. Shan, H. S. Sawhney, B. Matei, R. Kumar
Modeling and Synthesis of Facial Motion Driven by Speech

We introduce a novel approach to modeling the dynamics of human facial motion induced by the action of speech for the purpose of synthesis. We represent the trajectories of a number of salient features on the human face as the output of a dynamical system made up of two subsystems, one driven by the deterministic speech input, and a second driven by an unknown stochastic input. Inference of the model (learning) is performed automatically and involves an extension of independent component analysis to time-dependent data. Using a shape-texture decompositional representation for the face, we generate facial image sequences reconstructed from synthesized feature point positions.

Payam Saisan, Alessandro Bissacco, Alessandro Chiuso, Stefano Soatto
Recovering Local Shape of a Mirror Surface from Reflection of a Regular Grid

We present a new technique to recover the shape of an unknown smooth specular surface from a single image. A calibrated camera faces a specular surface reflecting a calibrated scene (for instance a checkerboard or grid pattern). The mapping from the scene pattern to its reflected distorted image in the camera changes the local geometrical structure of the scene pattern. We show that if measurements of both local orientation and scale of the distorted scene in the image plane are available, this mapping can be inverted. Specifically, we prove that surface position and shape up to third order can be derived as a function of such local measurements when two orientations are available at the same point (e.g. a corner). Our results generalize previous work [1, 2] where the mirror surface geometry was recovered only up to first order from at least three intersecting lines. We validate our theoretical results with both numerical simulations and experiments with real surfaces.

Silvio Savarese, Min Chen, Pietro Perona
Structure of Applicable Surfaces from Single Views

The deformation of applicable surfaces such as sheets of paper satisfies the differential geometric constraints of isometry (lengths and areas are conserved) and vanishing Gaussian curvature. We show that these constraints lead to a closed set of equations that allow recovery of the full geometric structure from a single image of the surface and knowledge of its undeformed shape. We show that these partial differential equations can be reduced to the Hopf equation that arises in non-linear wave propagation, and deformations of the paper can be interpreted in terms of the characteristics of this equation. A new exact integration of these equations is developed that relates the 3-D structure of the applicable surface to an image. The solution is tested by comparison with particular exact solutions. We present results for both the forward and the inverse 3D structure recovery problem.

Nail Gumerov, Ali Zandifar, Ramani Duraiswami, Larry S. Davis
Joint Bayes Filter: A Hybrid Tracker for Non-rigid Hand Motion Recognition

In sign-language or gesture recognition, articulated hand motion tracking is usually a prerequisite to behaviour understanding. However, difficulties such as non-rigidity of the hand, complex background scenes, and occlusion make tracking a challenging task. In this paper we present a hybrid HMM/Particle filter tracker for simultaneous tracking and recognition of non-rigid hand motion. By utilising separate image cues, we decompose complex motion into two independent (non-rigid/rigid) components. A generative model is used to explore the intrinsic patterns of the hand articulation. Non-linear dynamics of the articulation such as fast appearance deformation can therefore be tracked without resorting to a complex kinematic model. The rigid motion component is approximated as the motion of a planar region, where a standard particle filter method suffices. The novel contribution of the paper is that we unify the independent treatments of non-rigid motion and rigid motion into a robust Bayesian framework. The efficacy of this method is demonstrated by performing successful tracking in the presence of significant occlusion clutter.

Huang Fei, Ian Reid
Iso-disparity Surfaces for General Stereo Configurations

This paper discusses the iso-disparity surfaces for general stereo configurations. These are the surfaces that are observed at the same resolution along the epipolar lines in both images of a stereo pair. For stereo algorithms that include smoothness terms, either implicitly through area-based correlation or explicitly by using penalty terms for neighboring pixels with dissimilar disparities, these surfaces also represent the implicit hypothesis made during stereo matching. Although the shape of these surfaces is well known for the standard stereo case (i.e. fronto-parallel planes), surprisingly enough, to our knowledge their shape has not been studied for two cameras in a general configuration. This is, however, very important since it represents the discretisation of stereo sampling in 3D space and represents absolute bounds on performance independent of later resampling. We prove that the intersections of these surfaces with an epipolar plane consist of a family of conics with three fixed points. There is an interesting relation to the human horopter and we show that for stereo the retinas act as if they were flat. Further we discuss the relevance of iso-disparity surfaces to image-pair rectification and active vision. In experiments we show how one can configure an active stereo head to align iso-disparity surfaces to scene structures of interest such as a vertical wall, allowing better and faster stereo results.

Marc Pollefeys, Sudipta Sinha
Camera Calibration with Two Arbitrary Coplanar Circles

In this paper, we describe a novel camera calibration method to estimate the extrinsic parameters and the focal length of a camera by using only a single image of two coplanar circles with arbitrary radii. A simple method for estimating the extrinsic parameters and the focal length of a camera is very important because in many vision-based applications the position, the pose and the zooming factor of a camera are adjusted frequently. An easy-to-use and convenient camera calibration method should have two characteristics: 1) the calibration object can be produced or prepared easily, and 2) the operation of a calibration job is simple and easy. Our new method satisfies these requirements, while most existing camera calibration methods do not because they need a specially designed calibration object and require multi-view images. Because drawing circles with arbitrary radii is so easy that one can even draw them on the ground with only a rope and a stick, the calibration object used by our method can be prepared very easily. Moreover, our method needs only one image, and it allows the centers of the circles and/or parts of the circles to be occluded. Another useful feature of our method is that it estimates the focal length and the extrinsic parameters of a camera simultaneously. Because zoom lenses are so widely used, and the zooming factor is adjusted as frequently as the camera setting, estimating the focal length is almost a must whenever the camera setting is changed. Extensive experiments on simulated and real images demonstrate the robustness and the effectiveness of our method.

Qian Chen, Haiyuan Wu, Toshikazu Wada
Reconstruction of 3-D Symmetric Curves from Perspective Images without Discrete Features

The shapes of many natural and man-made objects have curved contours. The images of such contours usually do not have sufficient distinctive features to apply conventional feature-based reconstruction algorithms. This paper shows that both the shape of curves in 3-D space and the camera poses can be accurately reconstructed from their perspective images with unknown point correspondences given that the curves have certain invariant properties such as symmetry. We show that in such cases the minimum number of views needed for a solution is remarkably small: one for planar curves and two for nonplanar curves (of arbitrary shapes), which is significantly less than what is required by most existing algorithms for general curves. Our solutions rely on minimizing the L2-distance between the shapes of the curves reconstructed via the “epipolar geometry” of symmetric curves. Both simulations and experiments on real images are presented to demonstrate the effectiveness of our approach.

Wei Hong, Yi Ma, Yizhou Yu
A Topology Preserving Non-rigid Registration Method Using a Symmetric Similarity Function-Application to 3-D Brain Images

3-D non-rigid brain image registration aims at estimating consistently long-distance and highly nonlinear deformations corresponding to anatomical variability between individuals. A consistent mapping is expected to preserve the integrity of warped structures and not to be dependent on the arbitrary choice of a reference image: the estimated transformation from A to B should be equal to the inverse transformation from B to A. This paper addresses these two issues in the context of a hierarchical parametric modeling of the mapping, based on B-spline functions. The parameters of the model are estimated by minimizing a symmetric form of the standard sum of squared differences criterion. Topology preservation is ensured by constraining the Jacobian of the transformation to remain positive on the whole continuous domain of the image as a non trivial 3-D extension of a previous work [1] dealing with the 2-D case. Results on synthetic and real-world data are shown to illustrate the contribution of preserving topology and using a symmetric similarity function.

Vincent Noblet, Christian Heinrich, Fabrice Heitz, Jean-Paul Armspach
A Correlation-Based Approach to Robust Point Set Registration

Correlation is a very effective way to align intensity images. We extend the correlation technique to point set registration using a method we call kernel correlation. Kernel correlation is an affinity measure, and it is also a function of the point set entropy. We define the point set registration problem as finding the maximum kernel correlation configuration of the two point sets to be registered. The new registration method has an intuitive interpretation, an algorithm that is simple to implement, and a convergence property that is easy to prove. Our method shows favorable performance when compared with the iterative closest point (ICP) and EM-ICP methods.
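
As a rough illustration of registration by maximizing kernel correlation, the sketch below assumes a Gaussian kernel, 2D points, and a translation-only motion found by exhaustive grid search. These choices, along with the names `kernel_correlation`, `register_translation`, `sigma`, `search`, and `step`, are assumptions made for the example and are not taken from the paper.

```python
import math


def kernel_correlation(points_a, points_b, sigma=1.0):
    """Sum of Gaussian affinities over every pair (a, b) of 2D points."""
    total = 0.0
    for ax, ay in points_a:
        for bx, by in points_b:
            d2 = (ax - bx) ** 2 + (ay - by) ** 2
            total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def register_translation(model, scene, sigma=1.0, search=3.0, step=0.25):
    """Grid search for the 2D shift of `model` that maximizes the correlation."""
    best_shift, best_score = (0.0, 0.0), -1.0
    n = int(round(search / step))
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            tx, ty = i * step, j * step
            moved = [(x + tx, y + ty) for x, y in model]
            score = kernel_correlation(moved, scene, sigma)
            if score > best_score:
                best_shift, best_score = (tx, ty), score
    return best_shift, best_score


# Toy data: the scene is the model shifted by a known offset.
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
scene = [(x + 1.5, y - 0.5) for x, y in model]
shift, score = register_translation(model, scene, sigma=0.5)
```

Because every point contributes to the score through a smooth kernel, no explicit point-to-point correspondences are needed. A practical implementation would replace the grid search with a gradient-based optimizer over a richer transformation class.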

Yanghai Tsin, Takeo Kanade
Hierarchical Organization of Shapes for Efficient Retrieval

This paper presents a geometric approach to perform: (i) hierarchical clustering of imaged objects according to the shapes of their boundaries, and (ii) testing of observed shapes for classification. An intrinsic metric on nonlinear, infinite-dimensional shape space, obtained using geodesic lengths, is used for clustering. This analysis is landmark free, does not require embedding shapes in ℝ², and uses ordinary differential equations for flows (as opposed to partial differential equations). Intrinsic analysis also leads to well defined shape statistics such as means and covariances, and is computationally efficient. Clustering is performed in a hierarchical fashion. At any level of hierarchy clusters are generated using a minimum dispersion criterion and an MCMC-type search algorithm. Cluster means become elements to be clustered at the next level. Gaussian models on tangent spaces are used to pose binary or multiple hypothesis tests for classifying observed shapes. Hierarchical clustering and shape testing combine to form an efficient tool for shape retrieval from a large database of shapes. For databases with n shapes, the searches are performed using log(n) tests on average. Examples are presented for demonstrating these tools using shapes from the Kimia shape database and the Surrey fish database.

Shantanu Joshi, Anuj Srivastava, Washington Mio, Xiuwen Liu

Information-Based Image Processing

Intrinsic Images by Entropy Minimization

A method was recently devised for the recovery of an invariant image from a 3-band colour image. The invariant image, originally 1D greyscale but here derived as a 2D chromaticity, is independent of lighting, and also has shading removed: it forms an intrinsic image that may be used as a guide in recovering colour images that are independent of illumination conditions. Invariance to illuminant colour and intensity means that such images are free of shadows, as well, to a good degree. The method devised finds an intrinsic reflectivity image based on assumptions of Lambertian reflectance, approximately Planckian lighting, and fairly narrowband camera sensors. Nevertheless, the method works well when these assumptions do not hold. A crucial piece of information is the angle for an “invariant direction” in a log-chromaticity space. To date, we have gleaned this information via a preliminary calibration routine, using the camera involved to capture images of a colour target under different lights. In this paper, we show that we can in fact dispense with the calibration step, by recognizing a simple but important fact: the correct projection is that which minimizes entropy in the resulting invariant image. To show that this must be the case we first consider synthetic images, and then apply the method to real images. We show that not only does a correct shadow-free image emerge, but also that the angle found agrees with that recovered from a calibration. As a result, we can find shadow-free images for images with unknown camera, and the method is applied successfully to remove shadows from unsourced imagery.
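
The entropy criterion can be sketched on synthetic data. The example assumes that 2D log-chromaticity points are already available, that a lighting change moves each surface's point along one common direction, and that entropy is estimated from a fixed-width histogram. The function names, the bin width, and the toy data are hypothetical.

```python
import math
from collections import Counter


def projection_entropy(points, theta_deg, bin_width=0.05):
    """Histogram entropy (bits) of 2D points projected onto the axis at theta."""
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    bins = Counter(math.floor((x * c + y * s) / bin_width) for x, y in points)
    n = len(points)
    return -sum((k / n) * math.log2(k / n) for k in bins.values())


def min_entropy_angle(points, bin_width=0.05):
    """Integer angle in [0, 180) whose projection has the lowest entropy."""
    return min(range(180), key=lambda t: projection_entropy(points, t, bin_width))


# Toy data: three surfaces, each seen under 21 lights. Changing the light
# moves a point along the 60-degree direction (an assumption of this sketch).
light = (math.cos(math.radians(60)), math.sin(math.radians(60)))
normal = (math.cos(math.radians(150)), math.sin(math.radians(150)))
points = []
for offset in (0.125, 0.525, 1.025):
    for k in range(21):
        t = 0.1 * k
        points.append((offset * normal[0] + t * light[0],
                       offset * normal[1] + t * light[1]))

best = min_entropy_angle(points)
```

Projecting onto the axis orthogonal to the lighting direction collapses each surface to a single histogram bin, which gives the lowest entropy. Any other axis smears each surface over several bins.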

Graham D. Finlayson, Mark S. Drew, Cheng Lu
Image Similarity Using Mutual Information of Regions

Mutual information (MI) has emerged in recent years as an effective similarity measure for comparing images. One drawback of MI, however, is that it is calculated on a pixel by pixel basis, meaning that it takes into account only the relationships between corresponding individual pixels and not those of each pixel’s respective neighborhood. As a result, much of the spatial information inherent in images is not utilized. In this paper, we propose a novel extension to MI called regional mutual information (RMI). This extension efficiently takes neighborhood regions of corresponding pixels into account. We demonstrate the usefulness of RMI by applying it to a real-world problem in the medical domain: intensity-based 2D-3D registration of X-ray projection images (2D) to a CT image (3D). Using a gold-standard spine image data set, we show that RMI is a more robust similarity measure for image registration than MI.
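
The pixel-by-pixel drawback of standard MI can be seen in a short sketch. It computes plain MI from a joint intensity histogram, not the RMI extension, and it assumes images are given as flat lists of integer intensities. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b):
    """MI in bits between two equally sized images given as flat intensity lists."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    n = len(image_a)
    joint = Counter(zip(image_a, image_b))  # joint intensity histogram
    marg_a = Counter(image_a)               # marginal histogram of image A
    marg_b = Counter(image_b)               # marginal histogram of image B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written with raw counts.
        mi += p_ab * math.log2(count * n / (marg_a[a] * marg_b[b]))
    return mi


a = [0, 0, 1, 1]
dependent = [5, 5, 9, 9]    # fully determined by a
independent = [5, 9, 5, 9]  # unrelated to a

# Applying the same pixel permutation to both images leaves MI unchanged.
perm = [2, 0, 3, 1]
a_shuffled = [a[i] for i in perm]
dependent_shuffled = [dependent[i] for i in perm]
```

Since the joint histogram ignores where each pixel pair sits in the image, scrambling both images with the same permutation gives the same MI value. This is the loss of spatial information that the abstract refers to.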

Daniel B. Russakoff, Carlo Tomasi, Torsten Rohlfing, Calvin R. Maurer Jr.
Back Matter
Metadata
Title
Computer Vision - ECCV 2004
Edited by
Tomáš Pajdla
Jiří Matas
Copyright Year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-24672-5
Print ISBN
978-3-540-21982-8
DOI
https://doi.org/10.1007/b97871