2011 | Book

Computer Vision – ACCV 2010

10th Asian Conference on Computer Vision, Queenstown, New Zealand, November 8-12, 2010, Revised Selected Papers, Part IV

Editors: Ron Kimmel, Reinhard Klette, Akihiro Sugimoto

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

The four-volume set LNCS 6492-6495 constitutes the thoroughly refereed post-proceedings of the 10th Asian Conference on Computer Vision, ACCV 2010, held in Queenstown, New Zealand, in November 2010. Altogether the four volumes present 206 revised papers selected from a total of 739 submissions. All current issues in computer vision are addressed, ranging from algorithms that attempt to automatically understand the content of images, through optical methods coupled with computational techniques that enhance and improve images, to capturing and analyzing the world's geometry in preparation for higher-level image and shape understanding. Novel geometry techniques, statistical learning methods, and modern algebraic procedures are dealt with as well.

Table of Contents

Frontmatter

Posters on Day 3 of ACCV 2010

Fast Computation of a Visual Hull

Two techniques for the fast computation of a visual hull without simplification are proposed. First, we tackle the most time-consuming step: finding the intersections between projected rays and silhouette boundaries. We use a chain-coding representation of silhouette boundaries for fast searching and for computing intersections with sub-pixel accuracy. Second, we analyze the 3D-2D projection and back-projection relations and formulate them as 1D homographies. This formulation reduces the computational cost and the ambiguity that can be caused by measurement errors in the back-projection of 2D intersections to 3D. Furthermore, we show that the formulation is not limited to the projective space but is also useful in the affine space. We generalize our techniques to an arbitrary 3D ray, so that the proposed method is directly applicable to both volume-based and surface-based visual hull methods. In our simulations, we compare the proposed algorithm with state-of-the-art methods and show its advantages in terms of computational cost.
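
As an illustration of the boundary representation the abstract refers to (a sketch, not the authors' implementation), a silhouette boundary can be stored as Freeman chain codes: one 8-direction index per step along the contour, which makes searches along the boundary cheap:

```python
# Freeman 8-direction chain code: index i encodes the pixel offset DIRS[i].
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(boundary):
    """Encode an ordered list of 8-connected boundary pixels as
    Freeman chain codes (one direction index per step)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(DIRS.index((x1 - x0, y1 - y0)))
    return codes

# A tiny closed square contour traversed counter-clockwise:
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```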

Sujung Kim, Hee-Dong Kim, Wook-Joong Kim, Seong-Dae Kim
Active Learning with the Furthest Nearest Neighbor Criterion for Facial Age Estimation

Providing training data for facial age estimation is very expensive in terms of age progression, privacy, and human time and effort. In this paper, we present a novel active learning approach based on an online Two-Dimensional Linear Discriminant Analysis that quickly reaches high performance with minimal labeling effort. The proposed approach uses the classifier learnt from a small pool of labeled faces to select the most informative samples from the unlabeled set, incrementally improving the classifier. Specifically, we propose a novel data-selection criterion, the Furthest Nearest Neighbour (FNN), which generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that the proposed active learning can handle a large number of classes and large data sizes efficiently. Empirical experiments on the FG-NET and Morph databases and a large unlabeled data set show that the proposed approach can achieve similar results using fewer samples than random selection.
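
One plausible reading of the FNN criterion (an illustrative sketch only; the paper applies its criterion in the feature space of the online 2D-LDA classifier) is to query the unlabeled samples that lie furthest from their nearest labeled neighbour, i.e. the least-covered points:

```python
import math

def fnn_select(unlabeled, labeled, k=1):
    """Furthest Nearest Neighbour selection (hypothetical sketch):
    return the k unlabeled points whose distance to their nearest
    labeled point is largest."""
    def nn_dist(u):
        return min(math.dist(u, x) for x in labeled)
    return sorted(unlabeled, key=nn_dist, reverse=True)[:k]

labeled = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.5, 0.1), (3.0, 3.0), (1.1, 0.0)]
print(fnn_select(unlabeled, labeled))  # [(3.0, 3.0)]
```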

Jian-Gang Wang, Eric Sung, Wei-Yun Yau
Real-Time Human Detection Using Relational Depth Similarity Features

Many conventional human detection methods use features based on gradients, such as histograms of oriented gradients (HOG), but human occlusions and complex backgrounds make accurate human detection difficult. Furthermore, real-time processing also presents problems because the use of raster scanning while varying the window scale comes at a high computational cost. To overcome these problems, we propose a method for detecting humans using Relational Depth Similarity Features (RDSF) based on depth information obtained from a TOF camera. Our method calculates features derived from the similarity of depth histograms that represent the relationship between two local regions. During detection, a considerable increase in speed is achieved by raster scanning in 3D space. In addition, we perform highly accurate classification by considering occlusion regions. Our method achieved a detection rate of 95.3% with a false positive rate of 1.0%. It outperformed the conventional method by 11.5%, and our detection system can run in real time (10 fps).
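
A minimal sketch of the kind of depth-histogram similarity the feature is built on (the Bhattacharyya coefficient is an assumed choice here; the paper's exact measure may differ):

```python
import math

def depth_histogram(depths, bins, d_min, d_max):
    """Normalized histogram of depth values for one local region."""
    hist = [0.0] * bins
    width = (d_max - d_min) / bins
    for d in depths:
        i = min(int((d - d_min) / width), bins - 1)
        hist[i] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def similarity(h1, h2):
    """Bhattacharyya coefficient between two normalized histograms:
    1.0 for identical distributions, 0.0 for disjoint support."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))

h1 = depth_histogram([1.0, 1.1, 2.5], bins=4, d_min=0.0, d_max=4.0)
h2 = depth_histogram([1.2, 1.3, 2.6], bins=4, d_min=0.0, d_max=4.0)
print(round(similarity(h1, h2), 3))  # 1.0: identical bin occupancy
```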

Sho Ikemura, Hironobu Fujiyoshi
Human Tracking by Multiple Kernel Boosting with Locality Affinity Constraints

In this paper, we bring the concept of the Multiple Kernel Learning (MKL) algorithm, used in object categorization, into the field of human tracking. For efficiency, we devise an algorithm called Multiple Kernel Boosting (MKB) instead of directly adopting MKL. MKB aims to find an optimal combination of many single-kernel SVMs focusing on different features and kernels via a boosting technique. In addition, we apply Locality Affinity Constraints (LAC) to each selected SVM. LAC is computed from the distribution of support vectors of the respective SVM, recording the underlying locality of the training data. An update scheme to reselect good SVMs, adjust their weights, and recalculate LAC is also included. Experiments on standard and our own testing sequences show that our MKB tracking outperforms other state-of-the-art algorithms in handling various conditions.

Fan Yang, Huchuan Lu, Yen-Wei Chen
A Temporal Latent Topic Model for Facial Expression Recognition

In this paper we extend the latent Dirichlet allocation (LDA) topic model to model facial expression dynamics. Our topic model integrates the temporal information of image sequences by redefining the topic generation probability without introducing new latent variables or increasing the difficulty of inference. A collapsed Gibbs sampler is derived for batch learning with a labeled training dataset, and an efficient learning method for testing data is also discussed. We describe the resulting temporal latent topic model (TLTM) in detail and show how it can be applied to facial expression recognition. Experiments on the CMU expression database illustrate that the proposed TLTM is very efficient in facial expression recognition.

Lifeng Shang, Kwok-Ping Chan
From Local Features to Global Shape Constraints: Heterogeneous Matching Scheme for Recognizing Objects under Serious Background Clutter

Object recognition in computer vision is the task of categorizing images based on their content. In the absence of background clutter, high recognition performance can be achieved. In this paper we show how recognition performance is improved even under heavy background clutter and without additional information about the image. For this task we segment the image into patches and learn a geometric structure of the object. In our evaluations we first show that our system is comparable in performance to other state-of-the-art systems, and that on a difficult dataset the recognition performance is improved by 13.31%.

Martin Klinkigt, Koichi Kise
3D Structure Refinement of Nonrigid Surfaces through Efficient Image Alignment

Given a template image with known 3D structure, we show how to refine the rough reconstruction of nonrigid surfaces obtained from existing feature-based methods through efficient direct image alignment. Under the mild assumption that the barycentric coordinates of each 3D point on the surface remain constant, we prove that the template and the input image are related by a piecewise homography, based on which a direct Lucas-Kanade image alignment method is proposed to iteratively recover an inextensible surface, even one with poor texture and sharp creases. To accelerate the direct Lucas-Kanade method, an equivalent but much more efficient method is proposed as well, in which the most time-consuming part of the Hessian can be pre-computed by combining additive and inverse compositional expressions. Extensive experiments on both synthetic and real images demonstrate the accuracy and efficiency of our proposed methods.

Yinqiang Zheng, Shigeki Sugimoto, Masatoshi Okutomi
Local Empirical Templates and Density Ratios for People Counting

We extract local empirical templates and density ratios from a large collection of surveillance videos, and develop a fast and low-cost scheme for people counting. The local empirical templates are extracted by clustering the foregrounds induced by single pedestrians with similar features in silhouettes. The density ratio is obtained by comparing the size of the foreground induced by a group of pedestrians to that of the local empirical template considered the most appropriate for the region where the group foreground is captured. Because of the local scale normalization between sizes, the density ratio appears to have a bound closely related to the number of pedestrians that induce the group foreground. We estimate the bounds of density ratios for groups of different numbers of pedestrians in the learning phase, and use the estimated bounds to count the pedestrians in online settings. The results are promising.
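
The counting step described above can be sketched as follows (a hypothetical illustration: the bound values and the area-based ratio are assumptions, standing in for the learned local templates and bounds):

```python
def count_pedestrians(group_area, template_area, bounds):
    """Estimate the number of pedestrians in a group foreground from
    the density ratio (group foreground size / local template size),
    using per-count ratio bounds estimated in the learning phase."""
    ratio = group_area / template_area
    for count, (low, high) in sorted(bounds.items()):
        if low <= ratio <= high:
            return count
    return None  # ratio outside all learned bounds

# Illustrative bounds: ratio ranges observed for 1, 2 and 3 pedestrians.
bounds = {1: (0.7, 1.3), 2: (1.3, 2.4), 3: (2.4, 3.4)}
print(count_pedestrians(group_area=900, template_area=450, bounds=bounds))  # 2
```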

Dao Huu Hung, Sheng-Luen Chung, Gee-Sern Hsu
Curved Reflection Symmetry Detection with Self-validation

We propose a novel, self-validating approach for detecting curved reflection symmetry patterns from real, unsegmented images. Our method benefits from the observation that any curved symmetry pattern can be approximated by a sequence of piecewise rigid reflection patterns. Pairs of symmetric feature points are first detected (including both inliers and outliers) and treated as ‘particles’. Multiple-hypothesis sampling and pruning are used to sample a smooth path going through inlier particles to recover the curved reflection axis. Our approach generates an explicit supporting region of the curved reflection symmetry, which is further used for intermediate self-validation, making the detection process more robust than prior state-of-the-art algorithms. Experimental results on 200+ images demonstrate the effectiveness and superiority of the proposed approach.

Jingchen Liu, Yanxi Liu
An HMM-SVM-Based Automatic Image Annotation Approach

This paper presents a novel approach to Automatic Image Annotation (AIA) which combines both the Hidden Markov Model (HMM) and the Support Vector Machine (SVM). Typical image annotation methods directly map low-level features to high-level concepts and overlook the importance of mining the contextual information among the annotated keywords. The proposed HMM-SVM based approach comprises two different kinds of HMMs, based on image color and texture features, as the first-stage mapping scheme, and an SVM, based on the prediction results from the two HMMs, as a high-level classifier for final keywording. Our proposed approach assigns 1-5 keywords to each testing image. Experiments on the Corel image dataset have shown that the combination of a discriminative classifier and a generative model is beneficial in image annotation.

Yinjie Lei, Wilson Wong, Wei Liu, Mohammed Bennamoun
Video Deblurring and Super-Resolution Technique for Multiple Moving Objects

Video cameras are now in common use, and demand for capturing a single frame from a video sequence is increasing. Since the resolution of a video camera is usually lower than that of a digital still camera, and video data usually contain considerable motion blur across the sequence, simple frame capture can produce only a low-quality image; an image restoration technique is therefore required. In this paper, we propose a method to restore a sharp, high-resolution image from a video sequence by applying motion deblurring to each frame, followed by a super-resolution technique. Since the frame rate of the video camera is high, the variation in feature appearance between successive frames and the motion of feature points are usually small, so we can still estimate scene geometry from blurred video data. Therefore, using such geometric information, we first apply motion deblurring to each frame and then super-resolve the images from the deblurred image set. For better results, we also propose an adaptive super-resolution technique that accounts for depth-dependent defocus blur. Experimental results are shown to prove the strength of our method.

Takuma Yamaguchi, Hisato Fukuda, Ryo Furukawa, Hiroshi Kawasaki, Peter Sturm
Sparse Source Separation of Non-instantaneous Spatially Varying Single Path Mixtures

We present a method for recovering source images from their non-instantaneous single path mixtures using sparse component analysis (SCA). Non-instantaneous single path mixtures refer to mixtures generated by a mixing system that spatially distorts the source images (non-instantaneous and spatially varying) without any reverberations (single path/anechoic). For example, such mixtures can be found when imaging through a semi-reflective convex medium or in various movie fade effects. Recent studies have used SCA to separately address the time/position-varying and the non-instantaneous scenarios. The present study is devoted to the unified scenario. Given n anechoic mixtures (without multiple reflections) of m source images, we recover the images up to a limited number of unknown parameters. This is accomplished by means of correspondences that we establish between the sparse representations of the input mixtures. Analyzing these correspondences allows us to recover models of both spatial distortion and attenuation. We implement a staged method for recovering the spatial distortion and attenuation, in order to reduce parametric model complexity by making use of descriptor invariants and model separability. Once the models have been recovered, well-known BSS tools and techniques are used to recover the sources.

Albert Achtenberg, Yehoshua Y. Zeevi
Improving Gaussian Process Classification with Outlier Detection, with Applications in Image Classification

In many computer vision applications for recognition or classification, outlier detection plays an important role, as it affects the accuracy and reliability of the result. We propose a novel approach to outlier detection using Gaussian process classification. With this approach, outlier detection can be integrated into the classification process instead of being treated separately. Experimental results on handwritten digit recognition and vision-based robot localization show that our approach performs better than other state-of-the-art approaches.

Yan Gao, Yiqun Li
Robust Tracking Based on Pixel-Wise Spatial Pyramid and Biased Fusion

We propose a novel tracking algorithm that balances stability and adaptivity, together with a new online appearance model. Since update error is inevitable, we present three tracking modules, i.e., a reference model, a soft reference model, and an adaptive model, and fuse them using a biased multiplicative formula. These three contributors are built from the same appearance model with different update rates. The appearance model, the Pixel-wise Spatial Pyramid, employs pixel feature vectors instead of SIFT vectors in order to combine several pixel characteristics. In particular, the reserved pixel feature vectors are used to create a new codebook together with the earlier codebook. A hybrid feature map consisting of the reserved pixel vectors and the anti-part of the previous hybrid feature map is built to represent the new target map. Experimental results show that our approach tracks objects under drastic appearance change accurately and robustly.

Huchuan Lu, Shipeng Lu, Yen-Wei Chen
Compressive Evaluation in Human Motion Tracking

The powerful theory of compressive sensing enables an efficient way to recover sparse or compressible signals from non-adaptive, sub-Nyquist-rate linear measurements. In particular, it has been shown that random projections can well approximate an isometry, provided that the number of linear measurements is no less than twice the sparsity level of the signal. Inspired by this, we propose a compressive annealed particle filter to exploit the sparsity present in image-based human motion tracking. Instead of performing full signal recovery, we evaluate the observation likelihood directly in the compressive domain of the observed images. Moreover, we introduce a progressive multilevel wavelet decomposition, staged at each annealing layer, to accelerate the compressive evaluation in a coarse-to-fine fashion. Experiments with the benchmark dataset HumanEvaII show that the tracking process can be significantly accelerated while the tracking accuracy is well maintained and comparable to the method using the original image observations.
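
The measurement step underlying this idea can be sketched as follows (a standard compressive-sensing random projection, not the paper's exact pipeline): distances between projected signals approximately preserve distances between the originals, so a likelihood term can be evaluated without full recovery.

```python
import random

def random_projection(signal, m, seed=0):
    """Project a length-n signal to m sub-Nyquist linear measurements
    y = Phi @ x using a random Gaussian measurement matrix Phi."""
    rng = random.Random(seed)
    n = len(signal)
    phi = [[rng.gauss(0.0, 1.0 / m ** 0.5) for _ in range(n)] for _ in range(m)]
    return [sum(p * s for p, s in zip(row, signal)) for row in phi]

def compressive_distance(x, y, m=8):
    """Evaluate a distance directly in the compressive domain; both
    signals must be measured with the same matrix (same seed)."""
    px, py = random_projection(x, m), random_projection(y, m)
    return sum((a - b) ** 2 for a, b in zip(px, py)) ** 0.5

x = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
print(compressive_distance(x, x) == 0.0)  # identical signals -> zero distance
```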

Yifan Lu, Lei Wang, Richard Hartley, Hongdong Li, Dan Xu
Reconstructing Mass-Conserved Water Surfaces Using Shape from Shading and Optical Flow

This paper introduces a method for reconstructing water surfaces from real video footage. Using a single input video, the proposed method produces a more informative reconstruction from a wider range of possible scenes than the current state of the art. The key is the combination of vision algorithms and physical laws. Shape from shading is used to capture the changes of the water's surface, from which a vertical velocity gradient field is calculated. This gradient field is used to constrain the tracking of horizontal velocities by minimizing an energy function expressed as a weighted combination of mass conservation and intensity conservation. Hence the final reconstruction contains a dense velocity field that is incompressible in 3D. The proposed method is efficient and performs consistently well across water of different types.

David Pickup, Chuan Li, Darren Cosker, Peter Hall, Phil Willis
Earth Mover’s Morphing: Topology-Free Shape Morphing Using Cluster-Based EMD Flows

This paper describes a method for topology-free shape morphing based on region cluster-based Earth Mover’s Distance (EMD) flows, since existing methods for closed curve/surface-based shape morphing are inapplicable to regions with different genera. First, the shape region is decomposed into a number of small clusters by Fuzzy C-Means clustering. Next, the EMD between the clusters of two key shapes is calculated and the resultant EMD flows are exploited as a weighted many-to-many correspondence among the clusters. Then, the fuzzy clusters are transported based on the EMD flows and a transition control parameter. Unlike the closed curve/surface-based methods, the morphs using cluster transportation are not guaranteed to be a binary image, and hence graph cut-based binary denoising is applied to a volumetric image of the two-dimensional position and the one-dimensional transition control parameter. The experiments demonstrate that the proposed method can perform morphing between shapes with different genera, such as walking silhouettes or alphabetical characters.
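
For intuition about the transport distance driving the cluster flows, here is the simplest special case (a 1-D EMD with unit ground distance between adjacent bins; the paper solves the general many-to-many transport problem between 2-D clusters):

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D distributions of equal
    total mass, with unit ground distance between adjacent bins: the
    L1 distance between their cumulative sums."""
    emd, cum = 0.0, 0.0
    for a, b in zip(p, q):
        cum += a - b      # running surplus of mass that must still move
        emd += abs(cum)   # each surplus unit travels one bin per step
    return emd

# Moving all mass one bin to the right costs exactly 1 unit:
print(emd_1d([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 1.0
```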

Yasushi Makihara, Yasushi Yagi
Object Detection Using Local Difference Patterns

We propose a new method of background modeling for object detection. Many background models have been proposed previously, and they fall into two types: “pixel-based models”, which model stochastic changes in the value of each pixel, and “spatial-based models”, which model the local texture around each pixel. Pixel-based models are effective for periodic changes of pixel values, but they cannot deal with sudden illumination changes. Conversely, spatial-based models are effective for sudden illumination changes, but they cannot deal with periodic changes of pixel values, which often vary the textures. To solve these problems, we propose a new probabilistic background model that integrates pixel-based and spatial-based models by considering the illumination fluctuation in localized regions. Several experiments show the effectiveness of our approach.

Satoshi Yoshinaga, Atsushi Shimada, Hajime Nagahara, Rin-ichiro Taniguchi
Randomised Manifold Forests for Principal Angle-Based Face Recognition

In set-based face recognition, each set of face images is often represented as a linear/nonlinear manifold, and Principal Angles (PA) or Kernel PAs are exploited to measure the (dis-)similarity between manifolds. This work systematically evaluates the effect of using different face image representations and different types of kernels in the KPA setup and presents a novel way of randomised learning of manifolds for set-based face recognition. First, our experiments show that sparse features such as Local Binary Patterns and Gabor wavelets significantly improve the accuracy of PA methods over raw pixel intensities. Combining different features and types of kernels at their best hyper-parameters in a multiple classifier system yielded further improvements in accuracy. Based on these encouraging results, we propose a way of randomised learning of kernel types and hyper-parameters using set-based Randomised Decision Forests. We observed that the proposed method with linear kernels competes efficiently with nonlinear kernels. Further incorporation of discriminative information by constrained subspaces in the proposed method effectively improved the accuracy. In experiments on challenging data sets, the proposed methods improve the accuracy of the standard KPA method by about 35 percent and outperform a Support Vector Machine with manually tuned set-kernels.

Ujwal D. Bonde, Tae-Kyun Kim, K. R. Ramakrishnan
Estimating Meteorological Visibility Using Cameras: A Probabilistic Model-Driven Approach

Estimating the atmospheric or meteorological visibility distance is very important for air and ground transport safety, as well as for air quality monitoring. However, there is no holistic approach to tackling the problem with cameras. Most existing methods are data-driven approaches, which perform a linear regression between the contrast in the scene and the visual range estimated by means of additional reference sensors. In this paper, we propose a probabilistic model-based approach which takes into account the distribution of contrasts in the scene. It is robust to illumination variations in the scene by accounting for Lambertian surfaces. To evaluate our model, meteorological ground truth data were collected, showing very promising results. This work opens new perspectives for the computer vision community in dealing with environmental issues.

Nicolas Hautiére, Raouf Babari, Éric Dumont, Roland Brémond, Nicolas Paparoditis
Optimizing Visual Vocabularies Using Soft Assignment Entropies

The state of the art for large-database object retrieval in images is based on quantizing descriptors of interest points into visual words. High similarity between matching image representations (as bags of words) rests on the assumption that matched points in the two images end up in similar words under hard assignment, or in similar representations under soft assignment. In this paper we study how ground truth correspondences can be used to generate better visual vocabularies. Matching of image patches can be done, e.g., using deformable models or by estimating 3D geometry. For optimization of the vocabulary, we propose minimizing the entropies of the soft assignment of points. We base our clustering on hierarchical k-splits. The results from our entropy-based clustering are compared with hierarchical k-means. The vocabularies have been tested on real data, showing decreased entropy and an increased true positive rate, as well as better retrieval performance.
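
The quantity being minimized can be sketched as follows (an illustrative reading with an assumed Gaussian weighting; the paper's exact soft-assignment kernel may differ): a descriptor that lands decisively in a few words has low assignment entropy, while one equidistant from many words has high entropy.

```python
import math

def soft_assignment_entropy(distances, sigma=1.0):
    """Entropy of the soft assignment of one descriptor to visual
    words: weights fall off with distance to each word centre
    (Gaussian kernel, an assumed choice), then are normalized to a
    distribution whose Shannon entropy is returned."""
    weights = [math.exp(-d * d / (2 * sigma * sigma)) for d in distances]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A point near one word (decisive) vs. equidistant from all (ambiguous):
decisive = soft_assignment_entropy([0.1, 3.0, 3.0])
ambiguous = soft_assignment_entropy([1.0, 1.0, 1.0])
print(decisive < ambiguous)  # True
```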

Yubin Kuang, Kalle Åström, Lars Kopp, Magnus Oskarsson, Martin Byröd
Totally-Corrective Multi-class Boosting

We propose totally-corrective multi-class boosting algorithms in this work. First, we discuss methods that extend two-class boosting to the multi-class case by studying two existing boosting algorithms, AdaBoost.MO and SAMME, and formulate convex optimization problems that minimize their regularized cost functions. Then we propose a column-generation based totally-corrective framework for multi-class boosting learning by examining the Lagrange dual problems. Experimental results on UCI datasets show that the new algorithms have comparable generalization capability but converge much faster than their counterparts. Experiments on MNIST handwritten digit classification also demonstrate the effectiveness of the proposed algorithms.

Zhihui Hao, Chunhua Shen, Nick Barnes, Bo Wang
Pyramid Center-Symmetric Local Binary/Trinary Patterns for Effective Pedestrian Detection

Detecting pedestrians in images and videos plays a critically important role in many computer vision applications. Extraction of effective features is the key to this task. Promising features should be discriminative, robust to various kinds of variation, and easy to compute. In this work, we present a novel feature, termed pyramid center-symmetric local binary/ternary patterns (pyramid CS-LBP/LTP), for pedestrian detection. The standard LBP proposed by Ojala et al. [1] mainly captures texture information. The proposed CS-LBP feature, in contrast, captures gradient information. Moreover, the pyramid CS-LBP/LTP is easy to implement and computationally efficient, which is desirable for real-time applications. Experiments on the INRIA pedestrian dataset show that the proposed feature outperforms the histograms of oriented gradients (HOG) feature and is comparable with the state-of-the-art pyramid HOG (PHOG) feature when using histogram intersection kernel support vector machines (HIKSVMs). We also demonstrate that the combination of our pyramid CS-LBP feature and the PHOG feature significantly improves detection performance, producing state-of-the-art accuracy on the INRIA pedestrian dataset.
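
A sketch of the base CS-LBP code for a single pixel (the standard center-symmetric variant; the paper builds pyramid histograms of these codes, which this sketch does not show): instead of comparing all 8 neighbours to the centre as LBP does, it compares the 4 center-symmetric pixel pairs, giving a 4-bit code.

```python
def cs_lbp(patch, t=0.0):
    """Center-symmetric LBP for the 8-neighbourhood of the centre of a
    3x3 patch: compare the 4 opposite pixel pairs, yielding a 4-bit
    code (16 values instead of LBP's 256)."""
    # 8 neighbours of the centre patch[1][1], in circular order;
    # pair i is (n[i], n[i + 4]).
    n = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
         patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i in range(4):
        if n[i] - n[i + 4] > t:
            code |= 1 << i
    return code

patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
print(cs_lbp(patch))  # pairs (9,1),(9,1),(9,1),(1,1) -> bits 1,1,1,0 -> 7
```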

Yongbin Zheng, Chunhua Shen, Richard Hartley, Xinsheng Huang
Reducing Ambiguity in Object Recognition Using Relational Information

Local feature-based object recognition methods recognize learned objects by unordered local feature matching followed by verification. However, the matching between unordered feature sets can become ambiguous as the number of objects increases, because multiple similar features may be observed in different objects. In this context, we present a new method for textured object recognition based on relational information between local features. To efficiently reduce ambiguity, we represent objects using an Attributed Relational Graph, and robust object recognition is achieved by inexact graph matching. Here, we propose a new method for building graphs, define robust attributes for the nodes and edges of the graph, which are the most important factors in graph-based object representation, and also propose a cost function for graph matching. Owing to the proposed attributes, the framework can be applied to both single-image-based and stereo-image-based object recognition.

Kuk-Jin Yoon, Min-Gil Shin
Posing to the Camera: Automatic Viewpoint Selection for Human Actions

In many scenarios a scene is filmed by multiple video cameras located at different viewing positions. The difficulty of watching multiple views simultaneously raises an immediate question: which cameras capture better views of the dynamic scene? When only a single view can be displayed (e.g., in TV broadcasts), a human producer manually selects the best view. In this paper we propose a method for evaluating the quality of a view captured by a single camera, which can be used to automate viewpoint selection. We regard human actions as three-dimensional shapes induced by their silhouettes in the space-time volume. The quality of a view is evaluated by incorporating three measures that capture the visibility of the action provided by these space-time shapes. We evaluate the proposed approach both qualitatively and quantitatively.

Dmitry Rudoy, Lihi Zelnik-Manor
Orthogonality Based Stopping Condition for Iterative Image Deconvolution Methods

Deconvolution techniques are widely used for image enhancement, from microscopy to astronomy. The most effective methods are based on iterative techniques, including Bayesian blind methods and greedy algorithms. The stopping condition is a central issue for all non-regularized methods, since in practice the original image is not known, and the estimation of quality is based on some distance between the measured image and its estimated counterpart. This distance is usually the mean square error (MSE), leading to optimization of a least-squares measure. Based on the independence of signal and noise, we establish a new type of error measure that checks the orthogonality of the measurement-driven gradient and the estimate at a given iteration. We give an automatic procedure for estimating the stopping condition, and show its superiority over conventional ad-hoc non-regularized methods across a wide range of noise models.
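
The underlying idea can be sketched as follows (an illustrative simplification, not the paper's exact criterion): if what remains after deconvolution is pure noise, it should be uncorrelated with, i.e. orthogonal to, the current estimate, so iteration can stop when the normalized inner product is close to zero.

```python
def normalized_inner(u, v):
    """Cosine of the angle between two signals treated as vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def should_stop(residual, estimate, tol=0.05):
    """Orthogonality-based stopping test: iterate until the residual
    (assumed to be noise-like at convergence) is nearly orthogonal to
    the estimate."""
    return abs(normalized_inner(residual, estimate)) < tol

estimate = [1.0, 2.0, 3.0, 4.0]
residual = [2.0, -1.0, 0.0, 0.0]   # exactly orthogonal to the estimate
print(should_stop(residual, estimate))  # True
```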

Dániel Szolgay, Tamás Szirányi
Probabilistic 3D Object Recognition Based on Multiple Interpretations Generation

We present a probabilistic 3D object recognition approach that generates multiple interpretations in cluttered domestic environments. Handling pose ambiguity and uncertainty is the main challenge in most recognition systems. In our approach, invariant 3D lines are employed to generate pose hypotheses as multiple interpretations; in particular, ambiguity arising from partial occlusion and the fragmentation of 3D lines is taken into account. The estimated pose is represented as a region instead of a point in pose space by considering the measurement uncertainties. The probability of each interpretation is then computed reliably using the Bayesian principle, in terms of both likelihood and unlikelihood. Finally, a fusion strategy is applied to a set of top-ranked interpretations, which are further verified and refined to produce a more accurate pose estimate in real time. The experimental results support the potential of the proposed approach in real cluttered domestic environments.

Zhaojin Lu, Sukhan Lee, Hyunwoo Kim
Planar Affine Rectification from Change of Scale

A method for affine rectification of a plane exploiting knowledge of relative scale changes is presented. The rectifying transformation is fully specified by the relative scale change at three non-collinear points or by two pairs of points where the relative scale change is known; the relative scale change between the pairs is not required. The method also allows homography estimation between two views of a planar scene from three point-with-scale correspondences.

The proposed method is simple to implement and without parameters; linear and thus supporting (algebraic) least squares solutions; and general, without restrictions on either the shape of the corresponding features or their mutual position.

The wide applicability of the method is demonstrated on text rectification, detection of repetitive patterns, texture normalization and estimation of homography from three point-with-scale correspondences.

Ondřej Chum, Jiří Matas
Sensor Measurements and Image Registration Fusion to Retrieve Variations of Satellite Attitude

Observation satellites use pushbroom sensors to capture images of the earth. These linear cameras acquire 1-D images over time and use the straight motion of the satellite to sweep out a region of space and build 2-D images. The stability of the imaging platform is crucial during the acquisition process to guarantee distortion-free images. Positioning sensors are used to control and rectify the attitude variations of the satellite, but their sampling rate is too low to provide an accurate estimate of the motion. In this paper, we describe a way to fuse star tracker measurements with image registration in order to retrieve the attitude variations of the satellite. We first introduce a simplified motion model in which the pushbroom camera rotates during the acquisition of an image. We then present the fusion model, which combines the low- and high-frequency information of the star tracker and the images, respectively; this is embedded in a Bayesian setting. Lastly, we illustrate the performance of our algorithm on three satellite datasets.

Régis Perrier, Elise Arnaud, Peter Sturm, Mathias Ortner
Image Segmentation Fusion Using General Ensemble Clustering Methods

A new framework for adapting common ensemble clustering methods to solve the image segmentation combination problem is presented. The framework is applied to the parameter selection problem in image segmentation and compared with supervised parameter learning. We quantitatively evaluate 9 ensemble clustering methods requiring a known number of clusters and 4 with adaptive estimation of the number of clusters. Experimental results explore the capabilities of the proposed framework. It is shown that the ensemble clustering approach yields results close to those of supervised learning, but without any ground-truth information.

Lucas Franek, Daniel Duarte Abdala, Sandro Vega-Pons, Xiaoyi Jiang
Real Time Myocardial Strain Analysis of Tagged MR Cines Using Element Space Non-rigid Registration

We develop a real time element-space non-rigid registration technique for cardiac motion tracking, enabling fast and automatic analysis of myocardial strain in tagged magnetic resonance (MR) cines. Non-rigid registration is achieved by minimizing the sum of squared differences for all pixels within a high order finite-element (FE) model customized to the specific geometry of the heart. The objective function and its derivatives are calculated in element space, and converted to image space using the Jacobian of the transformation. This enables an anisotropic distribution of user-defined model parameters, which can be customized to the application, thereby achieving fast estimations which require fewer degrees of freedom for a given level of accuracy than standard isotropic methods. A graphics processing unit (GPU) accelerated Levenberg-Marquardt procedure was implemented in the Compute Unified Device Architecture (CUDA) environment to provide a fast, robust optimization procedure. The method was validated in 30 patients with wall motion abnormalities by comparison with ground truth provided by an independent expert observer using a manually-guided analysis procedure. A heart model comprising 32 parameters was capable of processing 36.5 frames per second, with an error in circumferential strain of − 1.97±1.18%. For comparison, a standard isotropic free-form deformation method requiring 324 parameters had greater error (− 3.70±1.15%) and a slower frame rate (4.5 frames per second). In conclusion, GPU accelerated custom element-space non-rigid image registration enables real time automatic tracking of cardiac motion, and accurate estimation of myocardial strain in tagged MR cines.

Bo Li, Brett R. Cowan, Alistair A. Young
Extending AMCW Lidar Depth-of-Field Using a Coded Aperture

By augmenting a high resolution full-field Amplitude Modulated Continuous Wave lidar system with a coded aperture, we show that depth-of-field can be extended using explicit, albeit blurred, range data to determine PSF scale. Because complex domain range-images contain explicit range information, the aperture design is unconstrained by the necessity for range determination by depth-from-defocus. The coded aperture design is shown to improve restoration quality over a circular aperture. A proof-of-concept algorithm using dynamic PSF determination and spatially variant Landweber iterations is developed and using an empirically sampled point spread function is shown to work in cases without serious multipath interference or high phase complexity.
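The restoration step relies on Landweber iterations. As a minimal, spatially invariant sketch of the basic update x ← x + ω·Aᵀ(b − Ax) for a linear blur model b = Ax (the paper's spatially variant, dynamically-scaled version is more involved; the function name and explicit-matrix formulation here are illustrative assumptions):

```python
import numpy as np

def landweber(A, b, n_iter=500, omega=None):
    """Basic Landweber iteration for the linear model b = A x.

    Update: x <- x + omega * A^T (b - A x), which converges for
    0 < omega < 2 / sigma_max(A)^2.
    """
    if omega is None:
        smax = np.linalg.norm(A, 2)        # largest singular value
        omega = 1.0 / smax ** 2            # safely inside the stable range
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + omega * (A.T @ (b - A @ x))
    return x
```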

John P. Godbaz, Michael J. Cree, Adrian A. Dorrington
Surface Extraction from Iso-disparity Contours

This paper examines the relationship between iso-disparity contours in stereo disparity space and planar surfaces in the scene. We specify constraints that may be exploited to group iso-disparity contours belonging to the same planar surface, and identify discontinuities between planar surfaces. We demonstrate the use of such constraints for planar surface extraction, particularly where the boundaries between surfaces are orientation discontinuities rather than depth discontinuities (e.g., segmenting obstacles and walls from a ground plane). We demonstrate the advantages of our approach over a range of indoor and outdoor stereo images, and show that iso-disparity analysis can provide a robust and efficient means of segmenting smooth surfaces, and obtaining planar surface models.

Chris McCarthy, Nick Barnes
Image De-fencing Revisited

We introduce a novel image de-fencing method suitable for consumer photography, where plausible results must be achieved under common camera settings. First, detection of lattices with see-through texels is performed in an iterative process using online learning and classification from intermediate results to aid subsequent detection. Then, segmentation of the foreground is performed using accumulated statistics from all lattice points. Next, multi-view inpainting is performed to fill in occluded areas with information from shifted views where parts of the occluded regions may be visible. For regions occluded in all views, we use novel symmetry-augmented inpainting, which combines traditional texture synthesis with an increased pool of candidate patches found by simulating bilateral symmetry patterns from the source image. The results show the effectiveness of our proposed method.

Minwoo Park, Kyle Brocklehurst, Robert T. Collins, Yanxi Liu
Feature-Assisted Dense Spatio-temporal Reconstruction from Binocular Sequences

In this paper, a dynamic surface is represented by a triangle mesh with dense vertices whose 3D positions change over time. These time-varying positions are reconstructed by finding their corresponding projections in the images captured by two calibrated and synchronized video cameras. To achieve accurate dense correspondences across views and frames, we first match sparse feature points and rely on them to provide good initialization and strong constraints in optimizing dense correspondence. Spatio-temporal consistency is utilized in matching both features and image points. Three synergistic constraints, image similarity, epipolar geometry and motion clue, are jointly used to optimize stereo and temporal correspondences simultaneously. Tracking failures due to self-occlusion or large appearance changes are automatically handled. Experimental results show that complex shape and motion of dynamic surfaces like fabrics and skin can be successfully reconstructed with the proposed method.

Yihao Zhou, Yan Qiu Chen
Improved Spatial Pyramid Matching for Image Classification

Spatial analysis of salient feature points has been shown to be promising in image analysis and classification. Previously, spatial pyramid matching has made use of both salient feature points and spatial multiresolution blocks to match between images. However, different images or blocks can still have similar features under spatial pyramid matching; the analysis and matching are more accurate in scale space. In this paper, we propose to perform spatial pyramid matching in scale space. Specifically, pyramid match histograms are computed at multiple scales to refine the kernel for support vector machine classification. We show that the combination of salient point features, scale space and spatial pyramid matching improves the original spatial pyramid matching significantly.

Mohammad Shahiduzzaman, Dengsheng Zhang, Guojun Lu
Dense Multi-frame Optic Flow for Non-rigid Objects Using Subspace Constraints

In this paper we describe a variational approach to computing dense optic flow in the case of non-rigid motion. We optimise a global energy to compute the optic flow between each image in a sequence and a reference frame simultaneously. Our approach is based on subspace constraints which allow us to express the optic flow at each pixel in a compact way as a linear combination of a 2D motion basis that can be pre-estimated from a set of reliable 2D tracks. We reformulate the multi-frame optic flow problem as the estimation of the coefficients that, multiplied with the known basis, give the displacement vectors for each pixel. We adopt a variational framework in which we optimise a non-linearised global brightness constancy to cope with large displacements and impose homogeneous regularization on the multi-frame motion basis coefficients. Our approach has two strengths. First, the dramatic reduction in the number of variables to be computed (typically one order of magnitude), which has obvious computational advantages, and second, the ability to deal with large displacements due to strong deformations. We conduct experiments on various sequences of non-rigid objects which show that our approach provides results comparable to state-of-the-art variational multi-frame optic flow methods.

Ravi Garg, Luis Pizarro, Daniel Rueckert, Lourdes Agapito
Fast Recovery of Weakly Textured Surfaces from Monocular Image Sequences

We present a method for vision-based recovery of three-dimensional structures through simultaneous model reconstruction and camera position tracking from monocular images. Our approach does not rely on robust feature detecting schemes (such as SIFT, KLT etc.), but works directly on intensity values in the captured images. Thus, it is well-suited for reconstruction of surfaces that exhibit only minimal texture due to partial homogeneity of the surfaces. Our method is based on a well-known optimization technique, which has been implemented in an efficient yet flexible way, in order to achieve high performance while ensuring extensibility.

Oliver Ruepp, Darius Burschka
Ghost-Free High Dynamic Range Imaging

Most high dynamic range image (HDRI) algorithms assume a stationary scene for registering multiple images which are taken under different exposure settings. In practice, however, there can be some global or local movements between images caused by either camera or object motions. This situation usually causes ghost artifacts which make the same object appear multiple times in the resultant HDRI. To solve this problem, most conventional algorithms conduct ghost detection procedures followed by ghost region filling with the estimated radiance values. However, these methods usually depend largely on the accuracy of the ghost detection results, and thus often suffer from color artifacts around the ghost regions. In this paper, we propose a new robust ghost-free HDRI generation algorithm that does not require accurate ghost detection and does not suffer from the color artifact problem. To deal with the ghost problem, our algorithm utilizes the global intensity transfer functions obtained from joint probability density functions (pdfs) between different exposure images. Then, to estimate reliable radiance values, we employ a generalized weighted filtering technique using the global intensity transfer functions. Experimental results show that our method produces state-of-the-art performance in generating ghost-free HDR images.
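One simple way to realize a global intensity transfer function from a joint pdf is to take, for each source intensity, the most probable target intensity in the joint histogram of two aligned exposures. The sketch below illustrates only that simplification; the paper's estimation from joint pdfs is more elaborate, and the function name is an assumption:

```python
import numpy as np

def intensity_transfer(src, dst, n_bins=256):
    """Estimate a global intensity transfer function f: src -> dst from
    the joint histogram of two aligned exposure images, by taking the
    most probable dst intensity for each src intensity (a simplified
    stand-in for the joint-pdf formulation in the paper)."""
    joint, _, _ = np.histogram2d(src.ravel(), dst.ravel(),
                                 bins=n_bins,
                                 range=[[0, n_bins], [0, n_bins]])
    return joint.argmax(axis=1)            # f[i] = most likely dst value
```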

Yong Seok Heo, Kyoung Mu Lee, Sang Uk Lee, Youngsu Moon, Joonhyuk Cha
Pedestrian Recognition with a Learned Metric

This paper presents a new method for the viewpoint-invariant pedestrian recognition problem. We use a metric learning framework to obtain a robust metric for large margin nearest neighbor classification with rejection (i.e., the classifier will return no matches if all neighbors are beyond a certain distance). The rejection condition necessitates the use of a uniform threshold for a maximum allowed distance for deeming a pair of images a match. In order to handle the rejection case, we propose a novel cost similar to the Large Margin Nearest Neighbor (LMNN) method and call our approach Large Margin Nearest Neighbor with Rejection (LMNN-R). Our method is able to achieve significant improvement over previously reported results on the standard Viewpoint Invariant Pedestrian Recognition (VIPeR [1]) dataset.
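Nearest-neighbour matching with rejection can be sketched as follows, assuming a learned metric matrix M is already available (the LMNN-R cost that produces M is defined in the paper; the function name and threshold below are illustrative assumptions):

```python
import numpy as np

def match_with_rejection(query, gallery, M, threshold):
    """Nearest-neighbour matching with rejection: return the index of
    the closest gallery vector under the Mahalanobis-style metric
    d(x, y) = sqrt((x - y)^T M (x - y)), or None if every neighbour is
    farther than `threshold` (the uniform rejection radius)."""
    diffs = gallery - query                       # (n, d) differences
    d2 = np.einsum('nd,de,ne->n', diffs, M, diffs)  # squared distances
    i = int(np.argmin(d2))
    return i if np.sqrt(d2[i]) <= threshold else None
```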

Mert Dikmen, Emre Akbas, Thomas S. Huang, Narendra Ahuja
A Color to Grayscale Conversion Considering Local and Global Contrast

For the conversion of a color image to a perceptually plausible grayscale one, the global and local contrast are simultaneously considered in this paper. The contrast is measured in terms of a gradient field, and the energy function is designed to have a smaller value when the gradient field of the grayscale image is closer to that of the original color image (called the target gradient field). For encoding both local and global contrast into the energy function, the target gradient field is constructed from two kinds of edges: one that connects each pixel to neighboring pixels and the other that connects each pixel to predetermined landmark pixels. Although the energy minimization has an exact solution in the least-squares sense, we also present a fast implementation for the conversion of large images, by approximating the energy function. The problem is then reduced to reconstructing a grayscale image from the modified gradient field over the standard 4-neighborhood system, and this can be easily solved by the fast 2D Poisson solver. In the experiments, the proposed method is tested on various images and shown to give perceptually more plausible results than the existing methods.
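The least-squares reconstruction of an image from a target gradient field over the 4-neighbourhood system can be illustrated with a dense solve on a tiny image. This is only a sketch; as the abstract notes, practical implementations use a fast 2D Poisson solver instead of the explicit least-squares system built here:

```python
import numpy as np

def from_gradient_field(gx, gy, h, w):
    """Reconstruct an h x w image (up to an additive constant) whose
    forward differences best match the target gradients gx, gy in the
    least-squares sense -- the discrete Poisson problem.  Dense solve
    for illustration only."""
    def idx(y, x):
        return y * w + x
    rows, b = [], []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:                  # horizontal forward difference
                r = np.zeros(h * w)
                r[idx(y, x + 1)], r[idx(y, x)] = 1.0, -1.0
                rows.append(r)
                b.append(gx[y, x])
            if y + 1 < h:                  # vertical forward difference
                r = np.zeros(h * w)
                r[idx(y + 1, x)], r[idx(y, x)] = 1.0, -1.0
                rows.append(r)
                b.append(gy[y, x])
    # pin the mean to zero to fix the unknown additive constant
    rows.append(np.ones(h * w))
    b.append(0.0)
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.array(b), rcond=None)
    return sol.reshape(h, w)
```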

Jung Gap Kuk, Jae Hyun Ahn, Nam Ik Cho
Affordance Mining: Forming Perception through Action

This work employs data mining algorithms to discover visual entities that are strongly associated with autonomously discovered modes of action in an embodied agent. Mappings are learnt from these perceptual entities onto the agent's action space. In general, low-dimensional action spaces are better suited to unsupervised learning than high-dimensional percept spaces, allowing structure to be discovered in the action space and used to organise the perceptual space. Local feature configurations that are strongly associated with a particular ‘type’ of action (and not all other action types) are considered likely to be relevant in eliciting that action type. By learning mappings from these relevant features onto the action space, the system is able to respond in real time to novel visual stimuli. The proposed approach is demonstrated on an autonomous navigation task, and the system is shown to identify the visual entities relevant to the task and to generate appropriate responses.

Liam Ellis, Michael Felsberg, Richard Bowden
Spatiotemporal Contour Grouping Using Abstract Part Models

In recent work [1], we introduced a framework for model-based perceptual grouping and shape abstraction using a vocabulary of simple part shapes. Given a user-defined vocabulary of simple abstract parts, the framework grouped image contours whose abstract shape was consistent with one of the part models. While the results showed promise, the representational gap between the actual image contours that make up an exemplar shape and the contours that make up an abstract part model is significant, and an abstraction of a group of image contours may be consistent with more than one part model; therefore, while recall of ground-truth parts was good, precision was poor. In this paper, we address the precision problem by moving the camera and exploiting spatiotemporal constraints in the grouping process. We introduce a novel probabilistic, graph-theoretic formulation of the problem, in which the spatiotemporal consistency of a perceptual group under camera motion is learned from a set of training sequences. In a set of comprehensive experiments, we demonstrate (not surprisingly) how a spatiotemporal framework for part-based perceptual grouping significantly outperforms a static image version.

Pablo Sala, Diego Macrini, Sven Dickinson
Efficient Multi-structure Robust Fitting with Incremental Top-k Lists Comparison

Random hypothesis sampling lies at the core of many popular robust fitting techniques such as RANSAC. In this paper, we propose a novel hypothesis sampling scheme based on incremental computation of distances between partial rankings (top-k lists) derived from residual sorting information. Our method simultaneously (1) guides the sampling such that hypotheses corresponding to all true structures can be quickly retrieved and (2) filters the hypotheses such that only a small but very promising subset remains. This permits the use of simple agglomerative clustering on the surviving hypotheses for accurate model selection. The outcome is a highly efficient multi-structure robust estimation technique. Experiments on synthetic and real data show the superior performance of our approach over previous methods.
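A basic set-overlap distance between two top-k lists, which the paper computes incrementally and in a more refined form, might be sketched as:

```python
def topk_distance(a, b, k):
    """A simple distance between two top-k preference lists: one minus
    the fraction of shared elements among the top k (a basic set-overlap
    measure; the paper's incremental partial-ranking distances are more
    elaborate than this illustrative helper)."""
    ta, tb = set(a[:k]), set(b[:k])
    return 1.0 - len(ta & tb) / k
```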

Hoi Sim Wong, Tat-Jun Chin, Jin Yu, David Suter
Flexible Online Calibration for a Mobile Projector-Camera System

This paper presents a method for calibrating a projector-camera system consisting of a mobile projector, a stationary camera, and a planar screen. The method assumes the projector to be partially calibrated and the camera to be uncalibrated, and does not require any fiducials or natural markers on the screen. It targets systems that geometrically compensate images projected from a hand-held projector so that they are always displayed at a fixed position on the screen and in a fixed shape; the method makes the projected images geometrically rectified, that is, it gives them the correct rectangular shape with the correct aspect ratio. The method performs this calibration automatically and online without requiring any effort on the user's part; all the user has to do is project a video from the hand-held projector. Furthermore, when the system undergoes discontinuous temporal changes, such as when the camera and/or the screen is suddenly relocated, it automatically recovers the calibrated state that was once lost. To realize these properties, we adopt the sequential least-squares method and extend it to deal with temporal changes of the system. We show several experimental results obtained with a real system.

Daisuke Abe, Takayuki Okatani, Koichiro Deguchi
3D Object Recognition Based on Canonical Angles between Shape Subspaces

We propose a method to measure similarity of shape for 3D objects using 3-dimensional shape subspaces produced by the factorization method. We establish an index of shape similarity by measuring the geometrical relation between two shape subspaces using canonical angles. The proposed similarity measure is invariant to camera rotation and object motion, since the shape subspace is invariant to these changes under affine projection. However, to obtain a meaningful similarity measure, we must solve the difficult problem that the shape subspace changes depending on the ordering of the feature points used for the factorization. To avoid this ambiguity, and to ensure that feature points are matched between two objects, we introduce a method for sorting the order of feature points by comparing the orthogonal projection matrices of two shape subspaces. The validity of the proposed method has been demonstrated through evaluation experiments with synthetic feature points and actual face images.
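Canonical (principal) angles between two subspaces are obtained from the singular values of the product of their orthonormal bases, a standard linear-algebra result; a minimal sketch (the paper's feature-point sorting via orthogonal projection matrices is not shown here):

```python
import numpy as np

def canonical_angles(A, B):
    """Canonical (principal) angles between the column spaces of A and B:
    orthonormalise each basis with QR, then the cosines of the angles are
    the singular values of Q_A^T Q_B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))   # clip guards rounding error
```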

Yosuke Igarashi, Kazuhiro Fukui
An Unsupervised Framework for Action Recognition Using Actemes

In speech recognition, phonemes have demonstrated their efficacy for modeling the words of a language. While they are well defined for languages, their extension to human actions is not straightforward. In this paper, we study such an extension and propose an unsupervised framework to find phoneme-like units for actions, which we call actemes, using 3D data and without any prior assumptions. To this purpose, we build on a framework earlier proposed in the speech literature to automatically find actemes in the training data. We experimentally show that actions defined in terms of actemes and actions defined as whole units give similar recognition results. We define actions outside the training set in terms of these actemes to see whether the actemes generalize to unseen actions. The results show that although the acteme definitions of the actions are not always semantically meaningful, they yield optimal recognition accuracy and constitute a promising direction of research for action modeling.

Kaustubh Kulkarni, Edmond Boyer, Radu Horaud, Amit Kale
Segmentation of Brain Tumors in Multi-parametric MR Images via Robust Statistic Information Propagation

A method is presented to segment brain tumors in multi-parametric MR images via robustly propagating reliable statistical tumor information which is extracted from training tumor images using a support vector machine (SVM) classification method. The propagation of reliable statistical tumor information is implemented using a graph theoretic approach to achieve tumor segmentation with local and global consistency. To limit information propagation between image voxels of different properties, image boundary information is used in conjunction with image intensity similarity and anatomical spatial proximity to define weights of graph edges. The proposed method has been applied to 3D multi-parametric MR images with tumors of different sizes and locations. Quantitative comparison results with state-of-the-art methods indicate that our method can achieve competitive tumor segmentation performance.

Hongming Li, Ming Song, Yong Fan
Face Recognition with Decision Tree-Based Local Binary Patterns

Many state-of-the-art face recognition algorithms use image descriptors based on features known as Local Binary Patterns (LBPs). While many variations of LBP exist, so far none of them can automatically adapt to the training data. We introduce and analyze a novel generalization of LBP that learns the most discriminative LBP-like features for each facial region in a supervised manner. Since the proposed method is based on Decision Trees, we call it Decision Tree Local Binary Patterns or DT-LBPs. Tests on standard face recognition datasets show the superiority of DT-LBP with respect to several state-of-the-art feature descriptors regularly used in face recognition applications.
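The baseline (non-learned) 8-neighbour LBP operator that DT-LBP generalizes can be sketched as follows; the fixed comparison pattern shown here is exactly what the decision trees replace with learned comparisons:

```python
import numpy as np

def lbp_8(image):
    """Standard 8-neighbour Local Binary Pattern for the interior pixels
    of a grayscale image: each neighbour contributes one bit, set when
    the neighbour is >= the centre pixel."""
    c = image[1:-1, 1:-1]                     # interior centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):  # one bit per neighbour
        nb = image[1 + dy:image.shape[0] - 1 + dy,
                   1 + dx:image.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```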

Daniel Maturana, Domingo Mery, Álvaro Soto
Occlusion Handling with ℓ1-Regularized Sparse Reconstruction

Tracking multiple objects under occlusion is a challenging task. When occlusion happens, only the visible part of an occluded object can provide reliable information for matching. Conventional algorithms deduce the occlusion relationship in order to derive the visible part; however, deducing the occlusion relationship is difficult. The interdependence between the occlusion relationship and the tracking results degrades tracking performance, and can even lead to tracking failure. In this paper, we propose a novel framework to track multiple objects with occlusion handling based on sparse reconstruction. Matching with ℓ1-regularized sparse reconstruction automatically focuses on the visible part of the occluded object, and thus removes the need to deduce the occlusion relationship. The tracking is simplified into a joint Bayesian inference problem. We compare our algorithm with state-of-the-art algorithms. The experimental results show the superiority of our algorithm over the competing algorithms.
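An ℓ1-regularized reconstruction of the form min_c ½‖y − Dc‖² + λ‖c‖₁ can be solved with iterative soft thresholding (ISTA), a standard solver. A sketch under the assumption that a template dictionary D is given; the paper's full tracking framework is more involved:

```python
import numpy as np

def ista(D, y, lam, n_iter=500):
    """Iterative soft-thresholding (ISTA) for
    min_c 0.5 * ||y - D c||^2 + lam * ||c||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = c + (D.T @ (y - D @ c)) / L    # gradient step
        c = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return c
```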

Wei Li, Bing Li, Xiaoqin Zhang, Weiming Hu, Hanzi Wang, Guan Luo
An Approximation Algorithm for Computing Minimum-Length Polygons in 3D Images

Length measurements in 3D images have raised interest in image geometry for a long time. This paper discusses the Euclidean shortest path (ESP) to be calculated in a loop of face-connected grid cubes in the 3D orthogonal grid; such paths are defined by minimum-length polygonal (MLP) curves. We propose a new approximation algorithm for computing such an MLP. It is much simpler and easier to understand and to implement than previously published algorithms by Li and Klette. It also has a straightforward application for finding an approximate minimum-length polygonal arc (MLA), a generalization of the MLP problem. We also propose two heuristic algorithms for computing a simple cube-arc within a 3D image component, with a minimum number of cubes between two cubes in this component. This may be interpreted as an approximate solution to the general ESP problem in 3D (which is known to be NP-hard) assuming a regular subdivision of 3D space into cubes of uniform size.

Fajie Li, Xiuxia Pan
Classifier Acceleration by Imitation

This paper presents a framework named “Classifier Molding” that imitates arbitrary classifiers by linear regression trees so as to accelerate classification speed. This framework requires an accurate (but slow) classifier and a large amount of training data. As an example of an accurate classifier, we used the Compound Similarity Method (CSM) for the Industrial Ink Jet Printer (IIJP) character recognition problem. The input-output relationship of the trained CSM is imitated by a linear regression tree by providing a large amount of training data. For generating the training data, we developed a character pattern fluctuation method simulating the IIJP printing process. The learnt linear regression tree can be used as an accelerated classifier. Based on this classifier, we also developed a Classification based Character Segmentation (CCS) method, which extracts character patterns from an image so as to maximize the total classification scores. Through extensive experiments, we confirmed that imitated classifiers are 1500 times faster than the original classifier without any drop in recognition rate, and that the CCS method greatly corrects the segmentation errors of a bottom-up segmentation method.

Takahiro Ota, Toshikazu Wada, Takayuki Nakamura
Recognizing Continuous Grammatical Marker Facial Gestures in Sign Language Video

In American Sign Language (ASL) the structure of signed sentences is conveyed by grammatical markers which are represented by facial feature movements and head motions. Without recovering grammatical markers, a sign language recognition system cannot fully reconstruct a signed sentence. However, this problem has been largely neglected in the literature. In this paper, we propose to use a 2-layer Conditional Random Field model for recognizing continuously signed grammatical markers in ASL. This recognition requires identifying both facial feature movements and head motions while dealing with uncertainty introduced by movement epenthesis and other effects. We used videos of the signers’ faces, recorded while they signed simple sentences containing multiple grammatical markers. In our experiments, the proposed classifier yielded a precision rate of 93.76% and a recall rate of 85.54%.

Tan Dat Nguyen, Surendra Ranganath
Invariant Feature Set Generation with the Linear Manifold Self-organizing Map

One of the most important challenges faced by computer vision is the almost unlimited possibilities of variation associated with the objects. It has been hypothesized that the brain represents image manifolds as manifolds of stable neural-activity patterns. In this paper, we explore the possibility of manifold representation with a set of topographically organized neurons with each representing a local linear manifold and capturing some local linear feature invariance. In particular, we propose to consider the local subspace learning at each neuron of the network from a Gaussian likelihood point of view. Robustness of the algorithm with respect to the learning rate issue is obtained by considering statistical efficiency. Compared to its predecessors, the proposed network is more adaptive and robust in learning globally nonlinear data manifolds, which is verified by experiments on handwritten digit image modeling.

Huicheng Zheng
A Multi-level Supporting Scheme for Face Recognition under Partial Occlusions and Disguise

Face recognition has always been a challenging task in real-life surveillance videos, with partial occlusion being one of the key factors affecting the robustness of face recognition systems. Previous research approached the problem of face recognition with partial occlusions by dividing a face image into local patches and training an independent classifier for each local patch. The final recognition result was then decided by integrating the results of all local patch classifiers. Such a local approach, however, ignores the crucial distinguishing information present in the global holistic face. Instead of using only local patch classifiers, this paper presents a novel multi-level supporting scheme which incorporates patch classifiers at multiple levels, including both the global holistic face and local face patches at different levels. This supporting scheme employs a novel criteria-based class candidate selection process. This selection process preserves more class candidates for consideration as the final recognition results when there are conflicts between patch classifiers, while enabling fast decision making when most of the classifiers agree on the same set of class candidates. All the patch classifiers contribute their supports to each selected class candidate. The support of each classifier is defined as a simple distance-based likelihood ratio, which effectively enhances the effect of a “more-confident” classifier. The proposed supporting scheme is evaluated using the AR face database, which contains faces with different facial expressions and face occlusions in real scenarios. Experimental results show that the proposed supporting scheme gives a high recognition rate and outperforms other existing methods.

Jacky S-C. Yuk, Kwan-Yee K. Wong, Ronald H-Y. Chung
Foreground and Shadow Segmentation Based on a Homography-Correspondence Pair

A static binocular camera system is widely used in many computer vision applications, and the ability to segment foreground, shadow, and background is an important problem for such systems. In this paper, we propose a segmentation framework based on homography-correspondence pairs. Existing segmentation approaches based on homography constraints often suffer from occlusion problems. In our approach, we treat a homography-correspondence pair symmetrically to take the occlusion relationship explicitly into account, and we regard the segmentation problem as a multi-labeling problem for the homography-correspondence pair. We then formulate an energy function for this problem and obtain the pair-wise segmentation results by minimizing it via an α-β swap algorithm. Experimental results show that accurate segmentation is obtained in the presence of the occlusion region in each side image.

Haruyuki Iwama, Yasushi Makihara, Yasushi Yagi
Backmatter
Metadata
Title
Computer Vision – ACCV 2010
Editors
Ron Kimmel
Reinhard Klette
Akihiro Sugimoto
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-19282-1
Print ISBN
978-3-642-19281-4
DOI
https://doi.org/10.1007/978-3-642-19282-1