
2010 | Book

Computer Vision – ACCV 2009

9th Asian Conference on Computer Vision, Xi’an, September 23-27, 2009, Revised Selected Papers, Part II

Edited by: Hongbin Zha, Rin-ichiro Taniguchi, Stephen Maybank

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this Book

It gives us great pleasure to present the proceedings of the 9th Asian Conference on Computer Vision (ACCV 2009), held in Xi'an, China, in September 2009. This was the first ACCV conference to take place in mainland China. We received a total of 670 full submissions, which is a new record in the ACCV series. Overall, 35 papers were selected for oral presentation and 131 as posters, yielding acceptance rates of 5.2% for oral, 19.6% for poster, and 24.8% in total. In the paper reviewing, we continued the tradition of previous ACCVs by conducting the process in a double-blind manner. Each of the 33 Area Chairs received a pool of about 20 papers and nominated a number of potential reviewers for each paper. Then, Program Committee Chairs allocated at least three reviewers to each paper, taking into consideration any conflicts of interest and the balance of loads. Once the reviews were finished, the Area Chairs made summary reports for the papers in their pools, based on the reviewers' comments and on their own assessments of the papers.

Table of Contents

Frontmatter

Poster Session 1: Stereo, Motion Analysis, and Tracking

A Dynamic Programming Approach to Maximizing Tracks for Structure from Motion

We present a novel algorithm for improving the accuracy of structure from motion on video sequences. Its goal is to efficiently recover scene structure and camera pose by using dynamic programming to maximize the lengths of putative keypoint tracks. By efficiently discarding poor correspondences while maintaining the largest possible set of inliers, it ultimately provides a robust and accurate scene reconstruction. Traditional outlier detection strategies, such as RANSAC and its derivatives, cannot handle high dimensional problems such as structure from motion over long image sequences. We prove that, given an estimate of the camera pose at a given frame, the outlier detection is optimal and runs in low order polynomial time. The algorithm is applied on-line, processing each frame in sequential order. Results are presented on several indoor and outdoor video sequences processed both with and without the proposed optimization. The improvement in average reprojection errors demonstrates its effectiveness.

Jonathan Mooser, Suya You, Ulrich Neumann, Raphael Grasset, Mark Billinghurst
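The abstract does not spell out the dynamic program itself; as a toy illustration of the general idea (keeping the longest stretch of a keypoint track whose reprojection error stays below a threshold), one can run a single-pass DP over the frames. The threshold `tau` and the contiguity assumption are our simplifications, not the paper's formulation:

```python
def longest_inlier_run(errors, tau):
    """Toy DP: for one keypoint track, find the longest contiguous stretch
    of frames whose reprojection error stays below tau.
    Returns (start_frame, end_frame), inclusive."""
    best_len, best_end, cur = 0, -1, 0
    for i, e in enumerate(errors):
        cur = cur + 1 if e <= tau else 0   # DP recurrence: run length ending at frame i
        if cur > best_len:
            best_len, best_end = cur, i
    return best_end - best_len + 1, best_end

# frames 3..6 form the longest inlier run under tau = 1.0
start, end = longest_inlier_run([0.4, 0.3, 2.1, 0.2, 0.1, 0.3, 0.2, 5.0], 1.0)  # → (3, 6)
```

The paper's actual DP jointly reasons over pose estimates and is proved optimal in low-order polynomial time; this sketch only conveys the flavor of maximizing track length by discarding poor correspondences.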
Dense and Accurate Spatio-temporal Multi-view Stereovision

In this paper, we propose a novel method to simultaneously and accurately estimate the 3D shape and 3D motion of a dynamic scene from multiple-viewpoint calibrated videos. We follow a variational approach in the vein of previous work on stereo reconstruction and scene flow estimation. We adopt a representation of a dynamic scene by an animated mesh, i.e. a polygonal mesh with fixed connectivity whose time-varying vertex positions sample the trajectories of material points. Interestingly, this representation ensures a consistent coding of shape and motion by construction. Our method accurately recovers 3D shape and 3D motion by optimizing the positions of the vertices of the animated mesh. This optimization is driven by an energy function which incorporates multi-view and inter-frame photo-consistency, smoothness of the spatio-temporal surface and of the velocity field. Central to our work is an image-based photo-consistency score which can be efficiently computed and which fully handles projective distortion and partial occlusions. We demonstrate the effectiveness of our method on several challenging real-world dynamic scenes.

Jérôme Courchay, Jean-Philippe Pons, Pascal Monasse, Renaud Keriven
Semi-supervised Feature Selection for Gender Classification

We apply a semi-supervised learning method to gender determination. The aim is to select the most discriminating feature components from the eigen-feature representation of faces. By making use of the information provided by both labeled and unlabeled data, we successfully reduce the size of the labeled data set required for gender feature selection and improve the classification accuracy. Instead of 2D brightness images, we use 2.5D facial needle-maps, which reveal facial shape information more directly. Principal geodesic analysis (PGA), a generalization of principal component analysis (PCA) from data residing in a Euclidean space to data residing on a manifold, is used to obtain the eigen-feature representation of the facial needle-maps. In our experiments, we achieve 90.50% classification accuracy when 50% of the data are labeled. This performance demonstrates the effectiveness of the method for gender classification with a small labeled set, and the feasibility of gender classification from facial shape information.

Jing Wu, William A. P. Smith, Edwin R. Hancock
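The paper performs PGA on the manifold of needle-maps; a minimal Euclidean sketch of the underlying eigen-feature idea (ordinary PCA via SVD, which PGA generalizes) might look like the following. The function name and shapes are our illustration, not the paper's code:

```python
import numpy as np

def pca_features(X, k):
    """Toy eigen-feature extraction via PCA: center the data, take the
    top-k right singular vectors as principal directions, and project.
    (The paper generalizes this to PGA on a manifold; this is Euclidean only.)"""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    components = Vt[:k]
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 samples, 5-dimensional features
feats, comps, mu = pca_features(X, k=2)
```

A semi-supervised selector would then rank these eigen-feature components using both labeled and unlabeled samples, as the abstract describes.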
Planar Scene Modeling from Quasiconvex Subproblems

In this paper, we propose a convex optimization based approach for piecewise planar reconstruction. We show that the task of reconstructing a piecewise planar environment can be set in an L∞-based homographic framework that iteratively computes scene plane and camera pose parameters. Instead of image points, the algorithm optimizes over inter-image homographies. The resultant objective functions are minimized using Second Order Cone Programming algorithms. Apart from showing the convergence of the algorithm, we also empirically verify its robustness to error in initialization through various experiments on synthetic and real data. We intend this algorithm to sit between initialization approaches like decomposition methods and iterative non-linear minimization methods like Bundle Adjustment.

Visesh Chari, Anil Nelakanti, Chetan Jakkoju, C. V. Jawahar
Fast Depth Map Compression and Meshing with Compressed Tritree

We propose in this paper a new method based on binary space partitions to simultaneously mesh and compress a depth map. The method divides the map adaptively into a mesh in the form of a binary triangular tree (tritree). The nodes of the mesh are sparse non-uniform samples of the depth map that can interpolate the remaining pixels with minimal error. We then apply differential coding to represent the sparse disparities at the mesh nodes, followed by entropy coding to compress the encoded disparities. Finally, we exploit the binary tree structure and compress the mesh via binary tree coding to condense its representation. The results we obtained on various depth images show that the proposed scheme leads to a lower depth error rate at higher compression ratios than standard compression techniques like JPEG 2000. Moreover, with our method a depth map is represented by a compressed adaptive mesh that can be directly used to render the 3D scene.

Michel Sarkis, Waqar Zia, Klaus Diepold
A Three-Phase Approach to Photometric Calibration for Multi-projector Display Using LCD Projectors

Photometric calibration plays an important role in building a seamless appearance for a multi-projector display. In this paper, we address the photometric issues of chrominance variation and luminance nonuniformity in a multi-display system constructed from LCD projectors. We propose a three-phase approach that constructs imaging models and the transformations among them when formulating the whole imaging procedure. These models are named the single-projector model, the normalized-projector model, and the display-wall model. The single-projector model describes the imaging procedure from the projector's input color to its measured tristimulus values in CIE XYZ. The normalized-projector model denotes the common gamut of the projectors: it normalizes each single-projector model so that every projector has the same ranges of chrominance and luminance. The display-wall model treats the whole display as one projector, with a photometric model similar to that of a single LCD projector. By weighting the light contributions from all projectors using the display-wall model, our method achieves visually plausible seamlessness.

Lei Zhang, Siyu Liang, Bo Qin, Zhongding Jiang
Twisted Cubic: Degeneracy Degree and Relationship with General Degeneracy

The fundamental matrix, which encodes the geometric relationship between two images, plays an important role in three-dimensional computer vision. Degenerate configurations of space points and the two camera optical centers affect the stability of the computation of the fundamental matrix. In order to estimate the fundamental matrix robustly, it is necessary to study these degenerate configurations. We analyze all possible degenerate configurations caused by a twisted cubic and give the corresponding degenerate rank for each case. Relationships with general degeneracies, namely the previously studied ruled quadric degeneracy and the homography degeneracy, are also established in theory, yielding some interesting results such as a complete homography relation between two views. Based on the results of the paper, applying RANSAC to degenerate data yields more robust estimates of the fundamental matrix.

Tian Lan, YiHong Wu, Zhanyi Hu
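For readers unfamiliar with fundamental matrix estimation, the standard baseline that RANSAC wraps (and whose stability the degeneracy analysis above concerns) is the eight-point algorithm. A minimal, unnormalized sketch, without the RANSAC loop or Hartley normalization:

```python
import numpy as np

def fundamental_eight_point(x1, x2):
    """Plain eight-point estimate of the fundamental matrix F from N >= 8
    point correspondences (u1,v1) <-> (u2,v2), so that x2^T F x1 = 0.
    A minimal sketch: no normalization, no outlier handling."""
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # null vector of A, up to scale
    U, S, Vt2 = np.linalg.svd(F)    # enforce rank 2 (a valid F has det F = 0)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2
```

When the points lie on a degenerate configuration such as a twisted cubic, the matrix `A` loses rank and this least-squares solution becomes unstable, which is exactly what motivates the paper's analysis.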
Two-View Geometry and Reconstruction under Quasi-perspective Projection

We report two-view geometry under the quasi-perspective camera model, together with some new results. First, we prove that the quasi-perspective fundamental matrix can be simplified to a special form with six degrees of freedom and is invariant to any non-singular projective transformation. Second, the plane-induced homography under the quasi-perspective model can also be simplified to a special form defined by six degrees of freedom, and this quasi-perspective homography may be recovered from only two pairs of correspondences given a known fundamental matrix. Extensive tests on synthetic and real images validate the results.

Guanghui Wang, Q. M. Jonathan Wu
Similarity Scores Based on Background Samples

Evaluating the similarity of images and their descriptors by employing discriminative learners has proven itself to be an effective face recognition paradigm. In this paper we show how “background samples”, that is, examples which do not belong to any of the classes being learned, may provide a significant performance boost to such face recognition systems. In particular, we make the following contributions. First, we define and evaluate the “Two-Shot Similarity” (TSS) score as an extension to the recently proposed “One-Shot Similarity” (OSS) measure. Both these measures utilize background samples to facilitate better recognition rates. Second, we examine the ranking of images most similar to a query image and employ these as a descriptor for that image. Finally, we provide results underscoring the importance of proper face alignment in automatic face recognition systems. These contributions in concert allow us to obtain a success rate of 86.83% on the Labeled Faces in the Wild (LFW) benchmark, outperforming current state-of-the-art results.

Lior Wolf, Tal Hassner, Yaniv Taigman
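The One-Shot Similarity idea above (train a discriminative model of one example against the background set, score the other example, then symmetrize) can be sketched with a simple LDA-style discriminant. This is our simplified illustration of the concept; the paper's OSS/TSS use specific learners and face descriptors:

```python
import numpy as np

def one_shot_similarity(x1, x2, background):
    """Toy symmetric One-Shot Similarity: build an LDA-like discriminant of
    one sample vs. the background set, score the other sample by its signed
    distance from the class midpoint, and average both directions.
    (A sketch of the idea, not the paper's exact classifier.)"""
    mu = background.mean(axis=0)
    # regularized covariance of the background acts as the within-class scatter
    cov = np.cov(background, rowvar=False) + 1e-3 * np.eye(background.shape[1])
    icov = np.linalg.inv(cov)

    def score(a, b):
        w = icov @ (a - mu)                 # discriminant direction: a vs. background
        return w @ (b - 0.5 * (a + mu))     # signed distance of b from the midpoint
    return 0.5 * (score(x1, x2) + score(x2, x1))
```

Same-identity pairs should score higher than mismatched pairs, because each example lies on the positive side of the discriminant trained from the other.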
Human Action Recognition Using Spatio-temporal Classification

In this paper, a framework called Temporal-Vector Trajectory Learning (TVTL) for human action recognition is proposed. Its central idea is to add temporal information to the action recognition process. To this end, three kinds of temporal information, LTM, DTM, and TTM, are proposed. With these three kinds of temporal information, a k-NN classifier based on the Mahalanobis distance metric achieves better results than one using spatial information alone. The experimental results demonstrate that the method recognizes actions well; the TTM and DTM variants in particular achieve high accuracy rates, and the framework performs well even on noisy data.

Chin-Hsien Fang, Ju-Chin Chen, Chien-Chung Tseng, Jenn-Jier James Lien
Face Alignment Using Boosting and Evolutionary Search

In this paper, we present a face alignment approach using granular features, boosting, and an evolutionary search algorithm. Active Appearance Models (AAM) integrate a combined shape-texture morphable face model into an efficient fitting strategy, while Boosting Appearance Models (BAM) treat face alignment as the process of maximizing the response of a boosting classifier. Inspired by AAM and BAM, we present a framework that implements improved boosting classifiers based on more discriminative features and exhaustive search strategies. We use granular features in place of the conventional rectangular Haar-like features, which improves discriminability and computational efficiency and provides a larger search space. At the same time, we adopt an evolutionary search process to overcome the difficulty of searching this large feature space. Finally, we test our approach on a series of challenging data sets to show its accuracy and efficiency on versatile face images.

Hua Zhang, Duanduan Liu, Mannes Poel, Anton Nijholt
Tracking Eye Gaze under Coordinated Head Rotations with an Ordinary Camera

Previous efforts in eye gaze tracking either did not consider head motion or handled 6-DOF head motion with multiple cameras or light sources. In this paper, we show that it is possible to track eye gaze under natural head rotations (yaw and pitch) with only an ordinary webcam. We first carry out a study to examine the occurrence of eye-head coordination, and then show how to track such coordinated gaze by deriving a linear coordination equation and developing a tracking system based on a single webcam. Beyond the theoretical analysis, we develop a vision-based tracking framework that achieves acceptable accuracy in our experiments for estimating such eye-head coordinated gaze.

Haibo Wang, Chunhong Pan, Christophe Chaillou
Orientation and Scale Invariant Kernel-Based Object Tracking with Probabilistic Emphasizing

Tracking an object with complex movements against background clutter is a challenging problem. The widely used mean-shift algorithm shows unsatisfactory results in such situations. To solve this problem, we propose a new mean-shift based tracking algorithm. Our method consists of three parts. First, a new objective function for mean-shift is proposed to handle background clutter. Second, an orientation estimation method is proposed to extend the range of trackable movements. Third, a method using a new scale descriptor is proposed to adapt to scale changes of the object. To demonstrate the effectiveness of our method, we tested it on several image sequences. Our algorithm is shown to be robust to background clutter and is able to track complex movements accurately, even in shaky scenarios.

Kwang Moo Yi, Soo Wan Kim, Jin Young Choi
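The mean-shift core that both this paper and the previous one build on is a short iteration: shift a window to the weighted centroid of per-pixel likelihood scores until convergence. A minimal flat-kernel sketch (the papers add orientation/scale estimation and part-based refinement on top of this):

```python
import numpy as np

def mean_shift(weights, center, radius, iters=20):
    """Shift a circular window over a per-pixel likelihood map `weights`
    to the weighted centroid until the shift becomes negligible.
    Flat (uniform) kernel; returns the final (row, col) center."""
    h, w = weights.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    for _ in range(iters):
        mask = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
        m = (weights * mask).sum()
        if m == 0:                      # window fell on an empty region
            break
        ny = (weights * mask * ys).sum() / m
        nx = (weights * mask * xs).sum() / m
        shift = max(abs(ny - cy), abs(nx - cx))
        cy, cx = ny, nx
        if shift < 0.5:                 # converged to the local mode
            break
    return cy, cx
```

In a tracker, `weights` would come from histogram back-projection of the target's color model, and the converged center becomes the new target location for the next frame.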
Combining Edge and Color Features for Tracking Partially Occluded Humans

We propose an efficient approach for tracking humans in the presence of severe occlusions through a combination of edge and color features. We implement a part-based tracking paradigm to accurately localize the head, torso, and legs of a human target in successive frames. Non-parametric color probability density estimates of these parts are used to track them independently using mean shift. A robust edge matching algorithm then validates and refines the mean shift estimate of each part. The part-based implementation of mean shift, along with the novel edge matching algorithm, ensures reliable tracking of humans in upright pose through severe scene and inter-object occlusions. We evaluate our tracking method on the CAVIAR data set as well as our own IIT Kanpur test cases, which demonstrate varying levels of occlusion in daily-life situations.

Mandar Dixit, K. S. Venkatesh
Incremental Multi-view Face Tracking Based on General View Manifold

A novel incremental multi-view face tracking algorithm is proposed within a graphical model that combines a general view manifold with a specific incremental face model. We extend a general view manifold to the state-space model of face tracking to represent the view continuity and nonlinearity in video data. In particular, a global constraint on the overall appearance of the tracked multi-view faces is defined based on the point-to-manifold distance to avoid drifting. This face tracking model can successfully track faces under unseen views, and experimental results show that the new method is superior to two state-of-the-art algorithms for multi-view face tracking.

Wei Wei, Yanning Zhang
Hierarchical Model for Joint Detection and Tracking of Multi-target

We present a hierarchical and compositional model based on an And-Or graph for joint detection and tracking of multiple targets in video. In the graph, an And-node for the joint state of all targets is decomposed into multiple Or-nodes. Each Or-node represents an individual target's state, including the position, appearance, and scale of the target. Leaf nodes are trained detectors. Measurements supplied by the predictions of the tracker and by the leaf nodes are shared among the Or-nodes. There are two kinds of production rules, designed respectively for the problems of a varying number of targets and of occlusions: association relations, which distribute measurements to targets, and semantic relations, which represent occlusion between targets. The inference algorithm for the graph consists of three processing channels: (1) a bottom-up channel, which provides informative measurements using learned detectors; (2) a top-down channel, which estimates each individual target state with joint probabilistic data association; (3) a context-sensitive reasoning channel, which finalizes the estimation of the joint state with belief propagation. Additionally, an interaction mechanism between detection and tracking is implemented by a hybrid measurement process. The algorithm is validated extensively by tracking people in several complex scenarios. Empirical results show that our tracker can reliably track multiple targets without any prior knowledge of their number, even though targets may appear or disappear anywhere in the image frame and at any time in the test videos.

Jianru Xue, Zheng Ma, Nanning Zheng
Heavy-Tailed Model for Visual Tracking via Robust Subspace Learning

Video-based target tracking deals, in essence, with nonstationary image streams, which is a challenging task in computer vision because abnormal motions and severe occlusions frequently arise among objects in complex real-world environments. From a statistical perspective, an abnormal motion often exhibits non-Gaussian heavy-tailed behavior, which may take a long time to simulate, and most existing algorithms are unable to tackle this issue. To address it, we propose a novel tracking algorithm (HIRPCA) based on a heavy-tailed framework, which can robustly capture the effect of abnormal motion. In addition, since conventional PCA is susceptible to outlying measurements under least-mean-squared-error minimisation, we extend and improve incremental and robust PCA to learn a better representation of object appearance in a low-dimensional subspace. This improves tracking performance in complex environments involving changing lighting conditions, significant pose and scale variation, temporary complete occlusion, and abnormal motion. A series of experimental results shows the good performance of the proposed method.

Daojing Wang, Chao Zhang, Pengwei Hao
Efficient Scale-Space Spatiotemporal Saliency Tracking for Distortion-Free Video Retargeting

Video retargeting aims at transforming an existing video in order to display it appropriately on a target device, often in a lower resolution, such as a mobile phone. To preserve a viewer’s experience, it is desired to keep the important regions in their original aspect ratio, i.e., to maintain them distortion-free. Most previous methods are susceptible to geometric distortions due to the anisotropic manipulation of image pixels. In this paper, we propose a novel approach to distortion-free video retargeting by scale-space spatiotemporal saliency tracking. An optimal source cropping window with the target aspect ratio is smoothly tracked over time, and then isotropically resized to the retargeted display. The problem is cast as the task of finding the most spatiotemporally salient cropping window with minimal information loss due to resizing. We conduct the spatiotemporal saliency analysis in scale-space to better account for the effect of resizing. By leveraging integral images, we develop an efficient coarse-to-fine solution that combines exhaustive coarse and gradient-based fine search, which we term scale-space spatiotemporal saliency tracking. Experiments on real-world videos and our user study demonstrate the efficacy of the proposed approach.

Gang Hua, Cha Zhang, Zicheng Liu, Zhengyou Zhang, Ying Shan
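The integral-image trick mentioned in the abstract above is what makes exhaustive window search affordable: after one pass of cumulative sums, the saliency mass of any rectangular window costs four lookups. A minimal sketch of that building block (not the paper's full saliency pipeline):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def window_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in O(1) using the
    four-corner identity on the integral image ii."""
    s = ii[bottom, right]
    if top > 0:
        s -= ii[top - 1, right]
    if left > 0:
        s -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        s += ii[top - 1, left - 1]
    return s
```

Scoring every candidate cropping window then reduces to constant-time sums over a precomputed saliency map, which is what enables the coarse-to-fine search described above.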
Visual Saliency Based Object Tracking

This paper presents a novel method for on-line object tracking using static and motion saliency features extracted from video frames locally, regionally, and globally. To detect the salient object, the saliency features are combined in a Conditional Random Field (CRF). A particle filter is then used to track the detected object. Like the attention-shifting mechanism of human vision, when the tracked object disappears, our algorithm can switch its target to another object automatically, even without re-detection. Unlike many existing tracking methods, our algorithm depends little on the surface appearance of the object, so it can detect objects of any category as long as they are salient, and the tracking is robust to changes in global illumination and object shape. Experiments on video clips of various objects show the reliability of our algorithm.

Geng Zhang, Zejian Yuan, Nanning Zheng, Xingdong Sheng, Tie Liu
People Tracking and Segmentation Using Efficient Shape Sequences Matching

We design an effective shape-prior-embedded human silhouette extraction algorithm. Human silhouette extraction is challenging because of articulated structures, pose variations, and background clutter, and many segmentation algorithms, including the Min-Cut algorithm, meet difficulties here. We aim at improving the performance of the Min-Cut algorithm by embedding shape prior knowledge. Unfortunately, obtaining shape priors automatically is not trivial, especially for human silhouettes. In this work, we present a shape sequence matching method that searches for the best path in the spatio-temporal domain. The path contains shape priors of human silhouettes that can improve the segmentation. Matching shape sequences in the spatio-temporal domain is advantageous over finding shape priors by matching shape templates against a single likelihood frame, because errors can be avoided by searching for the global optimum over the domain. However, matching in the spatio-temporal domain is computationally intensive, which makes many shape matching methods impractical. We propose a novel shape matching approach whose low computational complexity is independent of the number of shape templates. In addition, we investigate how to make use of shape priors in a more adequate way. Embedding shape priors into the Min-Cut algorithm based only on distances from shape templates is insufficient, because Euclidean distances cannot fully represent shape knowledge. We therefore embed distance and orientation information of the shape priors simultaneously into the Min-Cut algorithm. Experimental results demonstrate that our algorithm is efficient and practical; compared with previous works, our silhouette extraction system produces better segmentation results.

Junqiu Wang, Yasushi Yagi, Yasushi Makihara
Monocular Template-Based Tracking of Inextensible Deformable Surfaces under L2-Norm

We present a method for recovering the 3D shape of an inextensible deformable surface from a monocular image sequence. The state-of-the-art method for this problem [1] uses the L∞-norm of the reprojection residual vectors and formulates the tracking problem as a Second Order Cone Programming (SOCP) problem. Instead of the L∞-norm, which is sensitive to outliers, we use the L2-norm of the reprojection errors. In general, using the L2-norm leads to a non-convex optimization problem which is difficult to minimize. Instead of solving the non-convex problem directly, we design an iterative L2-norm approximation process for the non-convex objective function, in which only a linear system needs to be solved at each iteration. Furthermore, we introduce a shape regularization term into this iterative process in order to preserve the inextensibility of the recovered mesh. Compared with previous methods, ours is more robust to outliers and large inter-frame motions, with high computational efficiency. The robustness and accuracy of our approach are evaluated quantitatively on synthetic data and qualitatively on real data.

Shuhan Shen, Wenhuan Shi, Yuncai Liu
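The pattern of minimizing a non-convex L2 objective by repeatedly solving a linear system is a Gauss-Newton-style iteration: linearize the residual at the current estimate, solve a linear least-squares problem, step, repeat. A generic stand-in sketch (not the paper's surface-specific formulation or its regularization term):

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=50, tol=1e-10):
    """Generic iterative L2 minimization: at each step, linearize the
    residual r(x) around the current estimate and solve the linear
    least-squares system J * step = -r for the update."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + step
        if np.linalg.norm(step) < tol:   # converged
            break
    return x
```

In the paper's setting, `x` would stack the mesh vertex positions and the residuals would be reprojection errors plus inextensibility terms; here we only illustrate the per-iteration linear solve.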
A Graph-Based Feature Combination Approach to Object Tracking

In this paper, we present a feature combination approach to object tracking based upon graph embedding techniques. The method presented here abstracts the low complexity features used for purposes of tracking to a relational structure and employs graph-spectral methods to combine them. This gives rise to a feature combination scheme which minimises the mutual cross-correlation between features and is devoid of free parameters. It also allows an analytical solution making use of matrix factorisation techniques. The new target location is recovered making use of a weighted combination of target-centre shifts corresponding to each of the features under study, where the feature weights arise from a cost function governed by the embedding process. This treatment permits the update of the feature weights in an on-line fashion in a straightforward manner. We illustrate the performance of our method in real-world image sequences and compare our results to a number of alternatives.

Quang Anh Nguyen, Antonio Robles-Kelly, Jun Zhou
A Smarter Particle Filter

Particle filtering is an effective sequential Monte Carlo approach to the recursive Bayesian filtering problem in non-linear and non-Gaussian systems. The algorithm is based on importance sampling, but the proper choice of the proposal distribution for importance sampling remains a difficult and unresolved task in the literature. Inspired by animal swarm intelligence in evolutionary computing, we propose a swarm-intelligence-based particle filter algorithm. Unlike the independent particles in the conventional particle filter, the particles in our algorithm cooperate with each other and evolve according to a cognitive effect and a social effect, in analogy with the cooperative and social aspects of animal populations. Furthermore, theoretical analysis shows that our algorithm is essentially a conventional particle filter with a hierarchical importance sampling process guided by the swarm intelligence extracted from the particle configuration, which greatly alleviates the sample impoverishment problem suffered by particle filters. We compare the proposed approach with several nonlinear filters on state estimation and visual tracking tasks. The experiments demonstrate the effectiveness and promise of our approach.

Xiaoqin Zhang, Weiming Hu, Steve Maybank
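The "conventional particle filter" that the swarm-guided sampling improves on is the bootstrap SIR filter: propagate particles through the motion model, weight them by the observation likelihood, and resample. A minimal 1D sketch under our own toy random-walk model (not the paper's visual-tracking state space):

```python
import numpy as np

def particle_filter_1d(observations, n=500, proc_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap (SIR) particle filter for a 1D random-walk state with
    Gaussian observation noise. Returns the weighted-mean state estimate
    after each observation."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 5.0, n)   # diffuse prior over the state
    estimates = []
    for z in observations:
        # propagate through the motion model (proposal = prior dynamics)
        particles = particles + rng.normal(0.0, proc_std, n)
        # weight by the observation likelihood
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(float(w @ particles))
        # systematic resampling to fight weight degeneracy
        idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(n)) / n)
        particles = particles[np.minimum(idx, n - 1)]
    return estimates
```

Because the proposal here ignores the current observation, many particles land in low-likelihood regions; that waste is precisely the sample impoverishment the paper's swarm-guided importance sampling targets.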
Robust Real-Time Multiple Target Tracking

We propose a novel, efficient algorithm for robust real-time tracking of a fixed number of targets with a low failure rate. The method is an instance of Sequential Importance Resampling filters, approximating the posterior over complete target configurations as a mixture of Gaussians. Using target positions predicted by Kalman filters, data associations are sampled for each measurement sweep according to their likelihood, which allows constraining the number of associations per target. Updated target configurations are weighted for resampling according to their explanatory power for former positions and measurements. Fixed-lag processing of the resulting positions increases the tracking quality, while smart resampling and memoization decrease the computational demand. We present both qualitative and quantitative experimental results on two demanding real-world applications with occluded and highly confusable targets, demonstrating the robustness and real-time performance of our approach, which outperforms the current state of the art.

Nicolai von Hoyningen-Huene, Michael Beetz
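The per-target Kalman prediction that gates the data association above is standard; a self-contained 1D constant-velocity sketch (our generic illustration, not the paper's multi-target machinery) shows the predict/update cycle:

```python
import numpy as np

def kalman_cv(zs, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over 1D position measurements zs.
    The predicted positions (collected before each update) are what a
    multi-target tracker would use to gate data association.
    Returns the final state [pos, vel] and the list of predictions."""
    F = np.array([[1.0, dt], [0.0, 1.0]])                      # state transition
    H = np.array([[1.0, 0.0]])                                 # observe position only
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])  # process noise
    R = np.array([[r]])                                        # measurement noise
    x = np.array([zs[0], 0.0])
    P = np.eye(2)
    preds = []
    for z in zs[1:]:
        x, P = F @ x, F @ P @ F.T + Q      # predict
        preds.append(x[0])                 # gate candidate measurements here
        y = z - H @ x                      # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
    return x, preds
```

Fed a target moving at constant speed, the filter's velocity estimate converges and the one-step predictions stay close to the true positions, which is what makes likelihood-based association sampling reliable.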
Dynamic Kernel-Based Progressive Particle Filter for 3D Human Motion Tracking

This paper presents a novel tracking algorithm, the dynamic kernel-based progressive particle filter (DKPPF), for markerless 3D human body tracking. An articulated human body has many degrees of freedom to be estimated, and the proposed algorithm aims to reduce the computational complexity and improve the accuracy. The DKPPF decomposes the high-dimensional parameter space into three low-dimensional spaces and hierarchically searches for the posture coefficients. Moreover, it applies multiple predictions and a mean shift tracker to estimate the human posture iteratively. A dynamic kernel model is proposed to automatically adjust the kernel bandwidth of the mean shift trackers according to the probability distribution of the posture states; this kernel model improves the accuracy of the tracking result. The experimental examples show that the proposed approach can effectively improve accuracy and expedite the computation.

Shih-Yao Lin, I-Cheng Chang
Bayesian 3D Human Body Pose Tracking from Depth Image Sequences

This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging due to the need to resolve depth ambiguity caused by self-occlusions and the difficulty of recovering from tracking failure. Human body poses can be estimated with high accuracy through local optimization using dense correspondences between the 3D depth data and the vertices of an articulated human model, but such an approach cannot recover from tracking failure. This paper presents a method that reconstructs human pose by detecting and tracking anatomical landmarks (key-points) of the human body in depth images. The proposed method is robust and recovers from tracking failure when a body part is re-detected, but its pose estimation accuracy depends solely on the image-based localization accuracy of the key-points. To address these limitations, we present a flexible Bayesian method for integrating the pose estimates obtained from key-point detection and from local optimization. Experimental results and a performance comparison are presented to demonstrate the effectiveness of the proposed method.

Youding Zhu, Kikuo Fujimura
Crowd Flow Characterization with Optimal Control Theory

Analyzing crowd dynamics from video sequences is an open challenge in computer vision. Under a high crowd density assumption, we characterize the dynamics of the crowd flow by two related quantities: the velocity, and a disturbance potential which accounts for several elements likely to disturb the flow (the density of pedestrians, their interactions with the flow and with the environment). The aim of this paper is to estimate both quantities simultaneously from a sequence of crowded images. While the velocity of the flow can be observed directly from the images with traditional techniques, the disturbance potential is far trickier to estimate. We propose to couple, through optimal control theory, a dynamical crowd evolution model with observations from the image sequence in order to estimate both quantities at once from a video sequence. For this purpose, we derive a new and original continuum formulation of crowd dynamics which proves well adapted to dense crowd video sequences. We demonstrate the efficiency of our approach on both synthetic and real crowd videos.

Pierre Allain, Nicolas Courty, Thomas Corpetti
Human Action Recognition Using HDP by Integrating Motion and Location Information

Methods based on local features have the advantage that important local motion is represented as a bag of features, but they lack location information. Additionally, approaches based on bag-of-features require language models such as pLSA and LDA (Latent Dirichlet Allocation). These are unsupervised learning methods, but they require the number of latent topics to be set manually. In this study, in order to perform LDA without specifying the number of latent topics, and also to deal with multiple words concurrently, we propose an unsupervised Multiple-Instance Hierarchical Dirichlet Process (MI-HDP-LDA) that additionally employs location information. The proposed method was evaluated on the Weizmann dataset. The average recognition rate of the conventional LDA method was 61.8%, while that of the proposed method was 73.7%, an 11.9-point improvement.

Yasuo Ariki, Takuya Tonaru, Tetsuya Takiguchi
Detecting Spatiotemporal Structure Boundaries: Beyond Motion Discontinuities

The detection of motion boundaries has been and remains a long-standing challenge in computer vision. In this paper, the recovery of motion boundaries is recast in a broader scope, as focus is placed on the more general problem of detecting spacetime structure boundaries, of which motion boundaries constitute a special case. This recasting allows uniform consideration of boundaries between a wider class of spacetime patterns than previously considered in the literature, encompassing both coherent motion and additional dynamic patterns. Examples of dynamic patterns beyond standard motion that are encompassed by the proposed approach include flicker, transparency, and various dynamic textures (e.g., scintillation). Toward this end, a novel representation and method for detecting these boundaries in raw image sequence data are presented. Central to the representation is the description of oriented spacetime structure in a distributed manner. Empirical evaluation of the proposed boundary detector on challenging natural imagery suggests its efficacy.

Konstantinos G. Derpanis, Richard P. Wildes
An Accelerated Human Motion Tracking System Based on Voxel Reconstruction under Complex Environments

In this paper, we propose an automated and markerless human motion tracking system, comprising voxel acquisition and motion tracking. We first explore the problem of voxel reconstruction under a complex environment. Specifically, voxel acquisition is conducted against a cluttered background, which makes high-quality silhouettes unavailable. An accelerated Bayesian sensor fusion framework combining pixel and super-pixel information is adopted to calculate the probability of voxel occupancy, with the computation focused on the image region of interest. An evaluation of the reconstruction result is given as well. After the acquisition of voxels, we adopt a hierarchical optimization strategy to solve the problem of human motion tracking in a high-dimensional space. Finally, the performance of our human motion tracking system is compared with ground truth from a commercial marker-based motion capture system. The experimental results show that the proposed human motion tracking system works well in a complex environment.

Junchi Yan, Yin Li, Enliang Zheng, Yuncai Liu
Automated Center of Radial Distortion Estimation, Using Active Targets

In this paper, an automated algorithm for estimating the center of radial distortion is explained. The method is applied to the development of an autonomous camera calibration algorithm. The idea of active targets, which are controlled by the calibration algorithm, is the key to the autonomy in this work.

The proposed method decouples the center of radial distortion from the other calibration parameters. It is shown that the proposed method approximates the center of radial distortion correctly; it also improves the accuracy of the calibration framework.

Hamed Rezazadegan Tavakoli, Hamid Reza Pourreza
Rotation Averaging with Application to Camera-Rig Calibration

We present a method for calibrating the rotation between two cameras in a camera rig, in the case of non-overlapping fields of view, in a globally consistent manner. First, rotation averaging strategies are discussed and an L1-optimal rotation averaging algorithm is presented, which is more robust than the L2-optimal mean and the direct least squares mean. Second, we alternate between rotation averaging across several views and conjugate rotation averaging to achieve a global solution. Various experiments on both synthetic data and a real camera rig are conducted to evaluate the performance of the proposed algorithm. Experimental results suggest that the proposed algorithm achieves global consistency and a high-precision estimate.

Yuchao Dai, Jochen Trumpf, Hongdong Li, Nick Barnes, Richard Hartley
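As background for the comparison above, the L2 (chordal) mean of a set of rotations has a well-known closed form: arithmetically average the rotation matrices, then project the result back onto SO(3) via SVD. A minimal sketch of that baseline (function names are illustrative, not from the paper):

```python
import numpy as np

def chordal_l2_mean(rotations):
    """L2 chordal mean of rotation matrices: project the arithmetic
    mean of the 3x3 matrices back onto SO(3) via SVD."""
    M = np.mean(rotations, axis=0)
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # enforce det(R) = +1
        U[:, -1] *= -1
        R = U @ Vt
    return R

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Averaging two rotations about z recovers the mid-angle rotation.
mean_R = chordal_l2_mean([rot_z(0.1), rot_z(0.3)])
```

The L1-optimal mean discussed in the paper replaces this closed form with an iterative scheme that is less sensitive to outlier rotation estimates.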
Single-Camera Multi-baseline Stereo Using Fish-Eye Lens and Mirrors

This report proposes a monocular range measurement system with a fish-eye lens and mirrors placed around the lens. The fish-eye lens has a wide view-angle; the captured image includes a centered region of direct observation and surrounding regions of mirrored observations. These regions correspond to observations with multiple cameras at different positions and orientations. The captured image can be used for direct observation of a target with the centered region. Simultaneously, it can be used for multi-baseline stereo to reconstruct three-dimensional information. After calibration of the projection function of the fish-eye lens, the mirror positions and orientations are obtainable from the external parameters, which are used for the multi-baseline stereo measurement. Experimental results demonstrate the effectiveness of a real working system.

Wei Jiang, Masao Shimizu, Masatoshi Okutomi
Generation of an Omnidirectional Video without Invisible Areas Using Image Inpainting

Omnidirectional cameras usually cannot capture the entire direction of view due to a blind side. Thus, such an invisible part decreases realistic sensation in a telepresence system. In this study, an omnidirectional video without invisible areas is generated by filling in the missing region using an image inpainting technique for highly realistic sensation in telepresence. This paper proposes a new method that successfully inpaints a missing region by compensating for the change in appearance of textures caused by the camera motion and determining a searching area for similar textures considering the camera motion and the shape of the scene around the missing region. In experiments, the effectiveness of the proposed method is demonstrated by inpainting missing regions in a real image sequence captured with an omnidirectional camera and generating an omnidirectional video without invisible areas.

Norihiko Kawai, Kotaro Machikita, Tomokazu Sato, Naokazu Yokoya
Accurate and Efficient Cost Aggregation Strategy for Stereo Correspondence Based on Approximated Joint Bilateral Filtering

Recent local state-of-the-art stereo algorithms based on variable cost aggregation strategies allow for inferring disparity maps comparable to those yielded by algorithms based on global optimization schemes. Unfortunately, though these results are excellent, they are obtained at the expense of computational requirements comparable to, or even higher than, those of global approaches. In this paper, we propose a cost aggregation strategy based on joint bilateral filtering and incremental calculation schemes that allows for efficient and accurate inference of disparity maps. Experimental comparison with state-of-the-art techniques shows the effectiveness of our proposal.

Stefano Mattoccia, Simone Giardino, Andrea Gambini
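The weighting at the heart of joint bilateral cost aggregation can be sketched as follows. This brute-force version (parameter names are illustrative) only shows the idea of weighting each neighbor's matching cost by spatial and intensity proximity in the guide image; the paper's contribution lies in incremental schemes that make such aggregation efficient:

```python
import numpy as np

def joint_bilateral_aggregate(cost, guide, radius=3, sigma_s=3.0, sigma_r=10.0):
    """Aggregate a per-pixel stereo matching cost with joint bilateral
    weights computed from a grayscale guide image (the reference view)."""
    H, W = guide.shape
    out = np.zeros_like(cost, dtype=float)
    ys = np.arange(-radius, radius + 1)
    spatial = np.exp(-(ys[:, None]**2 + ys[None, :]**2) / (2 * sigma_s**2))
    g = np.pad(guide.astype(float), radius, mode='edge')
    c = np.pad(cost.astype(float), radius, mode='edge')
    for y in range(H):
        for x in range(W):
            win_g = g[y:y + 2*radius + 1, x:x + 2*radius + 1]
            win_c = c[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-(win_g - guide[y, x])**2 / (2 * sigma_r**2))
            w = spatial * rng            # spatial x range weight
            out[y, x] = (w * win_c).sum() / w.sum()
    return out

guide = np.arange(64, dtype=float).reshape(8, 8) % 7
smoothed = joint_bilateral_aggregate(np.full((8, 8), 5.0), guide)
```

A constant cost volume is left unchanged by the aggregation, since it is a normalized weighted average.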
Detecting Critical Configurations for Dividing Long Image Sequences for Factorization-Based 3-D Scene Reconstruction

The factorization method for 3-D reconstruction requires that all feature points occur in all images of a sequence. A long sequence therefore has to be divided into multiple subsequences for partial reconstructions. This paper proposes an algorithm for dividing a long sequence for factorization-based Structure and Motion (SaM). First, we propose an Algorithm for Detecting a few Critical Configurations (ADCC) in which Euclidean reconstruction degenerates. The critical configurations include: (1) coplanar 3-D points, (2) pure rotation, (3) rotation around two camera centers, and (4) presence of excessive noise and outliers in the measurements. The configurations in cases (1), (2) and (4) affect the rank of the scaled measurement matrix (SMM). The number of camera centers in case (3) affects the number of independent rows of the SMM. By examining the rank and the row space of the SMM, we detect the above critical configurations. With the proposed ADCC algorithm, we can divide a long sequence into subsequences such that a successful 3-D reconstruction is obtained on each subsequence with high confidence. Experimental results on both synthetic and real sequences demonstrate the effectiveness of the proposed algorithm for automatic 3-D reconstruction using the factorization method.

Ping Li, Rene Klein Gunnewiek, Peter de With
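The rank test underlying such degeneracy detection reduces to counting significant singular values of the measurement matrix. A minimal sketch (the tolerance and setup are assumptions, not the paper's exact criterion):

```python
import numpy as np

def numerical_rank(M, tol=1e-8):
    """Numerical rank via SVD: count singular values above a relative
    threshold. A rank drop of the scaled measurement matrix signals a
    degenerate configuration (e.g., coplanar points or pure rotation)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

# A generic noise-free measurement matrix factors into rank-4 terms:
rng = np.random.default_rng(0)
W = rng.random((2 * 6, 4)) @ rng.random((4, 30))   # 6 views, 30 points
```

On real data the singular values decay gradually, so the threshold choice (and the row-space test for case (3)) is where the actual algorithm does its work.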
Scene Gist: A Holistic Generative Model of Natural Image

This paper proposes a novel generative model for natural image representation and scene classification. Given a natural image, it is decomposed with learned holistic bases called scene gist components. This gist representation is a global and adaptive image descriptor, generatively encoding the most essential information related to visual perception. Meanwhile, prior knowledge of the scene category is integrated into the generative model to interpret newly input images. To validate the efficiency of the scene gist representation, a simple nonparametric scene classification algorithm is developed based on minimizing the scene reconstruction error. Finally, a comparison with other scene classification algorithms demonstrates the higher performance of the proposed model.

Bolei Zhou, Liqing Zhang
A Robust Algorithm for Color Correction between Two Stereo Images

Most multi-camera vision applications assume a single common color response for all cameras. However, significant luminance and chrominance discrepancies among camera views often exist due to the dissimilar radiometric characteristics of the cameras and variations in lighting conditions. These discrepancies may severely affect algorithms that depend on color correspondence. To address this problem, this paper proposes a robust color correction algorithm. Instead of handling the image as a whole or employing a color calibration object, we compensate for the color discrepancies region by region. The proposed algorithm avoids the problem that global correction techniques can possibly give bad correction results in local areas of an image. Extensive experiments demonstrate the effectiveness and robustness of our algorithm. Though we formulate the algorithm in the context of stereo vision, it can be extended to other applications in a straightforward way.

Qi Wang, Xi Sun, Zengfu Wang
Efficient Human Action Detection Using a Transferable Distance Function

In this paper, we address the problem of efficient human action detection with only one template. We choose the standard sliding-window approach to scan the template video against test videos, and the template video is represented by patch-based motion features. Using generic knowledge learnt from previous training sets, we weight the patches on the template video by a transferable distance function. Based on the patch weighting, we propose a cascade structure which can efficiently scan the template video over test videos. Our method is evaluated on a human action dataset with cluttered background, and on a ballet video with complex human actions. The experimental results show that our cascade structure not only achieves very reliable detection, but also significantly improves the efficiency of patch-based human action detection, with an order of magnitude improvement.

Weilong Yang, Yang Wang, Greg Mori
Crease Detection on Noisy Meshes via Probabilistic Scale Selection

Motivated by multi-scale edge detection in images, a novel multi-scale approach is presented to detect creases on 3D meshes. In this paper, we propose a probabilistic method to select local scales in the discrete 3D scale space. The likelihood function of local scale at each vertex is defined based on the minimum description length (MDL) principle. By introducing some prior knowledge, the optimal local scales are selected using Bayes rule. Therefore, the distribution of selected local scales is piecewise constant and discontinuity adaptive. The discrete 3D multi-scale representation of a given mesh can be constructed using an anisotropic diffusion method. With the selected scales, creases are traced by connecting the curvature extrema points detected on the mesh edges. Experimental results show that geometrically salient creases are well detected on noisy meshes using our method.

Tao Luo, Huai-Yu Wu, Hongbin Zha
Improved Uncalibrated View Synthesis by Extended Positioning of Virtual Cameras and Image Quality Optimization

Although there exist numerous view synthesis procedures, they are all restricted to certain special cases. Some procedures for instance can only handle a calibrated camera set while others are limited to interpolation between the reference views. In this paper we will present a fully automated uncalibrated view synthesis procedure. It allows an arbitrary camera placement in 3-D space on the basis of only two input images with a natural camera orientation. Natural camera orientation means that the focus of the virtual camera is intrinsically given by the geodesic which again is determined by the reference views. The presented procedure extends an existing view synthesis algorithm that allows only a camera placement on the 1-D geodesic (in the case of two reference views). The extensions are an additional camera placement along and orthogonally to the line of sight. The image quality of the virtual views will also be enhanced by utilizing the image information of both reference views.

Fabian Gigengack, Xiaoyi Jiang
Region Based Color Image Retrieval Using Curvelet Transform

Region based image retrieval has received significant attention in recent research because it can provide local description of images, object-based query, and semantic learning. In this paper, we apply the curvelet transform to region based retrieval of color images. The curvelet transform has shown promising results in image de-noising, character recognition, and texture image retrieval. However, curvelet feature extraction for segmented regions is challenging because it requires regularly shaped (e.g., rectangular) images or regions, while segmented regions are usually irregular. An efficient method is proposed to convert irregular regions to regular regions, so that the discrete curvelet transform can be applied to them. Experimental results and analyses show the effectiveness of the proposed shape transform method. We also show that the curvelet features extracted from the transformed regions outperform the widely used Gabor features in retrieving natural color images.

Md. Monirul Islam, Dengsheng Zhang, Guojun Lu
Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Recently, spatio-temporal local features have been proposed as image features to recognize events or human actions in videos. In this paper, we propose yet another local spatio-temporal feature based on the SURF detector, which is a lightweight local feature. Our method consists of two parts: extracting visual features and extracting motion features. First, we select candidate points based on the SURF detector. Next, we calculate motion features at each point over local temporal units, divided so as to account for the consecutiveness of motions. Since our proposed feature is intended to be robust to rotation, we rotate optical flow vectors to the main direction of the extracted SURF features. In the experiments, we evaluate the proposed spatio-temporal local feature on the common dataset containing six kinds of simple human actions. The accuracy reaches 86%, which is almost equivalent to the state of the art. In addition, we conduct experiments classifying large amounts of Web video clips downloaded from YouTube.

Akitsugu Noguchi, Keiji Yanai
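Rotating flow vectors to a keypoint's dominant orientation, as described above, amounts to one 2-D rotation per vector; a minimal sketch (conventions for angle sign and array layout are assumptions):

```python
import numpy as np

def rotate_flow(flow, main_angle):
    """Rotate optical-flow vectors (N x 2 array of (u, v)) by
    -main_angle, expressing them relative to the keypoint's dominant
    orientation so the resulting feature is rotation-robust."""
    c, s = np.cos(-main_angle), np.sin(-main_angle)
    R = np.array([[c, -s], [s, c]])
    return flow @ R.T

# A flow vector aligned with the dominant direction maps to the x-axis:
aligned = rotate_flow(np.array([[0.0, 1.0]]), np.pi / 2)
```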
Multi-view Texturing of Imprecise Mesh

Reprojection of texture from cameras onto a mesh estimated by multi-view reconstruction is often the last stage of the pipeline, used for rendering, visualization, or simulation of new views. Errors or imprecisions in the recovered 3D geometry are particularly noticeable at this stage. Nevertheless, it is sometimes desirable to get a visually correct rendering in spite of inaccuracy in the mesh, when correcting the mesh is not an option, for example if the origin of error in the stereo pipeline is unknown, or if the mesh is a visual hull. We propose to apply slight deformations to the data images to best fit the fixed mesh. This is done by intersecting rays issued from corresponding interest points in different views, projecting the resulting 3D points on the mesh, and reprojecting these points onto the images. This provides a displacement vector at matched interest points in the images, from which an approximating full distortion vector field can be estimated by thin-plate splines. Using the distorted images as input to texturing algorithms can result in noticeably better rendering, as demonstrated here in several experiments.

Ehsan Aganj, Pascal Monasse, Renaud Keriven

Poster Session 2: Segmentation, Detection, Color and Texture

Semantic Classification in Aerial Imagery by Integrating Appearance and Height Information

In this paper we present an efficient technique to obtain accurate semantic classification at the pixel level, capable of integrating various modalities such as color, edge responses, and height information. We propose a novel feature representation based on Sigma Points computations that enables a simple application of powerful covariance descriptors to a multi-class randomized forest framework. Additionally, we include semantic contextual knowledge using a conditional random field formulation. In order to achieve a fair comparison to state-of-the-art methods, our approach is first evaluated on the MSRC image collection and is then demonstrated on three challenging aerial image datasets: Dallas, Graz, and San Francisco. We obtain a full semantic classification of single aerial images within two minutes. Moreover, the computation time on large-scale imagery comprising hundreds of images is investigated.

Stefan Kluckner, Thomas Mauthner, Peter M. Roth, Horst Bischof
Real-Time Video Matting Based on Bilayer Segmentation

Most current video matting methods perform off-line with a high calculation cost and require many user inputs for multiple key frames. In this paper, we present an online video matting method that runs in real time based on bilayer segmentation. In the first step of the method, we introduce an accurate bilayer segmentation method for extracting the foreground region from the background using color likelihood propagation. In the second step, we perform alpha matting based on the segmentation result. To enable real-time processing, we modify the conventional Bayesian matting method using down-sampling and smart initialization, which increase the calculation speed five-fold while maintaining quality. Experimental results on various test sequences show the effectiveness of our method.

Viet-Quoc Pham, Keita Takahashi, Takeshi Naemura
Transductive Segmentation of Textured Meshes

This paper addresses the problem of segmenting a textured mesh into objects or object classes, consistently with user-supplied seeds. We view this task as transductive learning and use the flexibility of kernel-based weights to incorporate a variety of diverse features. Our method combines a Laplacian graph regularizer that enforces spatial coherence in label propagation and an SVM classifier that ensures dissemination of the seeds' characteristics. Our interactive framework allows class seeds to be easily specified with sketches drawn on the mesh and the segmentation to be refined if needed. We obtain qualitatively good segmentations on several architectural scenes and show the applicability of our method to outlier removal.

Anne-Laure Chauve, Jean-Philippe Pons, Jean-Yves Audibert, Renaud Keriven
Levels of Details for Gaussian Mixture Models

Mixtures of Gaussians are a crucial statistical modeling tool at the heart of many challenging applications in computer vision and machine learning. In this paper, we first describe a novel and efficient algorithm for simplifying Gaussian mixture models using a generalization of the celebrated k-means quantization algorithm tailored to relative entropy. Our method compares favourably with the state of the art in terms of both time and quality. Second, we propose a practical enhanced approach providing a hierarchical representation of the simplified GMM while automatically computing the optimal number of Gaussians in the simplified mixture. An application to clustering-based image segmentation is reported.

Vincent Garcia, Frank Nielsen, Richard Nock
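The flavor of entropic k-means over mixture components can be illustrated in one dimension: assign each Gaussian to its KL-nearest centroid, then update each centroid by moment matching over its assigned components. This is an illustrative toy (uniform weights, naive initialization), not the paper's algorithm:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def simplify_gmm(means, stds, k, iters=20):
    """Lloyd-style simplification of a mixture of 1-D Gaussians."""
    cm, cs = means[:k].copy(), stds[:k].copy()   # naive init
    labels = np.zeros(len(means), dtype=int)
    for _ in range(iters):
        d = np.array([[kl_gauss(m, s, cmj, csj) for cmj, csj in zip(cm, cs)]
                      for m, s in zip(means, stds)])
        labels = d.argmin(axis=1)
        for j in range(k):
            sel = labels == j
            if sel.any():                        # moment-matched centroid
                cm[j] = means[sel].mean()
                cs[j] = np.sqrt((stds[sel]**2 + means[sel]**2).mean() - cm[j]**2)
    return cm, cs, labels

means = np.array([0.0, 0.1, 5.0, 5.1])
stds = np.full(4, 0.5)
cm, cs, labels = simplify_gmm(means, stds, k=2)
```

Two well-separated groups of components collapse onto two moment-matched centroids.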
A Blind Robust Watermarking Scheme Based on ICA and Image Dividing Blocks

We propose a new scheme for oblivious/blind robust watermarking of digital images based on independent component analysis (ICA). Most ICA-based watermarking schemes need additional information in the watermark extraction process, which is undesirable because storing and transferring this additional information is inconvenient in some situations. A novel image block division is utilized in this paper that removes the need for additional information in ICA-based watermark extraction. The watermarking scheme, subjected to a variety of experiments, has shown robustness against many attacks, e.g., JPEG compression, filtering, and gray-scale reduction; it also exhibits a capability for image authentication.

Yuqiang Cao, Weiguo Gong
MIFT: A Mirror Reflection Invariant Feature Descriptor

In this paper, we present a mirror reflection invariant descriptor inspired by SIFT. While preserving tolerance to scale, rotation, and even affine transformation, the proposed descriptor, MIFT, is also invariant to mirror reflection. We analyze the structure of MIFT and show how MIFT outperforms SIFT in the presence of mirror reflection while performing as well as SIFT when there is none. The performance evaluation is demonstrated on natural images such as reflections on water, non-rigid symmetric objects viewed from different sides, and reflections in mirrors. Based on MIFT, applications to image search and symmetry axis detection for planar symmetric objects are also shown.

Xiaojie Guo, Xiaochun Cao, Jiawan Zhang, Xuewei Li
Detection of Vehicle Manufacture Logos Using Contextual Information

Besides their decorative purpose, vehicle manufacture logos can provide rich information for vehicle verification and classification in applications such as security and information retrieval. Detection and recognition of vehicle manufacture logos are, however, very challenging because the logos themselves might lack discriminative features. In this paper, we propose a method to detect vehicle manufacture logos using contextual information, i.e., the information of surrounding objects such as license plates, headlights, and grilles. The experimental results demonstrate that the proposed method is more effective and robust than other methods.

Wenting Lu, Honggang Zhang, Kunyan Lan, Jun Guo
Part-Based Object Detection Using Cascades of Boosted Classifiers

We present a new method for object detection that integrates a part-based model with cascades of boosted classifiers. The parts are labeled in a supervised manner. For each part, we construct a boosted cascade by selecting the most important features from a large set and combining progressively more complex classifiers. The weak learners used in each level of the cascade are gradient features of variable-size blocks. Moreover, we learn a model of the spatial relations between the parts. In detection, the cascade of classifiers for each part computes the part values within all sliding windows, and the object is then localized within the image by integrating the spatial relations model. The experimental results demonstrate that training a cascade of boosted classifiers for each part and adding spatial constraints among parts improve the performance of detection and localization.

Xiaozhen Xia, Wuyi Yang, Heping Li, Shuwu Zhang
A Novel Self-created Tree Structure Based Multi-view Face Detection

This paper proposes a self-created multi-layer cascaded architecture for multi-view face detection. Instead of using predefined a priori knowledge about face views, the system automatically divides the face sample space using the kernel-based branching competitive learning (KBCL) network at different discriminative resolutions. To improve detection efficiency, a coarse-to-fine search mechanism is involved in the procedure, where boosted mirror pair of points (MPP) classifiers are employed to classify image blocks at different discriminatory levels. The boosted MPP classifiers can approximate the performance of standard support vector machines in a hierarchical way, which allows background blocks to be excluded quickly by simple classifiers while the 'face like' parts remain to be judged by more complicated classifiers. Experimental results show that our system provides a high detection rate with a particularly low level of false positives.

Xu Yang, Xin Yang, Huilin Xiong
Multilinear Nonparametric Feature Analysis

A novel method with general tensor representation for face recognition based on multilinear nonparametric discriminant analysis is proposed. Traditional LDA-based methods suffer from disadvantages such as the small sample size (SSS) problem, the curse of dimensionality, and a fundamental limitation resulting from the parametric nature of scatter matrices, which are based on the Gaussian distribution assumption. In addition, traditional LDA-based methods and their variants do not consider the class boundaries of samples or the interior structure of each sample class. To address these problems, a new multilinear nonparametric discriminant analysis is proposed, and new formulations of scatter matrices are given. Experimental results indicate the robustness and accuracy of the proposed method.

Xu Zhang, Xiangqun Zhang, Jian Cao, Yushu Liu
A Harris-Like Scale Invariant Feature Detector

Image feature detection is a fundamental issue in computer vision. SIFT [1] and SURF [2] are very effective in scale-space feature detection, but their stability is not good enough because unstable features such as edges are often detected even when edge suppression is used as a post-processing step. Inspired by the Harris function [3], we extend Harris to scale space and propose a novel method, the Harris-like Scale Invariant Feature Detector (HLSIFD). Unlike Harris-Laplace, which is a hybrid of Harris and Laplace, HLSIFD uses the Hessian matrix, which proves more stable in scale space than the Harris matrix. Unlike other methods, which suppress edges abruptly (SIFT) or ignore them (SURF), HLSIFD suppresses edges smoothly and uniformly, so fewer false points are detected. The approach is evaluated on public databases and in real scenes. Compared to state-of-the-art feature detectors such as SIFT and SURF, HLSIFD shows high performance.

Yinan Yu, Kaiqi Huang, Tieniu Tan
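The distinction between the Harris (first-moment) matrix and the Hessian (second-derivative) matrix can be sketched at a single scale. This is an illustrative single-scale version without the Gaussian windowing and scale normalization a real detector needs:

```python
import numpy as np

def gradients(img):
    gy, gx = np.gradient(img.astype(float))   # axis 0 is y, axis 1 is x
    return gx, gy

def harris_response(img, k=0.04):
    """Harris: R = det(M) - k*trace(M)^2 from products of first
    derivatives (a real detector smooths these with a Gaussian window)."""
    gx, gy = gradients(img)
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy
    return Ixx * Iyy - Ixy**2 - k * (Ixx + Iyy)**2

def hessian_response(img):
    """Determinant of the intensity Hessian, as used by Hessian-based
    detectors; strong on blobs, weak on straight edges."""
    gx, gy = gradients(img)
    Ixx = np.gradient(gx, axis=1)
    Iyy = np.gradient(gy, axis=0)
    Ixy = np.gradient(gx, axis=0)
    return Ixx * Iyy - Ixy**2

# The Hessian determinant peaks at the center of a Gaussian blob:
xx, yy = np.meshgrid(np.arange(17), np.arange(17))
blob = np.exp(-((xx - 8)**2 + (yy - 8)**2) / 8.0)
peak = np.unravel_index(hessian_response(blob).argmax(), blob.shape)
```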
Probabilistic Cascade Random Fields for Man-Made Structure Detection

This paper develops a probabilistic version of the cascade algorithm, specifically Probabilistic AdaBoost Cascade (PABC). The proposed PABC algorithm is further employed to learn the association potential in the Discriminative Random Fields (DRF) model, resulting in the Probabilistic Cascade Random Fields (PCRF) model. The PCRF model enjoys the advantage of incorporating far more informative features than the conventional DRF model. Moreover, compared to the original DRF model, PCRF is less sensitive to the class imbalance problem. The proposed PABC and PCRF were applied to the task of man-made structure detection. We compared the performance of PABC under different settings, the performance of the original DRF model, and that of PCRF. Detailed numerical analysis demonstrated that PABC improves performance with more AdaBoost nodes, and that the interaction potential in PCRF improves performance significantly further.

Songfeng Zheng
A Novel System for Robust Text Location and Recognition of Book Covers

Text location and recognition is a vital and fundamental problem in image processing. In this paper, we propose a novel system for text location and recognition focused on book covers. Our work consists of two main parts: learning-based text location and adaptive-binarization-guided recognition. First, we extract three types of robust features from the training data provided at ICDAR 2005 and use AdaBoost to combine these features into a powerful classifier for text region detection and location. Second, we apply the proposed adaptive binarization to process the located regions for recognition. Compared with previous work, our algorithm is robust to the size, font, and color of text, and is insensitive to language. In experiments, our system showed attractive performance.

Zhiyuan Zhang, Kaiyue Qi, Kai Chen, Chenxuan Li, Jianbo Chen, Haibing Guan
A Multi-scale Bilateral Structure Tensor Based Corner Detector

In this paper, a novel multi-scale nonlinear structure tensor based corner detection algorithm is proposed to effectively improve the classical Harris corner detector. By considering both the spatial and gradient distances of neighboring pixels, a nonlinear bilateral structure tensor is constructed to examine the local image pattern. The linear structure tensor used in the original Harris corner detector is a special case of the proposed bilateral one, obtained by considering only the spatial distance. Moreover, a multi-scale filtering scheme is developed to distinguish trivial structures from true corners based on their different characteristics across multiple scales. A comparison between the proposed approach and four representative state-of-the-art corner detectors shows that our method has much better performance in terms of both detection rate and localization accuracy.

Lin Zhang, Lei Zhang, David Zhang
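The construction can be sketched at a single pixel: each neighbor's gradient outer product is weighted by a spatial Gaussian and by a Gaussian on the gradient difference, so gradients across a discontinuity contribute little. An illustrative sketch (parameter names and values are assumptions):

```python
import numpy as np

def bilateral_structure_tensor(gx, gy, y, x, radius=3, sigma_s=2.0, sigma_r=0.5):
    """Bilateral structure tensor at pixel (y, x): gradient outer
    products weighted by spatial and gradient (range) distance.
    Letting sigma_r -> infinity recovers the linear structure tensor."""
    H, W = gx.shape
    J = np.zeros((2, 2))
    g0 = np.array([gx[y, x], gy[y, x]])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if not (0 <= yy < H and 0 <= xx < W):
                continue
            g = np.array([gx[yy, xx], gy[yy, xx]])
            w = np.exp(-(dy*dy + dx*dx) / (2 * sigma_s**2)) \
              * np.exp(-np.sum((g - g0)**2) / (2 * sigma_r**2))
            J += w * np.outer(g, g)
    return J

# On a uniform horizontal gradient field the tensor has one direction:
J = bilateral_structure_tensor(np.ones((9, 9)), np.zeros((9, 9)), 4, 4)
```

A corner measure (e.g., Harris) is then evaluated on J exactly as on the linear tensor.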
Pedestrian Recognition Using Second-Order HOG Feature

Histogram of Oriented Gradients (HOG) is a well-known feature for pedestrian recognition which describes object appearance as local histograms of gradient orientation. However, it is incapable of describing higher-order properties of object appearance. In this paper, we present a second-order HOG feature which attempts to capture second-order properties of object appearance by estimating the pairwise relationships among spatially neighboring components of the HOG feature. In our preliminary experiments, we found that using the harmonic mean or the min function to measure the pairwise relationship gives satisfactory results. We demonstrate that the proposed second-order HOG feature significantly improves on the HOG feature on several pedestrian datasets, and that it is competitive with other second-order features, including GLAC and CoHOG.

Hui Cao, Koichiro Yamaguchi, Takashi Naito, Yoshiki Ninomiya
Fabric Defect Detection and Classification Using Gabor Filters and Gaussian Mixture Model

This work investigates automatic and robust fabric defect detection and classification, which are essential to assuring fabric quality. The work has two main characteristics. First, a new scheme combining Gabor filters and a Gaussian mixture model (GMM) is proposed for fabric defect detection and classification: in detection, the foreground mask and texture features are extracted using Gabor filters; in classification, a GMM-based classifier is trained and assigns each foreground pixel to a known class. Second, the test data were collected from the Qinfeng textile factory, China, comprising nine different fabric defects with more than 1000 samples. All evaluation of our method is based on these actual fabric images, and the experimental results show that the proposed algorithm achieves satisfactory performance.
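The Gabor front-end can be sketched as follows (an illustrative numpy-only version with assumed filter parameters; the GMM classification stage is omitted here and could be supplied by any standard GMM implementation):

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lam=8.0, gamma=0.5):
    # Real Gabor filter: Gaussian envelope times an oriented cosine wave.
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / lam)

def conv2_same(img, k):
    # Naive same-size cross-correlation with edge padding (numpy only).
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode='edge')
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

def gabor_energy(img, n_orient=4):
    # Per-pixel response magnitude for each orientation; defects break the
    # fabric's regular texture and stand out in this feature space.
    thetas = [np.pi * t / n_orient for t in range(n_orient)]
    return np.stack([np.abs(conv2_same(img, gabor_kernel(theta=t)))
                     for t in thetas], axis=-1)
```

On a synthetic striped "fabric" with a flat defect patch, the matched-orientation energy is high on the texture and near zero inside the defect, which is what the foreground mask thresholds on.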

Yu Zhang, Zhaoyang Lu, Jing Li
Moving Object Segmentation in the H.264 Compressed Domain

A novel method for moving object segmentation in the H.264 compressed domain is proposed. In contrast to known methods that use only motion information, the proposed method exploits several characteristics of H.264 besides motion information, with no additional decoding required. First, the motion vector field is refined using the spatial and temporal correlation of motion, and an initial segmentation is produced from the motion vector difference after global motion estimation. Then, the segmentation is refined using intra prediction information in intra-frames. The refined segmentation is projected onto subsequent frames, followed by expansion and contraction operations. Experimental results on several H.264 compressed video sequences demonstrate the good segmentation quality of the proposed approach.
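One common way to exploit spatial correlation when refining a motion vector field is a vector-median filter; the abstract does not give the paper's exact refinement rule, so the following is only an illustrative stand-in:

```python
import numpy as np

def refine_motion_field(mv):
    """Replace each macroblock motion vector with the vector median of its
    3x3 spatial neighborhood: the neighbor minimizing total L1 distance to
    the others. mv has shape (h, w, 2). Isolated outlier vectors, which
    typically do not reflect true object motion, are suppressed."""
    h, w, _ = mv.shape
    out = mv.copy()
    for i in range(h):
        for j in range(w):
            ys = slice(max(i - 1, 0), min(i + 2, h))
            xs = slice(max(j - 1, 0), min(j + 2, w))
            nbrs = mv[ys, xs].reshape(-1, 2)
            cost = np.abs(nbrs[:, None, :] - nbrs[None, :, :]).sum(axis=(1, 2))
            out[i, j] = nbrs[np.argmin(cost)]
    return out
```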

Changfeng Niu, Yushu Liu
Video Segmentation Using Iterated Graph Cuts Based on Spatio-temporal Volumes

We present a novel approach to segmenting video using iterated graph cuts based on spatio-temporal volumes. We use the mean shift clustering algorithm to build spatio-temporal volumes with different bandwidths from the input video. We compute a prior probability from the color-histogram likelihood and a distance transform, using the segmentation result of the previous graph-cuts iteration, and set this probability as the t-link weights of the graph for the next iteration. The proposed method can segment object regions in a stepwise process from global to local segmentation by iterating the graph-cuts process with mean shift clustering at different bandwidths. It reduces the number of nodes and edges to about 1/25 of the conventional method at the same segmentation rate.
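The node-reduction claim rests on building the graph over clustered regions rather than pixels; a small illustration of the two graph sizes (with an assumed 4-connected neighborhood and a toy 2D label map standing in for the spatio-temporal volumes):

```python
import numpy as np

def pixel_graph_size(h, w):
    # 4-connected pixel grid: one node per pixel, one edge per adjacent pair.
    return h * w, (h - 1) * w + h * (w - 1)

def region_graph_size(labels):
    # Region-adjacency graph from a clustering label map (e.g. mean shift
    # volumes): one node per region, one edge per pair of 4-adjacent regions.
    nodes = len(np.unique(labels))
    edges = set()
    h, w = labels.shape
    for i in range(h):
        for j in range(w):
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < h and nj < w and labels[ni, nj] != labels[i, j]:
                    edges.add(frozenset((int(labels[i, j]),
                                         int(labels[ni, nj]))))
    return nodes, len(edges)
```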

Tomoyuki Nagahashi, Hironobu Fujiyoshi, Takeo Kanade
Spectral Graph Partitioning Based on a Random Walk Diffusion Similarity Measure

Spectral graph partitioning is a powerful tool for unsupervised data learning. Most existing algorithms for spectral graph partitioning directly utilize the pairwise similarity matrix of the data to perform graph partitioning. Consequently, they are incapable of fully capturing the intrinsic structural information of graphs. To address this problem, we propose a novel random walk diffusion similarity measure (RWDSM) for capturing the intrinsic structural information of graphs. The RWDSM is composed of three key components: emission, absorbing, and transmission. It is proven that graph partitioning on the RWDSM matrix performs better than on the pairwise similarity matrix of the data. Moreover, a spectral graph partitioning objective function (referred to as DGPC) is used for capturing the discriminant information of graphs. The DGPC is designed to effectively characterize the intra-class compactness and the inter-class separability. Based on the RWDSM and DGPC, we further develop a novel spectral graph partitioning algorithm (referred to as DGPCA). Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the developed DGPCA.

Xi Li, Weiming Hu, Zhongfei Zhang, Yang Liu
Iterated Graph Cuts for Image Segmentation

Graph cuts based interactive segmentation has become very popular over the last decade. In standard graph cuts, the extraction of a foreground object from a complex background often leads to many segmentation errors, and the parameter λ in the energy function is hard to select. In this paper, we propose an iterated graph cuts algorithm, which starts from the sub-graph that comprises the user-labeled foreground/background regions and works iteratively to label the surrounding un-segmented regions. In each iteration, only the regions locally neighboring the labeled regions are involved in the optimization, so that interference from far unknown regions is significantly reduced. To improve segmentation efficiency and robustness, we use the mean shift method to partition the image into homogeneous regions, and then implement the proposed iterated graph cuts algorithm by taking each region, instead of each pixel, as a graph node for segmentation. Extensive experiments on benchmark datasets demonstrate that our method gives much better segmentation results than the standard graph cuts and GrabCut methods in both qualitative and quantitative evaluation. Another important advantage is that it is insensitive to the parameter λ in the optimization.

Bo Peng, Lei Zhang, Jian Yang
Contour Extraction Based on Surround Inhibition and Contour Grouping

Extraction of object contours from natural scenes is a difficult task because it is hard to distinguish between object contours and texture edges. To overcome this problem, this paper presents a contour extraction method inspired by the visual mechanism. First, an improved, biologically motivated surround inhibition process is applied to detect contour elements. Then we draw on visual cortical mechanisms of perceptual grouping to propose a contour grouping model. This model consists of two levels: at the low level, a method is presented to compute local interactions between contour elements; at the high level, a global energy function is suggested to perceive salient object contours. Finally, contours with high energy are retained while the others, such as texture edges, are removed. Experimental results show that our method works well.

Yuan Li, Jianzhou Zhang, Ping Jiang
Confidence-Based Color Modeling for Online Video Segmentation

High quality online video segmentation is a very challenging task. Among the various cues used to infer the segmentation, the foreground and background color distributions are the most important. However, previous color modeling methods are error-prone when parts of the foreground and background have similar colors. To address this problem, we propose a novel approach of Confidence-based Color Modeling (CCM). Our approach can adaptively tune the effects of the global and per-pixel color models according to the confidence of their predictions; methods of measuring the confidence of both types of models are developed. We also propose an adaptive threshold method for background subtraction that is robust against ambiguous colors. Experiments demonstrate the effectiveness and efficiency of our method in reducing the segmentation errors incurred by ambiguous colors.
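The confidence-weighted blending idea can be sketched as follows (an illustrative form only: the paper's actual confidence measures and threshold schedule are not given in the abstract, so the blend and the ambiguity-dependent threshold below are assumptions):

```python
import numpy as np

def confidence_weighted_prob(p_global, p_pixel, conf_global, conf_pixel):
    """Blend the global color model's foreground probability with the
    per-pixel model's, each weighted by its prediction confidence."""
    w = conf_global + conf_pixel + 1e-12
    return (conf_global * p_global + conf_pixel * p_pixel) / w

def adaptive_threshold_mask(prob_fg, ambiguity):
    """Background subtraction with a per-pixel threshold that is raised
    where colors are ambiguous, so uncertain pixels need stronger
    evidence to be labeled foreground."""
    thresh = 0.5 + 0.3 * ambiguity    # illustrative schedule in [0.5, 0.8]
    return prob_fg > thresh
```

When one model is fully confident and the other is not, the blend simply follows the confident model; a moderately foreground-like pixel passes the threshold only where colors are unambiguous.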

Fan Zhong, Xueying Qin, Jiazhou Chen, Wei Hua, Qunsheng Peng
Multicue Graph Mincut for Image Segmentation

We propose a general framework to encode various grouping cues for natural image segmentation. We extend the classical Gibbs energy of an MRF to three terms: likelihood energy, coherence energy and separating energy. We encode generative cues in the likelihood and coherence energies to ensure the goodness and feasibility of the segmentation, and embed discriminative cues in the separating energy to encourage assigning different labels to pixels with strong separability. We use a self-validated process to iteratively minimize the global Gibbs energy. Our approach is able to automatically determine the number of segments, and produces a natural hierarchy of coarse-to-fine segmentations. Experiments show that our approach works well for various segmentation problems, and outperforms existing methods in terms of robustness to noise and preservation of soft edges.
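The three-term energy can be written down in a toy evaluator (an illustrative form inferred from the abstract, not the paper's exact definition: coherence penalizes splitting a coherent pair, separating penalizes keeping a separable pair together):

```python
import numpy as np

def gibbs_energy(labels, likelihood, edges, coherence_w, separating_w):
    """Extended Gibbs energy E = likelihood + coherence + separating.
    likelihood[i, l] is the cost of giving node i label l; each edge (i, j)
    carries a coherence weight (paid when labels differ) and a separating
    weight (paid when labels agree)."""
    E = sum(likelihood[i, labels[i]] for i in range(len(labels)))
    for (i, j), wc, ws in zip(edges, coherence_w, separating_w):
        if labels[i] != labels[j]:
            E += wc      # breaking a coherent pair costs energy
        else:
            E += ws      # keeping a separable pair together costs energy
    return E
```

Minimizing such an energy trades off per-node evidence against the two competing pairwise cues.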

Wei Feng, Lei Xie, Zhi-Qiang Liu
Backmatter
Metadata
Title
Computer Vision – ACCV 2009
Edited by
Hongbin Zha
Rin-ichiro Taniguchi
Stephen Maybank
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-12304-7
Print ISBN
978-3-642-12303-0
DOI
https://doi.org/10.1007/978-3-642-12304-7
