2011 | Book

Computer Vision – ACCV 2010

10th Asian Conference on Computer Vision, Queenstown, New Zealand, November 8-12, 2010, Revised Selected Papers, Part II

Editors: Ron Kimmel, Reinhard Klette, Akihiro Sugimoto

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science

About this book

The four-volume set LNCS 6492-6495 constitutes the thoroughly refereed post-proceedings of the 10th Asian Conference on Computer Vision, ACCV 2010, held in Queenstown, New Zealand in November 2010. Altogether the four volumes present 206 revised papers selected from a total of 739 submissions. All current issues in computer vision are addressed, ranging from algorithms that attempt to automatically understand the content of images, through optical methods coupled with computational techniques that enhance and improve images, to capturing and analyzing the world's geometry while preparing the higher level image and shape understanding. Novel geometry techniques, statistical learning methods, and modern algebraic procedures are dealt with as well.

Table of Contents

Frontmatter

Posters on Day 1 of ACCV 2010

Generic Object Class Detection Using Boosted Configurations of Oriented Edges

In this paper we introduce a new representation for shape-based object class detection. This representation is based on very sparse and slightly flexible configurations of oriented edges. An ensemble of such configurations is learnt in a boosting framework. Each edge configuration can capture some local or global shape property of the target class and the representation is thus not limited to representing and detecting visual classes that have distinctive local structures. The representation is also able to handle significant intra-class variation. The representation allows for very efficient detection and can be learnt automatically from weakly labelled training images of the target class. The main drawback of the method is that, since its inductive bias is rather weak, it needs a comparatively large training set. We evaluate on a standard database [1] and when using a slightly extended training set, our method outperforms state of the art [2] on four out of five classes.

Oscar Danielsson, Stefan Carlsson
Unsupervised Feature Selection for Salient Object Detection

Feature selection plays a crucial role in deciding the salient regions of an image, as in any other pattern recognition problem. However, the problem of identifying the relevant features that play a fundamental role in the saliency of an image has not received much attention so far. We introduce an unsupervised feature selection method to improve the accuracy of salient object detection. The noisy irrelevant features in the image are identified by maximizing the mixing rate of a Markov process running on a linear combination of various graphs, each representing a feature. The global optimum of this convex problem is achieved by maximizing the second smallest eigenvalue of the graph Laplacian via semi-definite programming. The enhanced image graph model, after the removal of irrelevant features, is shown to improve the salient object detection performance on a large image database with annotated ‘ground truth’.

Viswanath Gopalakrishnan, Yiqun Hu, Deepu Rajan
MRF Labeling for Multi-view Range Image Integration

Multi-view range image integration focuses on producing a single reasonable 3D point cloud from multiple 2.5D range images for the reconstruction of a watertight manifold surface. However, registration errors and scanning noise usually lead to a poor integration and, as a result, the reconstructed surface cannot have topology and geometry consistent with the data source. This paper proposes a novel method cast in the framework of Markov random fields (MRF) to address the problem. We define a probabilistic description of an MRF labeling based on all input range images and then employ loopy belief propagation to solve this MRF, leading to a globally optimised integration with accurate local details. Experiments show the advantages and superiority of our MRF-based approach over existing methods.

Ran Song, Yonghuai Liu, Ralph R. Martin, Paul L. Rosin
Wave Interference for Pattern Description

This paper presents a novel compact description of a pattern based on the interference of circular waves. The proposed approach, called “interference description”, leads to a representation of the pattern, where the spatial relations of its constituent parts are intrinsically taken into account. Due to the intrinsic characteristics of the interference phenomenon, this description includes more information than a simple sum of individual parts. Therefore it is suitable for representing the interrelations of different pattern components. We illustrate that the proposed description satisfies some of the key Gestalt properties of human perception such as invariance, emergence and reification, which are also desirable for efficient pattern description. We further present a method for matching the proposed interference descriptions of different patterns. In a series of experiments, we demonstrate the effectiveness of our description for several computer vision tasks such as pattern recognition, shape matching and retrieval.

Selen Atasoy, Diana Mateus, Andreas Georgiou, Nassir Navab, Guang-Zhong Yang
Colour Dynamic Photometric Stereo for Textured Surfaces

In this paper we present a novel method to apply photometric stereo on textured dynamic surfaces. We aim at exploiting the high accuracy of photometric stereo and reconstruct local surface orientation from illumination changes. The main difficulty derives from the fact that photometric stereo requires varying illumination while the object remains still, which makes it quite impractical to use for dynamic surfaces. Using coloured lights gives a clear solution to this problem; however, the system of equations is still ill-posed and it is ambiguous whether the change of an observed surface colour is due to the change of the surface gradient or of the surface reflectance.

In order to separate surface orientation from reflectance, our method tracks texture changes over time and exploits surface reflectance’s temporal constancy. This additional constraint allows us to reformulate the problem as an energy functional minimisation, solved by a standard quasi-Newton method. Our method is tested both on real and synthetic data, quantitatively evaluated and compared to a state-of-the-art method.

Zsolt Jankó, Amaël Delaunoy, Emmanuel Prados
Multi-Target Tracking by Learning Class-Specific and Instance-Specific Cues

This paper proposes a novel particle filtering framework for multi-target tracking by using online learned class-specific and instance-specific cues, called Data-Driven Particle Filtering (DDPF). The learned cues include an online learned geometrical model for excluding detection outliers that violate geometrical constraints, global pose estimators shared by all targets for particle refinement, and online Boosting based appearance models which select discriminative features to distinguish different individuals. Targets are clustered into two categories. Separated-target is tracked by an ISPF (incremental self-tuning particle filtering) tracker, in which particles are incrementally drawn and tuned to their best states by a learned global pose estimator; target-group is tracked by a joint-state particle filtering method in which occlusion reasoning is conducted. Experimental results on challenging datasets show the effectiveness and efficiency of the proposed method.

Min Li, Wei Chen, Kaiqi Huang, Tieniu Tan
Modeling Complex Scenes for Accurate Moving Objects Segmentation

In video surveillance, it is still a difficult task to segment moving objects accurately in complex scenes, since the most widely used algorithms rely on background subtraction. We propose an online and unsupervised technique to find optimal segmentation in a Markov Random Field (MRF) framework. To improve the accuracy, color, locality, temporal coherence and spatial consistency are fused together in the framework. The models of color, locality and temporal coherence are learned online from complex scenes. A novel mixture of nonparametric regional model and parametric pixel-wise model is proposed to approximate the background color distribution. The foreground color distribution for every pixel is learned from neighboring pixels of the previous frame. The locality distributions of background and foreground are approximated with the nonparametric model. The temporal coherence is modeled with a Markov chain. Experiments on challenging videos demonstrate the effectiveness of our algorithm.

Jianwei Ding, Min Li, Kaiqi Huang, Tieniu Tan
Online Learning for PLSA-Based Visual Recognition

Probabilistic Latent Semantic Analysis (PLSA) is one of the latent topic models and it has been successfully applied to visual recognition tasks. However, PLSA models have been learned mainly in batch mode, which cannot handle data that arrives sequentially. In this paper, we propose a novel on-line learning algorithm for learning the parameters of PLSA. Our contributions are two-fold: (i) an on-line learning algorithm that learns the parameters of a PLSA model from incoming data; (ii) a codebook adaptation algorithm that can capture the full characteristics of all the features during the learning. Experimental results demonstrate that the proposed algorithm can handle sequentially arriving data that batch PLSA learning cannot cope with, and its performance is comparable with that of the batch PLSA learning on visual recognition.

Jie Xu, Getian Ye, Yang Wang, Wei Wang, Jun Yang
Emphasizing 3D Structure Visually Using Coded Projection from Multiple Projectors

In this paper, we propose a method for visually emphasizing the 3D structure of a scene by blending patterned lights projected from multiple projectors. The proposed method enables us to emphasize specific 3D structure of the scene without capturing an image of the scene and without estimating its 3D structure. As a result, the 3D structure can be emphasized visually without any computational delay. We also propose a method for generating the patterned light of the projectors, which enables us to emphasize arbitrary 3D structure of the scene efficiently.

Ryo Nakamura, Fumihiko Sakaue, Jun Sato
Object Class Segmentation Using Reliable Regions

Image segmentation is increasingly used for object recognition. The advantages of segments are numerous: a natural spatial support to compute features, reduction in the number of hypotheses to test, region shape itself can be a useful feature, etc. Since segmentation is brittle, a popular remedy is to integrate results over multiple segmentations of the scene. In previous work, usually all the regions in multiple segmentations are used. However, a typical segmentation algorithm often produces generic regions lacking discriminating features. In this work we explore the idea of finding and using only the regions that are reliable for detection. The main step is to cluster feature vectors extracted from regions and deem as unreliable any clusters that belong to different classes but have a significant overlap. We use a simple nearest neighbor classifier for object class segmentation and show that discarding unreliable regions results in a significant improvement.

Vida Vakili, Olga Veksler
Specular Surface Recovery from Reflections of a Planar Pattern Undergoing an Unknown Pure Translation

This paper addresses the problem of specular surface recovery, and proposes a novel solution based on observing the reflections of a translating planar pattern. Previous works have demonstrated that a specular surface can be recovered from the reflections of two calibrated planar patterns. In this paper, however, only one reference planar pattern is assumed to have been calibrated against a fixed camera observing the specular surface. Instead of introducing and calibrating a second pattern, the reference pattern is allowed to undergo an unknown pure translation, and a closed form solution is derived for recovering such a motion. Unlike previous methods which estimate the shape by directly triangulating the visual rays and reflection rays, a novel method based on computing the projections of the visual rays on the translating pattern is introduced. This produces a depth range for each pixel which also provides a measure of the accuracy of the estimation. The proposed approach enables a simple auto-calibration of the translating pattern, and data redundancy resulting from the translating pattern can improve both the robustness and accuracy of the shape estimation. Experimental results on both synthetic and real data are presented to demonstrate the effectiveness of the proposed approach.

Miaomiao Liu, Kwan-Yee K. Wong, Zhenwen Dai, Zhihu Chen
Medical Image Segmentation Based on Novel Local Order Energy

Image segmentation plays an important role in many medical imaging systems, yet in complex circumstances it is still a challenging problem. Among the many difficulties, the problem caused by image intensity inhomogeneity is a key one. In this work, we develop a novel local-homogeneous region-based level set segmentation method to tackle this problem. First, we propose a novel local order energy, which interprets the local intensity constraint. Then, we integrate this energy into the objective energy function. After that, we minimize the energy function via a level set evolution process. Extensive experiments are performed to evaluate the proposed approach, showing significant improvements in both accuracy and efficiency, as compared to the state-of-the-art.

LingFeng Wang, Zeyun Yu, ChunHong Pan
Geometries on Spaces of Treelike Shapes

In order to develop statistical methods for shapes with a tree-structure, we construct a shape space framework for treelike shapes and study metrics on the shape space. The shape space has singularities, which correspond to topological transitions in the represented trees. We study two closely related metrics, TED and QED. The QED is a quotient Euclidean distance arising from the new shape space formulation, while TED is essentially the classical tree edit distance. Using Gromov’s metric geometry we gain new insight into the geometries defined by TED and QED. In particular, we show that the new metric QED has nice geometric properties which facilitate statistical analysis, such as existence and local uniqueness of geodesics and averages. TED, on the other hand, has algorithmic advantages, while it does not share the geometric strong points of QED. We provide a theoretical framework as well as computational results such as matching of airway trees from pulmonary CT scans and geodesics between synthetic data trees illustrating the dynamic and geometric properties of the QED metric.

Aasa Feragen, Francois Lauze, Pechin Lo, Marleen de Bruijne, Mads Nielsen
Human Pose Estimation Using Exemplars and Part Based Refinement

In this paper, we propose a fast and accurate human pose estimation framework that combines top-down and bottom-up methods. The framework consists of an initialization stage and an iterative searching stage. In the initialization stage, an example-based method is used to find several initial poses which are used as searching seeds for the next stage. In the iterative searching stage, a larger number of body part candidates are generated by adding random disturbance to the searching seeds. The Belief Propagation (BP) algorithm is applied to these candidates to find the best n poses using the information of a global graph model and part image likelihood. Then these poses are further used as searching seeds for the next iteration. To model image likelihoods of parts we designed rotation invariant EdgeField features, based on which we learnt boosted classifiers to calculate the image likelihoods. Experimental results show that our framework is both fast and accurate.

Yanchao Su, Haizhou Ai, Takayoshi Yamashita, Shihong Lao
Full-Resolution Depth Map Estimation from an Aliased Plenoptic Light Field

In this paper we show how to obtain full-resolution depth maps from a single image obtained from a plenoptic camera. Previous work showed that the estimation of a low-resolution depth map with a plenoptic camera differs substantially from that of a camera array and, in particular, requires appropriate depth-varying antialiasing filtering. In this paper we show a quite striking result: One can instead recover a depth map at the same full-resolution of the input data. We propose a novel algorithm which exploits a photoconsistency constraint specific to light fields captured with plenoptic cameras. Key to our approach is handling missing data in the photoconsistency constraint and the introduction of novel boundary conditions that impose texture consistency in the reconstructed full-resolution images. These ideas are combined with an efficient regularization scheme to give depth maps at a higher resolution than in any previous method. We provide results on both synthetic and real data.

Tom E. Bishop, Paolo Favaro
Indoor Scene Classification Using Combined 3D and Gist Features

Scene categorization is an important mechanism for providing high-level context which can guide methods for a more detailed analysis of scenes. State-of-the-art techniques like Torralba’s Gist features show a good performance on categorizing outdoor scenes but have problems in categorizing indoor scenes. In contrast to object based approaches, we propose a 3D feature vector capturing general properties of the spatial layout of indoor scenes like shape and size of extracted planar patches and their orientation to each other. This idea is supported by psychological experiments which give evidence for the special role of 3D geometry in categorizing indoor scenes. In order to study the influence of the 3D geometry we introduce in this paper a novel 3D indoor database and a method for defining 3D features on planar surfaces extracted in 3D data. Additionally, we propose a voting technique to fuse 3D features and 2D Gist features and show in our experiments a significant contribution of the 3D features to the indoor scene categorization task.

Agnes Swadzba, Sven Wachsmuth
Closed-Form Solutions to Minimal Absolute Pose Problems with Known Vertical Direction

In this paper we provide new simple closed-form solutions to two minimal absolute pose problems for the case of known vertical direction. In the first problem we estimate the absolute pose of a calibrated camera from two 2D-3D correspondences and a given vertical direction. In the second problem we assume a camera with unknown focal length and radial distortion and estimate its pose together with the focal length and the radial distortion from three 2D-3D correspondences and a given vertical direction. The vertical direction can be obtained either by direct physical measurement by, e.g., gyroscopes and inertial measurement units or from vanishing points constructed in images. Both our problems result in solving one polynomial equation of degree two in one variable and one, respectively two, systems of linear equations and can be efficiently solved in closed form. By evaluating our algorithms on synthetic and real data we demonstrate that both our solutions are fast, efficient and numerically stable.

Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
Level Set with Embedded Conditional Random Fields and Shape Priors for Segmentation of Overlapping Objects

Traditional methods for segmenting touching or overlapping objects may lead to the loss of accurate shape information which is a key descriptor in many image analysis applications. While experimental results have shown the effectiveness of using statistical shape priors to overcome such difficulties in a level set based variational framework, problems in estimation of parameters that balance evolution forces from image information and shape priors remain unsolved. In this paper, we extend the work of embedded Conditional Random Fields (CRF) by incorporating shape priors so that accurate estimation of those parameters can be obtained by the supervised training of the discrete CRF. In addition, non-parametric kernel density estimation with adaptive window size is applied as a statistical measure that locally approximates the variation of intensities to address intensity inhomogeneities. The model is tested for the problem of segmenting overlapping nuclei in cytological images.

Xuqing Wu, Shishir K. Shah
Optimal Two-View Planar Scene Triangulation

We present a new algorithm for optimally computing from point correspondences over two images their 3-D positions when they are constrained to be on a planar surface. We consider two cases: the case in which the plane and camera parameters are known and the case in which they are not. In the former, we show how observed point correspondences are optimally corrected so that they are compatible with the homography between the two images determined by the plane and camera parameters. In the latter, we show how the homography is optimally estimated by iteratively using the triangulation procedure.

Kenichi Kanatani, Hirotaka Niitsuma
Pursuing Atomic Video Words by Information Projection

In this paper, we study mathematical models of atomic visual patterns from natural videos and establish a generative visual vocabulary for video representation. Empirically, we employ small video patches (e.g., 15×15×5, called video “bricks”) in natural videos as the basic analysis unit. There are a variety of brick subspaces (or atomic video words) of varying dimensions in the high dimensional brick space. The structures of the words are characterized by both appearance and motion dynamics. Here, we categorize the words into two pure types: structural video words (SVWs) and textural video words (TVWs). A common generative model is introduced to model these two types of video words in a unified form. The representation power of a word is measured by its information gain, based on which words are pursued one by one via a novel pursuit algorithm, and finally a holistic video vocabulary is built up. Experimental results show the potential power of our framework for video representation.

Youdong Zhao, Haifeng Gong, Yunde Jia
A Direct Method for Estimating Planar Projective Transform

Estimating a planar projective transform (homography) from a pair of images is a classical problem in computer vision. In this paper, we propose a novel algorithm for directly registering two point sets in $\mathbb R^2$ using a projective transform without using intensity values. In this very general context, there are no easily established correspondences that can be used to estimate the projective transform, and most of the existing techniques become either inadequate or inappropriate. While the planar projective transforms form an eight-dimensional Lie group, we show that for registering 2D point sets, the search space for the homographies can be effectively reduced to a three-dimensional space. To further improve on the running time without significantly reducing the accuracy of the registration, we propose a matching cost function constructed using local polynomial moments of the point sets and a coarse to fine approach. The resulting registration algorithm has linear time complexity with respect to the number of input points. We have validated the algorithm using point sets collected from real images. Preliminary experimental results are encouraging and they show that the proposed method is both efficient and accurate.

Yu-Tseh Chi, Jeffrey Ho, Ming-Hsuan Yang
Spatial-Temporal Motion Compensation Based Video Super Resolution

Due to the arbitrary motion patterns in practical video, annoying artifacts caused by registration errors often appear in the super resolution outcome. This paper proposes a spatial-temporal motion compensation based super resolution fusion method (STMC) for video after explicit motion estimation between a few neighboring frames. We first register the neighboring low resolution frames to proper positions in the high resolution frame, and then use the registered low resolution information as non-local redundancy to compensate the surrounding positions which have no or only a few registered low resolution pixels. Experimental results indicate the proposed method can effectively reduce the artifacts caused by the motion estimation error with obvious performance improvement in both PSNR and visual effect.

Yaozu An, Yao Lu, Ziye Yan
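
The abstract above reports its gains in PSNR. As a reminder of what that metric measures, here is a minimal sketch, assuming 8-bit intensities (peak value 255) and images given as flat lists of pixel values. The function name `psnr` and its parameters are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Illustrative values only: a two-pixel "image" with an error of 10 per pixel.
example_db = psnr([100, 100], [110, 90])
```

A higher value means the super-resolved frame is closer to the reference; the metric says nothing about perceptual quality, which is why the abstract also mentions visual effect.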
Learning Rare Behaviours

We present a novel approach to detect and classify rare behaviours which are visually subtle and occur sparsely in the presence of overwhelming typical behaviours. We treat this as a weakly supervised classification problem and propose a novel topic model: Multi-Class Delta Latent Dirichlet Allocation which learns to model rare behaviours from a few weakly labelled videos as well as typical behaviours from uninteresting videos by collaboratively sharing features among all classes of footage. The learned model is able to accurately classify unseen data. We further explore a novel method for detecting unknown rare behaviours in unseen data by synthesising new plausible topics to hypothesise any potential behavioural conflicts. Extensive validation using both simulated and real-world CCTV video data demonstrates the superior performance of the proposed framework compared to conventional unsupervised detection and supervised classification approaches.

Jian Li, Timothy M. Hospedales, Shaogang Gong, Tao Xiang
Character Energy and Link Energy-Based Text Extraction in Scene Images

Extracting text objects from scene images is a challenging problem. In this paper, by investigating the properties of single characters and text objects, we propose a new text extraction approach for scene images. First, character energy is computed based on the similarity of stroke edges to detect candidate character regions; then link energy is calculated based on the spatial relationship and similarity between neighboring candidate character regions to group characters and eliminate false positives. We applied the approach to the ICDAR 2003 dataset. The experimental results demonstrate the validity of our method.

Jing Zhang, Rangachar Kasturi
A Novel Representation of Palm-Print for Recognition

This paper proposes a novel palm-print feature extraction technique which is based on binarising the difference of Discrete Cosine Transform coefficients of overlapping circular strips. The binary features of the palm-print are matched using Hamming distance. The system is evaluated using the PolyU database consisting of 7,752 images. A procedure to extract the palm-print for the PolyU dataset is proposed and found to extract a larger area compared to the preprocessing technique in [1]. Variation in brightness of the extracted palm-print is corrected and the contrast of its texture is enhanced. Compared to the systems in [1, 2], the proposed system achieves a higher Correct Recognition Rate (CRR) of 100% with a lower Equal Error Rate (EER) of 0.0073% at low computational cost.

G. S. Badrinath, Phalguni Gupta
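
The matching step mentioned in the abstract above, comparing binary features by Hamming distance, can be sketched as follows. This is only an illustration: it assumes each binary feature vector is packed into a Python integer, and the names `hamming_distance`, `best_match` and the toy gallery are hypothetical rather than taken from the paper.

```python
def hamming_distance(code_a, code_b):
    """Count the bit positions in which two packed binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def best_match(query, gallery):
    """Return (label, distance) of the gallery code closest to the query."""
    return min(
        ((label, hamming_distance(query, code)) for label, code in gallery.items()),
        key=lambda pair: pair[1],
    )


# Toy 4-bit codes standing in for real palm-print features.
gallery = {"subject_a": 0b0000, "subject_b": 0b1010}
label, distance = best_match(0b1011, gallery)
```

Because the comparison reduces to an XOR and a bit count, matching is cheap, which is consistent with the low computational cost the abstract claims.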
Real-Time Robust Image Feature Description and Matching

The problem of finding corresponding points between images of the same scene is at the heart of many computer vision problems. In this paper we present a real-time approach to finding correspondences under changes in scale, rotation, viewpoint and illumination using Simple Circular Accelerated Robust Features (SCARF). Prominent descriptors such as SIFT and SURF find robust correspondences, but at a computation cost that limits the number of points that can be handled on low-memory, low-power devices. Like SURF, SCARF is based on Haar wavelets. However, SCARF employs a novel non-uniform sampling distribution, structure, and matching technique that provides computation times comparable to the state-of-the-art without compromising distinctiveness and robustness. Computing 512 SCARF descriptors takes 12.6ms on a 2.4GHz processor, and each descriptor occupies just 60 bytes. Therefore the descriptor is ideal for real-time applications which are implemented on low-memory, low-power devices.

Stephen J. Thomas, Bruce A. MacDonald, Karl A. Stol
A Biologically-Inspired Theory for Non-axiomatic Parametric Curve Completion

Visual curve completion is typically handled in an axiomatic fashion where the shape of the sought-after completed curve follows formal descriptions of desired, image-based perceptual properties (e.g., minimum curvature, roundedness, etc.). Unfortunately, however, these desired properties are still a matter of debate in the perceptual literature. Instead of the image plane, here we study the problem in the mathematical space ${\mathbf R}^{2}\times {\mathcal S}^{1}$ that abstracts the cortical areas where curve completion occurs. In this space one can apply basic principles from which perceptual properties in the image plane are derived rather than imposed. In particular, we show how a “least action” principle in ${\mathbf R}^{2}\times {\mathcal S}^{1}$ entails many perceptual properties which have support in the perceptual curve completion literature. We formalize this principle in a variational framework for general parametric curves, we derive its differential properties, we present numerical solutions, and we show results on a variety of images.

Guy Ben-Yosef, Ohad Ben-Shahar
Geotagged Image Recognition by Combining Three Different Kinds of Geolocation Features

Scenes and objects represented in photos have a causal relationship to the places where they are taken. In this paper, we propose using geo-information such as aerial photos and location-related texts as features for geotagged image recognition and fusing them with Multiple Kernel Learning (MKL). In our experiments, we have verified the possibility of reflecting location contexts in image recognition by evaluating not only recognition rates, but also the feature fusion weights estimated by MKL. As a result, the mean average precision (MAP) for 28 categories increased up to 80.87% by the proposed method, compared with 77.71% by the baseline. In particular, for the categories related to location-dependent concepts, MAP was improved by 6.57 points.

Keita Yaegashi, Keiji Yanai
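
The abstract above reports results as mean average precision (MAP) over 28 categories. The following is a minimal sketch of one common, non-interpolated definition of that metric, assuming each category yields a ranked list of 0/1 relevance labels. The exact evaluation protocol used in the paper is not stated in the abstract, so the functions below are illustrative only.

```python
def average_precision(ranked_relevance):
    """Non-interpolated average precision of one ranked list of 0/1 labels."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_category_rankings):
    """Mean of the per-category average precisions."""
    scores = [average_precision(ranking) for ranking in per_category_rankings]
    return sum(scores) / len(scores)


# Toy rankings for two categories (1 = relevant image, 0 = irrelevant).
toy_map = mean_average_precision([[1, 0, 1], [0, 1]])
```

Since MAP averages over categories, a gain concentrated in a few location-dependent categories shows up diluted in the overall figure, which is why the abstract reports those categories separately.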
Modeling Urban Scenes in the Spatial-Temporal Space

This paper presents a technique to simultaneously model 3D urban scenes in the spatial-temporal space using a collection of photos that span many years. We propose to use a middle level representation, building, to characterize significant structure changes in the scene. We first use structure-from-motion techniques to build 3D point clouds, which are a mixture of scenes from different periods of time. We then segment the point clouds into independent buildings using a hierarchical method, including coarse clustering on sparse points and fine classification on dense points based on the spatial distance of point clouds and the difference of visibility vectors. In the fine classification, we segment building candidates using a probabilistic model in the spatial-temporal space simultaneously. We employ a z-buffering based method to infer the existence of each building in each image. After recovering the temporal order of the input images, we finally obtain 3D models of these buildings along the time axis. We present experiments using both toy building images captured in our lab and real urban scene images to demonstrate the feasibility of the proposed approach.

Jiong Xu, Qing Wang, Jie Yang
Family Facial Patch Resemblance Extraction

Family members have close facial resemblances to one another, especially in certain parts of the face, but the resembling parts differ from family to family. Nevertheless, humans have no problem in identifying such facial resemblances to guess family relationships. This paper attempts to develop such human capability in computers through measurements of the resemblance of each facial patch to classify family members. To achieve this goal, family datasets are collected. A modified Golden Ratio Mask is implemented to guide the facial patches. Features of each facial patch are selected and analyzed by an individual classifier, and the importance of each patch is extracted to find the set of most informative patches. To evaluate the performance, various scenarios where different members of the family are absent from training but present in testing are tested to classify the family members. Results obtained show that we can achieve up to 98% average accuracy on the collected dataset.

M. Ghahramani, W. Y. Yau, E. K. Teoh
3D Line Segment Detection for Unorganized Point Clouds from Multi-view Stereo

This paper presents a fast and reliable approach for detecting 3D line segments in unorganized point clouds from multi-view stereo. The core idea is to discover weak matches between line segments by re-projecting 3D points onto the 2D image plane and to infer 3D line segments from spatial constraints. Building on a 2D line segment detector and multi-view stereo, the proposed algorithm first re-projects the spatial point cloud into planar point sets using the different camera matrices, and then finds the best re-projection line from the tentatively matched points. Finally, the 3D line segment is produced by back-projection after outlier removal. To remove the matching errors caused by re-projection, a plane clustering method is implemented. Experimental results show that the approach achieves visually satisfactory 3D line detection as well as high computational efficiency. The proposed fast line detection can be extended to 3D sketching of large-scale scenes from multiple images.

Tingwang Chen, Qing Wang
Multi-View Stereo Reconstruction with High Dynamic Range Texture

In traditional 3D model reconstruction, the texture information is captured in a certain dynamic range, which is usually insufficient for rendering under new environmental light. This paper proposes a novel approach for multi-view stereo (MVS) reconstruction of models with high dynamic range (HDR) texture. In the proposed approach, multi-view images are first taken simultaneously with different exposure times. Corresponding pixels in adjacent viewpoints are then extracted using a multi-projection method, to robustly recover the response function of the camera. With the response function, pixel values in the differently exposed images can be converted to the desired relative radiance values. Subsequently, geometry reconstruction and HDR texture recovery can be achieved using these values. Experimental results demonstrate that our method can recover the HDR texture for the 3D model efficiently while keeping high geometric precision. With our reconstructed HDR texture model, we demonstrate high-quality scene re-lighting.

Feng Lu, Xiangyang Ji, Qionghai Dai, Guihua Er
Feature Quarrels: The Dempster-Shafer Evidence Theory for Image Segmentation Using a Variational Framework

Image segmentation is the process of partitioning an image into at least two regions. Usually, active contours or level set based image segmentation methods combine different feature channels, arising from the color distribution, texture or scale information, in an energy minimization approach. In this paper, we integrate the Dempster-Shafer evidence theory in level set based image segmentation to fuse the information (and resolve conflicts) arising from different feature channels. They are further combined with a smoothing term and applied to the signed distance function of an evolving contour. In several experiments we demonstrate the properties and advantages of using the Dempster-Shafer evidence theory in level set based image segmentation.

Björn Scheuermann, Bodo Rosenhahn
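
As an illustration of the evidence fusion mentioned in the abstract above, the following is a minimal sketch of Dempster's rule of combination for two feature channels. The channel names, the two hypotheses ("fg", "bg") and all mass values are made-up assumptions for illustration; the sketch does not reproduce the authors' level set formulation.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass that falls on an empty intersection is treated as conflict
    and removed by renormalisation.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical per-pixel evidence from two feature channels.
FG = frozenset({"fg"})
BG = frozenset({"bg"})
ANY = FG | BG  # mass on the full set expresses ignorance

colour = {FG: 0.6, BG: 0.1, ANY: 0.3}
texture = {FG: 0.2, BG: 0.5, ANY: 0.3}

fused = combine(colour, texture)
```

Here the two channels disagree, so part of the product mass lands on the empty set; the rule discards it and rescales the remaining masses so that they again sum to one.
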
Gait Analysis of Gender and Age Using a Large-Scale Multi-view Gait Database

This paper describes video-based gait feature analysis for gender and age classification using a large-scale multi-view gait database. First, we constructed a large-scale multi-view gait database in terms of the number of subjects (168 people), the diversity of gender and age (88 males and 80 females between 4 and 75 years old), and the number of observed views (25 views) using a multi-view synchronous gait capturing system. Next, classification experiments with four classes, namely children, adult males, adult females, and the elderly were conducted to clarify view impact on classification performance. Finally, we analyzed the uniqueness of the gait features for each class for several typical views to acquire insight into gait differences among genders and age classes from a computer-vision point of view. In addition to insights consistent with previous works, we also obtained novel insights into view-dependent gait feature differences among gender and age classes as a result of the analysis.

Yasushi Makihara, Hidetoshi Mannami, Yasushi Yagi
A System for Colorectal Tumor Classification in Magnifying Endoscopic NBI Images

In this paper we propose a recognition system for classifying NBI images of colorectal tumors into three types (A, B, and C3) of structures of microvessels on the colorectal surface. These types have a strong correlation with histologic diagnosis: hyperplasias (HP), tubular adenomas (TA), and carcinomas with massive submucosal invasion (SM-m). Images are represented by bag-of-features of SIFT descriptors densely sampled on a grid, and then classified by an SVM with an RBF kernel. A dataset of 907 NBI images was used for experiments with 10-fold cross-validation, and a recognition rate of 94.1% was obtained.

Toru Tamaki, Junki Yoshimuta, Takahishi Takeda, Bisser Raytchev, Kazufumi Kaneda, Shigeto Yoshida, Yoshito Takemura, Shinji Tanaka
A Linear Solution to 1-Dimensional Subspace Fitting under Incomplete Data

Computing a 1-dimensional linear subspace is an important problem in many computer vision algorithms. Its importance stems from the fact that maximizing a linear homogeneous equation system can be interpreted as a subspace fitting problem. It is trivial to compute the solution if all coefficients of the equation system are known, yet for the case of incomplete data, only approximation methods based on variations of gradient descent have been developed.

In this work, an algorithm is presented in which the data is embedded in projective spaces. We prove that the intersection of these projective spaces is identical to the desired subspace. Whereas other algorithms approximate this subspace iteratively, computing the intersection of projective spaces defines a linear problem. This solution is therefore not an approximation but exact in the absence of noise. We derive an upper bound on the number of missing entries the algorithm can handle. Experiments with synthetic data confirm that the proposed algorithm successfully fits subspaces to data even if more than 90% of the data is missing. We demonstrate an example application with real image sequences.

Hanno Ackermann, Bodo Rosenhahn
Efficient Clustering Earth Mover’s Distance

The two-class clustering problem is formulated as an integer convex optimisation problem which determines the maximum of the Earth Mover's Distance (EMD) between two classes, constructing a bipartite graph with minimum flow and maximum inter-class EMD between two sets. Subsequently including the nearest neighbours of the start point in feature space and calculating the EMD for these labellings quickly converges to a robust optimum. A histogram of grey values with the number of bins b as the only parameter is used as feature, which makes the run time complexity independent of the number of pixels. After convergence in $\mathcal{O}(b)$ steps, spatial correlations can be taken into account by total variational smoothing. Testing the algorithm on real world images from commonly used databases reveals that it is competitive with state-of-the-art methods, while it deterministically yields hard assignments without requiring any a priori knowledge of the input data or similarity matrices to be calculated.

Jenny Wagner, Björn Ommer
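
For readers unfamiliar with the distance used in the abstract above, the following is a minimal sketch of the Earth Mover's Distance between two grey-value histograms. It uses the well-known closed form for one-dimensional histograms with unit ground distance between adjacent bins, which is an assumption of this sketch; it does not reproduce the authors' clustering procedure. The function name `emd_1d` is hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1D histograms.

    Both histograms are normalised to unit mass. With a ground distance
    of one between neighbouring bins, the EMD equals the sum of absolute
    differences of the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a = sum(hist_a)
    total_b = sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")

    carried = 0.0  # mass that still has to be moved past the current bin
    cost = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        cost += abs(carried)
    return cost


# All mass in the first bin versus all mass in the third bin:
# the whole unit of mass has to travel two bins.
distance = emd_1d([1, 0, 0], [0, 0, 1])
```

The loop touches each bin once, so the cost of one evaluation depends only on the number of bins b and not on the number of pixels that were counted into the histograms.
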
One-Class Classification with Gaussian Processes

Detecting instances of unknown categories is an important task for a multitude of problems such as object recognition, event detection, and defect localization. This paper investigates the use of Gaussian process (GP) priors for this area of research. Focusing on the task of one-class classification for visual object recognition, we analyze different measures derived from GP regression and approximate GP classification. Experiments are performed using a large set of categories and different image kernel functions. Our findings show that the well-known Support Vector Data Description is significantly outperformed by at least two GP measures, which indicates a high potential of Gaussian processes for one-class classification.

Michael Kemmler, Erik Rodner, Joachim Denzler
A Fast Semi-inverse Approach to Detect and Remove the Haze from a Single Image

In this paper we introduce a novel approach to restore a single image degraded by atmospheric phenomena such as fog or haze. The presented algorithm allows for fast identification of hazy regions of an image, without making use of expensive optimization and refinement procedures. By applying a single per-pixel operation on the original image, we produce a 'semi-inverse' of the image. Based on the hue disparity between the original image and its semi-inverse, we are then able to identify hazy regions on a per-pixel basis. This enables a simple estimation of the airlight constant and the transmission map. Our approach is based on an extensive study on a large data set of images, and validated based on a metric that measures the contrast but also the structural changes. The algorithm is straightforward and performs faster than existing strategies while yielding comparable or even better results. We also provide a comparative evaluation against other recent single image dehazing methods, demonstrating the efficiency and utility of our approach.

Codruta O. Ancuti, Cosmin Ancuti, Chris Hermans, Philippe Bekaert
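
The per-pixel idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the semi-inverse is taken to be the channel-wise maximum of a pixel and its inverse, hue is measured with the standard-library HSV conversion as a stand-in colour space, a small hue disparity is taken to indicate haze, and the threshold value and all function names are hypothetical.

```python
import colorsys


def semi_inverse(rgb):
    """Channel-wise maximum of a pixel and its inverse (values in [0, 1])."""
    return tuple(max(c, 1.0 - c) for c in rgb)


def hue_disparity(rgb):
    """Circular hue difference between a pixel and its semi-inverse."""
    hue_original = colorsys.rgb_to_hsv(*rgb)[0]
    hue_semi = colorsys.rgb_to_hsv(*semi_inverse(rgb))[0]
    diff = abs(hue_original - hue_semi)
    return min(diff, 1.0 - diff)  # hue wraps around at 1.0


def looks_hazy(rgb, threshold=0.03):
    """Assumed decision rule: little hue change suggests a hazy pixel."""
    return hue_disparity(rgb) < threshold


bright_greyish = (0.80, 0.82, 0.85)  # typical of fog: all channels high
dark_red = (0.60, 0.10, 0.10)        # saturated, clearly not haze

hazy_flags = (looks_hazy(bright_greyish), looks_hazy(dark_red))
```

A pixel whose channels are all above 0.5 is left unchanged by the semi-inverse, so its hue disparity is zero, whereas a saturated pixel has at least one channel flipped and its hue shifts noticeably.
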
Salient Region Detection by Jointly Modeling Distinctness and Redundancy of Image Content

Salient region detection in images is a challenging task, despite its usefulness in many applications. By modeling an image as a collection of clusters, we design a unified clustering framework for salient region detection in this paper. In contrast to existing methods, this framework not only models content distinctness from the intrinsic properties of clusters, but also models content redundancy from the removed content during the retargeting process. The cluster saliency is initialized from both distinctness and redundancy and then propagated among different clusters by applying a clustering assumption between clusters and their saliency. The novel saliency propagation improves the robustness to clustering parameters as well as retargeting errors. The power of the proposed method is carefully verified on a standard dataset of 5000 real images with rectangle annotations as well as a subset with accurate contour annotations.

Yiqun Hu, Zhixiang Ren, Deepu Rajan, Liang-Tien Chia
Unsupervised Selective Transfer Learning for Object Recognition

We propose a novel unsupervised transfer learning framework that utilises unlabelled auxiliary data to quantify and select the most relevant transferrable knowledge for recognising a target object class from the background given very limited training target samples. Unlike existing transfer learning techniques, our method does not assume that auxiliary data are labelled, nor that the relationships between target and auxiliary classes are known a priori. Our unsupervised transfer learning is formulated by a novel kernel adaptation transfer (KAT) learning framework, which aims to (a) extract general knowledge about how more structured objects are visually distinctive from cluttered background regardless of object class, and (b) more importantly, perform selective transfer of knowledge extracted from the auxiliary data to minimise negative knowledge transfer suffered by existing methods. The effectiveness and efficiency of the proposed approach are demonstrated by performing a one-class object recognition (object vs. background) task using the Caltech256 dataset.

Wei-Shi Zheng, Shaogang Gong, Tao Xiang
A Heuristic Deformable Pedestrian Detection Method

Pedestrian detection is an important application in computer vision. Currently, most pedestrian detection methods focus on learning one or multiple fixed models. These algorithms rely heavily on training data and do not perform well in handling various pedestrian deformations. To address this problem, we analyze the cause of pedestrian deformation and propose a method to adaptively describe the state of pedestrians’ parts. This is valuable to resolve the pedestrian deformation problem. Experimental results on the INRIA human dataset and our pedestrian pose database demonstrate the effectiveness of our method.

Yongzhen Huang, Kaiqi Huang, Tieniu Tan
Gradual Sampling and Mutual Information Maximisation for Markerless Motion Capture

The major issue in markerless motion capture is finding the global optimum in a multimodal setting where distinctive gestures may have similar likelihood values. Instead of focusing only on effective searching, as many existing works do, our approach resolves gesture ambiguity by designing a better-behaved observation likelihood. We extend Annealed Particle Filtering by a novel gradual sampling scheme that allows evaluations to concentrate on large mismatches of the tracking subject. Noticing the limitation of silhouettes in resolving gesture ambiguity, we incorporate appearance information in an illumination invariant way by maximising Mutual Information between an appearance model and the observation. This in turn strengthens the effectiveness of the better-behaved likelihood. Experiments on the benchmark datasets show that our tracking performance is comparable to or higher than the state-of-the-art studies, but with a simpler setting and higher computational efficiency.

Yifan Lu, Lei Wang, Richard Hartley, Hongdong Li, Dan Xu
Temporal Feature Weighting for Prototype-Based Action Recognition

Prototype-based classification methods have recently become popular in action recognition. However, such methods, even though they show competitive classification results, are often limited by overly simple and thus insufficient representations and require a long-term analysis. To compensate for these problems we propose to use more sophisticated features and an efficient prototype-based representation allowing for a single-frame evaluation. In particular, we apply four feature cues in parallel (two for appearance and two for motion) and apply a hierarchical k-means tree, where the obtained leaf nodes represent the prototypes. In addition, to increase the classification power, we introduce a temporal weighting scheme for the different information cues. Thus, in contrast to existing methods, which typically use global weighting strategies (i.e., the same weights are applied for all data), the weights are estimated separately for a specific point in time. We demonstrate our approach on standard benchmark datasets showing excellent classification results. In particular, we give a detailed study on the applied features, the hierarchical tree representation, and the influence of temporal weighting as well as a competitive comparison to existing state-of-the-art methods.

Thomas Mauthner, Peter M. Roth, Horst Bischof
PTZ Camera Modeling and Panoramic View Generation via Focal Plane Mapping

We present a novel technique to accurately map the complete field-of-coverage of a camera to its pan-tilt space in an efficient manner. This camera model enables mapping the coordinates of any (x, y) point in the camera’s current image to that point’s corresponding orientation in the camera’s pan-tilt space. The model is based on the elliptical locus of the projections of a fixed point on the original focal plane of a moving camera. The parametric location of this point along the ellipse defines the change in camera orientation. The efficiency of the model lies in the fast and automatic mapping technique. We employ the proposed model to generate panoramas and evaluate the mapping procedure with multiple PTZ surveillance cameras.

Karthik Sankaranarayanan, James W. Davis
Horror Image Recognition Based on Emotional Attention

Along with the ever-growing Web, people benefit more and more from sharing information. Meanwhile, harmful and illegal content, such as pornography, violence, and horror, permeates the Web. Horror images, whose threat to children’s health is no less than that from pornographic content, are nowadays neglected by existing Web filtering tools. This paper focuses on horror image recognition, which may further be applied to Web horror content filtering. The contributions of this paper are two-fold. First, the emotional attention mechanism is introduced into our work to detect emotionally salient regions in an image, and a top-down emotional saliency computation model is proposed based on color emotion and color harmony theories. Second, we present an Attention based Bag-of-Words (ABoW) framework for representing the emotion of an image by combining the emotional saliency computation model and the Bag-of-Words model. Based on ABoW, a horror image recognition algorithm is presented. The experimental results on diverse real images collected from the Internet show that the proposed emotional saliency model and horror image recognition algorithm are effective.

Bing Li, Weiming Hu, Weihua Xiong, Ou Wu, Wei Li
Spatial-Temporal Affinity Propagation for Feature Clustering with Application to Traffic Video Analysis

In this paper, we propose STAP (Spatial-Temporal Affinity Propagation), an extension of the Affinity Propagation algorithm for feature points clustering, by incorporating temporal consistency of the clustering configurations between consecutive frames. By extending AP to the temporal domain, STAP successfully models the smooth-motion assumption in object detection and tracking. Our experiments on applications in traffic video analysis demonstrate the effectiveness and efficiency of the proposed method and its advantages over existing approaches.

Jun Yang, Yang Wang, Arcot Sowmya, Jie Xu, Zhidong Li, Bang Zhang
Minimal Representations for Uncertainty and Estimation in Projective Spaces

Estimation using homogeneous entities has to cope with obstacles such as singularities of covariance matrices and redundant parametrizations which do not allow an immediate definition of maximum likelihood estimation and lead to estimation problems with more parameters than necessary. The paper proposes a representation of the uncertainty of all types of geometric entities and estimation procedures for geometric entities and transformations which (1) only require the minimum number of parameters, (2) are free of singularities, (3) allow for a consistent update within an iterative procedure, (4) make it possible to exploit the simplicity of homogeneous coordinates to represent geometric constraints and (5) allow handling geometric entities which are at infinity or at least very far away, avoiding the usage of concepts like the inverse depth. Such representations are already available for transformations such as rotations, motions (Rosenhahn 2002), homographies (Begelfor 2005), or the projective correlation with fundamental matrix (Bartoli 2004), all being elements of some Lie group. The uncertainty is represented in the tangent space of the manifold, namely the corresponding Lie algebra. However, to our knowledge no such representations have been developed for basic geometric entities such as points, lines and planes, since in addition to using the tangent space of the manifolds we need a transformation of the entities such that they stay on their specific manifold during the estimation process. We develop the concept, discuss its usefulness for bundle adjustment and demonstrate (a) its superiority compared to simpler methods for vanishing point estimation, (b) its rigour when estimating 3D lines from 3D points and (c) its applicability for determining 3D lines from observed image line segments in a multi-view setup.

Wolfgang Förstner
Personalized 3D-Aided 2D Facial Landmark Localization

Facial landmark detection in images obtained under varying acquisition conditions is a challenging problem. In this paper, we present a personalized landmark localization method that leverages information available from 2D/3D gallery data. To realize a robust correspondence between gallery and probe key points, we present several innovative solutions, including: (i) a hierarchical DAISY descriptor that encodes larger contextual information, (ii) a Data-Driven Sample Consensus (DDSAC) algorithm that leverages the image information to reduce the number of required iterations for robust transform estimation, and (iii) a 2D/3D gallery pre-processing step to build personalized landmark metadata (i.e., local descriptors and a 3D landmark model). We validate our approach on the Multi-PIE and UHDB14 databases, and by comparing our results with those obtained using two existing methods.

Zhihong Zeng, Tianhong Fang, Shishir K. Shah, Ioannis A. Kakadiaris
A Theoretical and Numerical Study of a Phase Field Higher-Order Active Contour Model of Directed Networks

We address the problem of quasi-automatic extraction of directed networks, which have characteristic geometric features, from images. To include the necessary prior knowledge about these geometric features, we use a phase field higher-order active contour model of directed networks. The model has a large number of unphysical parameters (weights of energy terms), and can favour different geometric structures for different parameter values. To overcome this problem, we perform a stability analysis of a long, straight bar in order to find parameter ranges that favour networks. The resulting constraints necessary to produce stable networks eliminate some parameters, replace others by physical parameters such as network branch width, and place lower and upper bounds on the values of the rest. We validate the theoretical analysis via numerical experiments, and then apply the model to the problem of hydrographic network extraction from multi-spectral VHR satellite images.

Aymen El Ghoul, Ian H. Jermyn, Josiane Zerubia
Sparse Coding on Local Spatial-Temporal Volumes for Human Action Recognition

By extracting local spatial-temporal features from videos, many recently proposed approaches for action recognition achieve promising performance. The Bag-of-Words (BoW) model is commonly used in these approaches to obtain video-level representations. However, the BoW model roughly assigns each feature vector to its closest visual word, therefore inevitably causing nontrivial quantization errors and impairing further improvements in classification rates. To obtain a more accurate and discriminative representation, in this paper we propose an approach for action recognition by encoding local 3D spatial-temporal gradient features within the sparse coding framework. In so doing, each local spatial-temporal feature is transformed to a linear combination of a few “atoms” in a trained dictionary. In addition, we also investigate the construction of the dictionary under the guidance of transfer learning. We collect a large set of diverse video clips of sports games and movies, from which a set of universal atoms composing the dictionary is learned by an online learning strategy. We test our approach on the KTH dataset and the UCF sports dataset. Experimental results demonstrate that our approach outperforms the state-of-the-art techniques on the KTH dataset and achieves comparable performance on the UCF sports dataset.

Yan Zhu, Xu Zhao, Yun Fu, Yuncai Liu
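
The quantization error that the abstract above attributes to the BoW model can be made concrete with a small sketch of hard assignment. The two-dimensional features and the two-word vocabulary are toy values chosen for illustration, and the function names are hypothetical; sparse coding itself is not shown.

```python
import math


def nearest_word(feature, vocabulary):
    """Return the index of the closest visual word and the distance to it."""
    distances = [math.dist(feature, word) for word in vocabulary]
    index = min(range(len(vocabulary)), key=distances.__getitem__)
    return index, distances[index]


def bow_histogram(features, vocabulary):
    """Hard-assign every feature to one word and count the assignments.

    Also accumulates the distance between each feature and its word,
    which is the information the hard assignment throws away.
    """
    histogram = [0] * len(vocabulary)
    total_error = 0.0
    for feature in features:
        index, error = nearest_word(feature, vocabulary)
        histogram[index] += 1
        total_error += error
    return histogram, total_error


vocabulary = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.0), (0.4, 0.4)]

histogram, total_error = bow_histogram(features, vocabulary)
```

The third feature lies almost halfway between the two words, yet it contributes a full count to the first bin; representing it as a weighted combination of several dictionary atoms, as sparse coding does, is one way to reduce this loss.
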
Multi-illumination Face Recognition from a Single Training Image per Person with Sparse Representation

In real-world face recognition systems, traditional face recognition algorithms often fail in the case of insufficient training samples. Recently, face recognition algorithms based on sparse representation have achieved promising results even in the presence of corruption or occlusion. However, a large, over-complete and elaborately designed discriminant training set is still required to form the sparse representation, which seems impractical for the single training image per person problem. In this paper, we extend Sparse Representation Classification (SRC) to the one sample per person problem. We address this problem under varying lighting conditions by introducing relighting methods to generate virtual faces. In this way a diverse and complete training set can be composed, which makes SRC more general. Moreover, we verify the recognition under different lighting environments by a cross-database comparison.

Die Hu, Li Song, Cheng Zhi
Human Detection in Video over Large Viewpoint Changes

In this paper, we aim to detect humans in video over large viewpoint changes, which is very challenging due to the diversity of human appearance and motion across a wide range of viewpoints compared with a common frontal viewpoint. We propose 1) a new feature called Intra-frame and Inter-frame Comparison Feature to combine both appearance and motion information, 2) an Enhanced Multiple Clusters Boost algorithm to co-cluster the samples of various viewpoints and discriminative features automatically and 3) a Multiple Video Sampling strategy to make the approach robust to human motion and frame rate changes. Due to the large number of samples and features, we propose a two-stage tree structure detector, using only appearance in the 1st stage and both appearance and motion in the 2nd stage. Our approach is evaluated on some challenging real-world scenes, the PETS2007 dataset, the ETHZ dataset and our own collected videos, which demonstrate the effectiveness and efficiency of our approach.

Genquan Duan, Haizhou Ai, Shihong Lao
Adaptive Parameter Selection for Image Segmentation Based on Similarity Estimation of Multiple Segmenters

This paper addresses the parameter selection problem in image segmentation. Most segmentation algorithms have parameters which are usually fixed beforehand by the user. Typically, however, each image has its own optimal set of parameters and in general a fixed parameter setting may result in unsatisfactory segmentations. In this paper we present a novel unsupervised framework for automatically choosing parameters based on a comparison with the results from some reference segmentation algorithm(s). The experimental results show that our framework is even superior to a supervised selection method based on ground truth. The proposed framework is not bound to image segmentation and can potentially be applied to solve the adaptive parameter selection problem in other contexts.

Lucas Franek, Xiaoyi Jiang
Cosine Similarity Metric Learning for Face Verification

Face verification is the task of deciding, by analyzing face images, whether a person is who he/she claims to be. This is very challenging due to image variations in lighting, pose, facial expression, and age. The task boils down to computing the distance between two face vectors. As such, appropriate distance metrics are essential for face verification accuracy. In this paper we propose a new method, named Cosine Similarity Metric Learning (CSML), for learning a distance metric for face verification. The use of cosine similarity in our method leads to an effective learning algorithm which can improve the generalization ability of any given metric. Our method is tested on the state-of-the-art dataset, Labeled Faces in the Wild (LFW), and has achieved the highest accuracy in the literature.

Hieu V. Nguyen, Li Bai
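
The quantity at the centre of the abstract above, the cosine similarity between two face vectors, can be sketched in a few lines. The vectors, the decision threshold and the function names are hypothetical illustration values; the learned metric that CSML adds on top is not shown.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equally long feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def same_person(x, y, threshold=0.8):
    """Assumed verification rule: accept the claim above a fixed threshold."""
    return cosine_similarity(x, y) >= threshold


claimed = [0.9, 0.1, 0.4]
probe = [0.8, 0.2, 0.5]
impostor = [0.1, 0.9, 0.0]

decisions = (same_person(claimed, probe), same_person(claimed, impostor))
```

Because the similarity depends only on the angle between the vectors, rescaling either vector, for example through a global change in image brightness that scales all features, leaves the score unchanged.
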
Backmatter
Metadata
Title
Computer Vision – ACCV 2010
Editors
Ron Kimmel
Reinhard Klette
Akihiro Sugimoto
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-19309-5
Print ISBN
978-3-642-19308-8
DOI
https://doi.org/10.1007/978-3-642-19309-5
