
About this book

The four-volume set LNCS 6492-6495 constitutes the thoroughly refereed post-proceedings of the 10th Asian Conference on Computer Vision, ACCV 2010, held in Queenstown, New Zealand, in November 2010. Altogether, the four volumes present 206 revised papers selected from a total of 739 submissions. All current issues in computer vision are addressed, ranging from algorithms that attempt to automatically understand the content of images, through optical methods coupled with computational techniques that enhance and improve images, to capturing and analyzing the world's geometry in preparation for higher-level image and shape understanding. Novel geometry techniques, statistical learning methods, and modern algebraic procedures are dealt with as well.



Posters on Day 2 of ACCV 2010

Approximate and SQP Two View Triangulation

The two view triangulation problem with Gaussian errors, also known as optimal triangulation, has an optimal solution that requires finding the roots of a 6th degree polynomial. This is computationally quite demanding for a basic building block of many reconstruction algorithms. We consider two faster triangulation methods. The first is a closed form approximate solution that comes with intuitive and tight error bounds that also describe cases where the optimal method is needed. The second is an iterative method based on local sequential quadratic programming (SQP). In simulations, triangulation errors of the approximate method are on par with the optimal method in most cases of practical interest, and the triangulation errors of the SQP method are on par with the optimal method in practically all cases. The SQP method is the faster of the two and about two orders of magnitude faster than the optimal method.
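
For orientation, the sketch below shows plain linear (DLT) triangulation, a common simple baseline for the two view problem. It is neither the approximate closed form method nor the SQP method of this paper, and the camera matrices and test point are made-up values chosen only for illustration.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one 3D point from two views (homogeneous DLT)."""
    # Each image measurement contributes two linear equations in the
    # homogeneous 3D point X: u * (P[2] @ X) - P[0] @ X = 0, and same for v.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: two identical cameras, the second shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free measurements the linear solution reproduces the point exactly; the methods discussed in the abstract matter once the image points carry Gaussian errors.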

Timo Tossavainen

Adaptive Motion Segmentation Algorithm Based on the Principal Angles Configuration

Many motion segmentation algorithms based on manifold clustering rely on an accurate rank estimation of the trajectory matrix and on a meaningful affinity measure between the estimated manifolds. While it is known that rank estimation is a difficult task, we also point out the problems that can be induced by an affinity measure that neglects the distribution of the principal angles. In this paper we suggest a new interpretation of the rank of the trajectory matrix and a new affinity measure. The rank estimation is performed by analysing which rank leads to a configuration where small and large angles are best separated. The affinity measure is a new function automatically parametrized so that it is able to adapt to the actual configuration of the principal angles. Our technique has one of the lowest misclassification rates on the Hopkins155 database and also performs well on synthetic sequences with up to 5 motions and variable noise level.
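
As a reminder of the quantity this abstract builds on, here is a minimal sketch of how principal angles between two subspaces can be computed with the standard QR plus SVD recipe. The function name and the two example planes are illustrative assumptions; the paper's rank estimation and affinity function are not reproduced.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between span(A) and span(B)."""
    # Orthonormal bases for the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Two planes in R^3 that share the x-axis: the xy-plane and the xz-plane.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(A, B)  # one angle of 0 and one of pi/2
```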

L. Zappella, E. Provenzi, X. Lladó, J. Salvi

Real-Time Detection of Small Surface Objects Using Weather Effects

Small surface objects, which usually carry important information, are difficult to identify under realistic atmospheric conditions because weather degrades image features. This paper describes a novel algorithm to overcome the problem, using depth-aware analysis. Because local patches that contain objects always have low intensities in at least one color channel, we detect candidate small surface objects using the dark channel prior. Then, we estimate the approximate depth map of maritime scenes from a single image, based on the theory of perspective projection. Finally, using the estimated depth map and the atmospheric scattering model, we design spatially variant thresholds to identify small surface objects against noisy backgrounds, without contrast enhancement. Experiments show that the proposed method runs in real time and can outperform state-of-the-art algorithms on the detection of distant small surface objects with only a few pixels.
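
The dark channel used in the first step can be sketched as follows: take the minimum over the color channels at each pixel, then the minimum over a small local window. This is a minimal illustration only; the patch size and the toy image are assumptions, and the depth estimation and spatially variant thresholds of the paper are not shown.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Minimum over color channels, then minimum over a patch x patch window."""
    min_rgb = image.min(axis=2)
    pad = patch // 2
    # Replicate border pixels so every window is full size.
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# Toy 5x5 image: bright, haze-like background with one pixel that is dark
# in a single color channel (standing in for a small object).
img = np.full((5, 5, 3), 0.8)
img[2, 2] = [0.9, 0.1, 0.7]
dc = dark_channel(img)
```

Pixels whose window touches the object get a low dark channel value, while the uniform background stays at its channel minimum.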

Baojun Qi, Tao Wu, Hangen He, Tingbo Hu

Automating Snakes for Multiple Objects Detection

Active contours, or snakes, have emerged as an indispensable interactive image segmentation tool in many applications. However, snakes fail to serve many significant image segmentation applications that require complete automation. Here, we present a novel technique to automate snakes for multiple object detection. We first apply a probabilistic quad tree based approximate segmentation technique to find the regions of interest (ROI) in an image, then evolve modified GVF snakes within the ROIs, and finally classify the snakes into object and non-object classes using boosting. We propose a novel loss function for boosting that is more robust to outliers in snake classification, and we derive a modified Adaboost algorithm by minimizing the proposed loss function to achieve better classification results. Extensive experiments have been carried out on two datasets: one is important to the oil sand mining industry and the other is significant in bio-medical engineering. The performance of the proposed snake validation has been compared with competitive methods. Results show that the proposed algorithm is computationally less expensive and can delineate objects up to 30% more accurately as well as precisely.

Baidya Nath Saha, Nilanjan Ray, Hong Zhang

Monocular Template-Based Reconstruction of Smooth and Inextensible Surfaces

We present different approaches to reconstructing an inextensible surface from point correspondences between an input image and a template image representing a flat reference shape from a fronto-parallel point of view. We first propose a ‘point-wise’ method, i.e. a method that only retrieves the 3D positions of the point correspondences. This method is formulated as a second-order cone program and it handles inaccuracies in the point measurements. It relies on the fact that the Euclidean distance between two 3D points cannot exceed their geodesic distance (which can easily be computed from the template image). We then present an approach that reconstructs a smooth 3D surface based on Free-Form Deformations. The surface is represented as a smooth map from the template image space to the 3D space. Our idea is to require that the 2D-3D map be a local isometry everywhere. This induces conditions on the Jacobian matrix of the map which are included in a least-squares minimization problem.
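
The inextensibility bound in this abstract can be illustrated with a small check: for a flat template, geodesic distances are taken here to be the 2D distances between template points, and a candidate 3D reconstruction is admissible only if no pairwise 3D distance exceeds them. This sketch only tests the constraint. It does not solve the second-order cone program, and the helper name, tolerance and sample points are assumptions.

```python
import numpy as np

def inextensibility_violations(template_pts, surface_pts, tol=1e-9):
    """Boolean matrix marking point pairs whose 3D distance exceeds the template distance."""
    d_template = np.linalg.norm(template_pts[:, None] - template_pts[None, :], axis=2)
    d_surface = np.linalg.norm(surface_pts[:, None] - surface_pts[None, :], axis=2)
    return d_surface > d_template + tol

# Three template points on a line, one unit apart (assumed template units).
template = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# A surface bent by 60 degrees at the middle point: lengths are preserved
# along the surface, so the end points move closer together in 3D.
theta = np.pi / 3
bent = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0 + np.cos(theta), 0.0, np.sin(theta)]])

# A stretched configuration that no inextensible surface could produce.
stretched = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])

bent_ok = not inextensibility_violations(template, bent).any()
stretched_bad = inextensibility_violations(template, stretched).any()
```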

Florent Brunet, Richard Hartley, Adrien Bartoli, Nassir Navab, Remy Malgouyres

Multi-class Leveraged κ-NN for Image Classification



The k-nearest neighbors (k-NN) classification rule is still an essential tool for computer vision applications, such as scene recognition. However, k-NN still features some major drawbacks, which mainly reside in the uniform voting among the nearest prototypes in the feature space.

In this paper, we propose a new method that is able to learn the “relevance” of prototypes, thus classifying test data using a weighted k-NN rule. In particular, our algorithm, called Multi-class Leveraged k-nearest neighbor (MLNN), learns the prototype weights in a boosting framework, by minimizing a surrogate exponential risk over training data. We propose two main contributions for improving computational speed and accuracy. On the one hand, we implement learning in an inherently multiclass way, thus providing significant computation time reduction over one-versus-all approaches. Furthermore, the leveraging weights enable effective data selection, thus reducing the cost of k-NN search at classification time. On the other hand, we propose a kernel generalization of our approach to take into account real-valued similarities between data in the feature space, thus enabling more accurate estimation of the local class density.

We tested MLNN on three datasets of natural images. Results show that MLNN significantly outperforms classic k-NN and weighted k-NN voting. Furthermore, using an adaptive Gaussian kernel provides significant performance improvement. Finally, the best results are obtained when using MLNN with an appropriate learned metric distance.
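
To make the contrast between uniform and weighted voting concrete, the following sketch classifies a query by summing per-prototype weights over its k nearest prototypes. It is a simplified illustration: the weights below are hand-picked rather than learned by the boosting procedure of the paper, and the function name and data are hypothetical.

```python
import numpy as np

def weighted_knn_predict(x, prototypes, labels, weights, k, n_classes):
    """Predict a class by weighted voting among the k nearest prototypes."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(distances)[:k]
    scores = np.zeros(n_classes)
    for i in nearest:
        # Each neighbor votes for its own class with its own weight.
        scores[labels[i]] += weights[i]
    return int(np.argmax(scores))

# Hypothetical 1D prototypes: two of class 0 and one of class 1.
prototypes = np.array([[0.0], [0.2], [0.5]])
labels = np.array([0, 0, 1])
query = np.array([0.3])

# Uniform voting: two class-0 neighbors outvote the single class-1 neighbor.
uniform = weighted_knn_predict(query, prototypes, labels, np.ones(3), k=3, n_classes=2)

# Hand-picked weights that treat the class-1 prototype as more relevant.
leveraged = weighted_knn_predict(
    query, prototypes, labels, np.array([0.2, 0.3, 1.0]), k=3, n_classes=2
)
```

Prototypes with a weight of zero never influence a vote, which is the sense in which weights can double as data selection.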

Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud

Video Based Face Recognition Using Graph Matching

In this paper, we propose a novel graph based approach for still-to-video face recognition, in which the temporal and spatial information of the face from each frame of the video is utilized. The spatial information is incorporated using a graph based face representation. The graphs contain information on the appearance and geometry of facial feature points and are labeled using the feature descriptors of the feature points. The temporal information is captured using an adaptive probabilistic appearance model. The recognition is performed in two stages: in the first stage, a Maximum a Posteriori solution based on PCA is computed to prune the search space and select fewer candidates. A simple deterministic algorithm which exploits the topology of the graph is used for matching in the second stage. The experimental results on the UTD database and our dataset show that the adaptive matching and the graph based representation provide robust recognition performance.

Gayathri Mahalingam, Chandra Kambhamettu

A Hybrid Supervised-Unsupervised Vocabulary Generation Algorithm for Visual Concept Recognition

Vocabulary generation is an essential step in the bag-of-words image representation for visual concept recognition, because its quality affects classification performance substantially. In this paper, we propose a hybrid method for visual word generation which combines unsupervised density-based clustering with the discriminative power of fast support vector machines. We aim at three goals: breaking the vocabulary generation algorithm up into two sections, one of which is highly parallelizable; reducing computation times for bag-of-words features; and keeping concept recognition performance at levels comparable to vanilla k-means clustering. On the two recent data sets Pascal VOC2009 and Image-CLEF2010 PhotoAnnotation, our proposed method either outperforms various baseline algorithms for visual word generation with almost the same computation time or reduces training/test time with on-par classification performance.

Alexander Binder, Wojciech Wojcikiewicz, Christina Müller, Motoaki Kawanabe

Image Inpainting Based on Probabilistic Structure Estimation

A novel inpainting method based on probabilistic structure estimation has been developed. The method consists of two steps. First, an initial image, which captures rough structure and colors in the missing region, is estimated. This image is generated by probabilistically interpolating the gradient inside the missing region, and then by flooding the colors on the boundary into the missing region using a Markov Random Field. Second, the inpainted image is synthesized by locally replacing the missing region with local patches similar to both the adjacent patches and the initial image. Since the patch replacement process is guided by the initial image, the inpainted image is guaranteed to preserve the underlying structure. This also enables patches to be replaced in a greedy manner, i.e. without optimization. Experiments show the proposed method outperforms previous methods in terms of both subjective image quality and computational speed.

Takashi Shibata, Akihiko Iketani, Shuji Senda

Text Localization and Recognition in Complex Scenes Using Local Features

We describe an approach using local features to resolve problems in text localization and recognition in complex scenes. Low image quality, complex backgrounds and variations of text make these problems challenging. Our approach includes the following stages: (1) template images are generated automatically; (2) SIFT features are extracted and matched to template images; (3) multiple single-character areas are located using a segmentation algorithm based upon multiple-size sliding sub-windows; (4) a voting and geometric verification algorithm is used to identify final results. This framework is thus essentially simple, skipping many steps, such as normalization, binarization and OCR, which are required in previous methods. Moreover, this framework is robust as only SIFT features are used. We evaluated our method using 200,000+ images in 3 scripts (Chinese, Japanese and Korean). We obtained an average single-character success rate of 77.3% (highest 94.1%) and an average multiple-character success rate of 63.9% (highest 89.6%).

Qi Zheng, Kai Chen, Yi Zhou, Congcong Gu, Haibing Guan

Pyramid-Based Multi-structure Local Binary Pattern for Texture Classification

Recently, the local binary pattern (LBP) has been widely used in texture classification. Although many conventional LBP methods perform well on texture classification, they only describe micro structures of texture images, such as edges, corners and spots. This limitation remains even when multiresolution analysis is used with the local binary pattern. In this paper, we investigate the drawback of conventional LBP operators in describing textures that have the same small structures but different large structures. A multi-structure local binary pattern operator is obtained by executing the LBP method on different layers of an image pyramid. The proposed method is simple yet efficient, extracting not only the micro structures but also the macro structures of texture images. We demonstrate the performance of our method on the task of rotation invariant texture classification. The experimental results on the Outex database show the advantages of the proposed method.
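
The idea of running LBP on several pyramid layers can be sketched as below. This is an assumption-laden illustration, not the operator evaluated in the paper: it uses the basic 8-neighbor LBP without rotation invariance, builds the pyramid by plain 2x2 averaging, and concatenates normalized histograms; all function names are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbor LBP codes for the interior pixels of a 2D image."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbor is at least as bright as the center.
        codes |= (neighbor >= center).astype(np.uint8) << bit
    return codes

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_lbp_histogram(img, levels=3):
    """Concatenate normalized LBP histograms from each pyramid level."""
    features = []
    for _ in range(levels):
        hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
        features.append(hist / hist.sum())
        img = downsample(img)
    return np.concatenate(features)

rng = np.random.default_rng(0)
texture = rng.random((32, 32))          # stand-in for a texture image
feature = pyramid_lbp_histogram(texture)
```

Coarser levels see each original neighborhood at a larger scale, which is how the same 3x3 operator can respond to macro structures.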

Yonggang He, Nong Sang, Changxin Gao

Unsupervised Moving Object Detection with On-line Generalized Hough Transform

Generalized Hough Transform-based methods have been successfully applied to object detection. Such methods have the following disadvantages: (i) manual labeling of training data; (ii) off-line construction of the codebook. To overcome these limitations, we propose an unsupervised moving object detection algorithm with an on-line Generalized Hough Transform. Our contributions are two-fold: (i) an unsupervised training data selection algorithm based on Multiple Instance Learning (MIL); (ii) an on-line Extremely Randomized Trees construction algorithm for on-line codebook adaptation. We evaluate the proposed algorithm on three video datasets. The experimental results show that the proposed algorithm achieves performance comparable to the supervised detection method with manual labeling. They also show that the proposed algorithm outperforms the previously proposed unsupervised learning algorithm.

Jie Xu, Yang Wang, Wei Wang, Jun Yang, Zhidong Li

Interactive Event Search through Transfer Learning

Activity videos are widespread on the Internet, but current video search is limited to text tags due to limitations in recognition systems. One of the main reasons for this limitation is the wide variety of activities users could query, which makes codifying knowledge for all queries problematic. Relevance Feedback (RF) is a retrieval framework that addresses this issue via interactive feedback with the user during the search session. An added benefit is that RF can also learn the subjective component of a user’s search preferences. However, for good retrieval performance, RF may require a large amount of user feedback for activity search. We address this issue by introducing Transfer Learning (TL) into RF. With TL, we can use auxiliary data from known classification problems different from the user’s target query to decrease the needed amount of user feedback. We address key issues in integrating RF and TL and demonstrate improved performance on the challenging YouTube Action Dataset.

Antony Lam, Amit K. Roy-Chowdhury, Christian R. Shelton

A Compositional Exemplar-Based Model for Hair Segmentation

Hair is a very important part of human appearance. Robust and accurate hair segmentation is difficult because of the challenging variation of hair color and shape. In this paper, we propose a novel Compositional Exemplar-based Model (CEM) for hair style segmentation. CEM generates an adaptive hair style (a probabilistic mask) for the input image automatically in a divide-and-conquer manner, which naturally divides into a decomposition stage and a composition stage. For the decomposition stage, we learn a strong ranker based on a group of weak similarity functions that effectively emphasize the Semantic Layout Similarity (SLS); in the composition stage, we introduce the Neighbor Label Consistency (NLC) Constraint to reduce the ambiguity between data representation and semantic meaning and then recompose the hair style using the alpha-expansion algorithm. The final segmentation result is obtained by Dual-Level Conditional Random Fields. Experimental results on face images from the Labeled Faces in the Wild data set show its effectiveness.

Nan Wang, Haizhou Ai, Shihong Lao

Descriptor Learning Based on Fisher Separation Criterion for Texture Classification

This paper proposes a novel method to deal with the representation issue in texture classification. A learning framework for image descriptors is designed based on the Fisher separation criterion (FSC) to learn the most reliable and robust dominant pattern types considering intra-class similarity and inter-class distance. Image structures are thus described by a new FSC-based learning (FBL) encoding method. Unlike previous hand-crafted encoding methods, such as LBP and SIFT, a supervised learning approach is used to learn an encoder from training samples. We find that such a learning technique can largely improve the discriminative ability and automatically achieve a good tradeoff between discriminative power and efficiency. The commonly used texture descriptor local binary pattern (LBP) is taken as an example in the paper, which yields the proposed FBL-LBP descriptor. We benchmark its performance by classifying textures present in the Outex_TC_0012 database for rotation invariant texture classification, the KTH-TIPS2 database for material categorization and the Columbia-Utrecht (CUReT) database for classification under different views and illuminations. The promising results verify its robustness to image rotation, illumination changes and noise. Furthermore, to validate the generalization to other problems, we extend the application to face recognition and evaluate the proposed FBL descriptor on the FERET face database. The encouraging results show that this descriptor is highly discriminative.

Yimo Guo, Guoying Zhao, Matti Pietikäinen, Zhengguang Xu

Semi-supervised Neighborhood Preserving Discriminant Embedding: A Semi-supervised Subspace Learning Algorithm

Over the last decade, supervised and unsupervised subspace learning methods, such as LDA and NPE, have been applied to face recognition. In real-life applications, besides unlabeled image data, prior knowledge in the form of labeled data is also available and can be incorporated into the subspace learning algorithm, resulting in improved performance. In this paper, we propose a subspace learning method based on semi-supervised neighborhood preserving discriminant learning, which we call Semi-supervised Neighborhood Preserving Discriminant Embedding (SNPDE). The method preserves the local neighborhood structure of the face manifold using NPE, and maximizes the separability of different classes using LDA. Experimental results on two face databases demonstrate the effectiveness of the proposed method.

Maryam Mehdizadeh, Cara MacNish, R. Nazim Khan, Mohammed Bennamoun

Segmentation via NCuts and Lossy Minimum Description Length: A Unified Approach

We investigate a fundamental problem in computer vision: unsupervised image segmentation. During the last decade, Normalized Cuts (NCuts) has become very popular for image segmentation. NCuts guarantees a globally optimal solution in the continuous solution space; however, how to automatically select the number of segments for a given image is left as an open problem. Recently, the lossy minimum description length (LMDL) criterion has been proposed for segmentation of images. This criterion can adaptively determine the number of segments; however, as the optimization is combinatorial, only a suboptimal solution can be achieved by a greedy algorithm. The complementarity of the two criteria motivates us to combine NCuts and LMDL in a unified fashion to achieve a better segmentation: given the NCuts segmentations under different numbers of segments, we choose the optimal segmentation to be the one that minimizes the overall coding length, subject to a given distortion. We then develop a new way to use the coding length decrement as the similarity measure for NCuts, so that our algorithm is able to seek both the optimal NCuts solution under a fixed number of segments and the optimal LMDL solution among different numbers of segments. Extensive experiments demonstrate the effectiveness of our algorithm.

Mingyang Jiang, Chunxiao Li, Jufu Feng, Liwei Wang

A Phase Discrepancy Analysis of Object Motion

Detecting moving objects against dynamic backgrounds remains a challenge in computer vision and robotics. This paper presents a surprisingly simple algorithm to detect objects in such conditions. Based on theoretic analysis, we show that 1) the displacement of the foreground and the background can be represented by the phase change of Fourier spectra, and 2) the motion of background objects can be extracted by Phase Discrepancy in an efficient and robust way. The algorithm does not rely on prior training on particular features or categories of an image and can be implemented in 9 lines of MATLAB code.

In addition to the algorithm, we provide a new database for moving object detection with 20 video clips, 11 subjects and 4785 bounding boxes to be used as a public benchmark for algorithm evaluation.
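
The first claim in the abstract above rests on the Fourier shift theorem: a (circular) displacement of an image leaves the amplitude spectrum unchanged and multiplies the spectrum by a pure phase term. The sketch below verifies only that property on a random array. It is not the Phase Discrepancy algorithm itself, and the frame size and displacement are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((16, 16))            # stand-in for a video frame

# Circularly displace the whole frame by (dy, dx) pixels.
dy, dx = 3, 5
shifted = np.roll(frame, shift=(dy, dx), axis=(0, 1))

F1 = np.fft.fft2(frame)
F2 = np.fft.fft2(shifted)

# Shift theorem: F2 = F1 * exp(-2*pi*i * (fy*dy + fx*dx)),
# with fy, fx the normalized frequencies of each Fourier coefficient.
fy = np.fft.fftfreq(frame.shape[0])[:, None]
fx = np.fft.fftfreq(frame.shape[1])[None, :]
phase_term = np.exp(-2j * np.pi * (fy * dy + fx * dx))

same_amplitude = np.allclose(np.abs(F1), np.abs(F2))
phase_only_change = np.allclose(F2, F1 * phase_term)
```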

Bolei Zhou, Xiaodi Hou, Liqing Zhang

Image Classification Using Spatial Pyramid Coding and Visual Word Reweighting

The neglect of spatial information and of the semantics of visual words are the main obstacles in the bag-of-visual-words (BoW) method for image classification. To address these obstacles, we present an improved BoW representation using spatial pyramid coding (SPC) and visual word reweighting. In the SPC procedure, we adopt the sparse coding technique to encode visual features with a spatial constraint. Visual features from the same spatial sub-region of images are collected to generate the visual vocabulary. Additionally, a relaxed but simple solution for semantic embedding into visual words is proposed. We relax the semantic embedding from ideal semantic correspondence to naive semantic purity of visual words, and reweight each visual word according to its semantic purity. Higher weights are given to semantically distinctive visual words, and lower weights to semantically general ones. Experiments on a public dataset demonstrate the effectiveness of the proposed method.

Chunjie Zhang, Jing Liu, Jinqiao Wang, Qi Tian, Changsheng Xu, Hanqing Lu, Songde Ma

Class-Specific Low-Dimensional Representation of Local Features for Viewpoint Invariant Object Recognition

In this paper we propose a new general framework to obtain more distinctive local invariant features by projecting the original feature descriptors into a low-dimensional feature space, while simultaneously incorporating class information. In the resulting feature space, the features from different objects project to separate areas, while locally the metric relations between features corresponding to the same object are preserved. The low-dimensional feature embedding is obtained by a modified version of classical Multidimensional Scaling, which we call supervised Multidimensional Scaling (sMDS). Experimental results on a database containing images of several different objects with large variation in scale, viewpoint, illumination conditions and background clutter support the view that embedding class information into the feature representation is beneficial and results in more accurate object recognition.
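
For reference, classical Multidimensional Scaling, the starting point that the abstract says is modified, can be sketched as follows. This is the unsupervised textbook version only; how class information enters sMDS is not described in the abstract and is not attempted here. The sample points are arbitrary.

```python
import numpy as np

def classical_mds(D, dim):
    """Embed points in `dim` dimensions from a matrix of pairwise distances."""
    n = D.shape[0]
    # Double centering turns squared distances into a Gram (inner product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    # Keep the `dim` largest eigenvalues and their eigenvectors.
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Corners of a 3 x 4 rectangle; only their pairwise distances are used.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)

Y = classical_mds(D, dim=2)
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
```

The embedding is recovered only up to rotation, reflection and translation, so the check of interest is that the pairwise distances are reproduced.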

Bisser Raytchev, Yuta Kikutsugi, Toru Tamaki, Kazufumi Kaneda

Learning Non-coplanar Scene Models by Exploring the Height Variation of Tracked Objects

In this paper, we present a novel method to overcome the common constraint of traditional camera calibration methods of surveillance systems where all objects move on a single coplanar ground plane. The proposed method estimates a scene model with non-coplanar planes by measuring the variation of pedestrian heights across the camera FOV in a statistical manner. More specifically, the proposed method automatically segments the scene image into plane regions, estimates a relative depth and estimates the altitude for each image pixel, thus building up a 3D structure with multiple non-coplanar planes. By being able to estimate the non-coplanar planes, the method can extend the applicability of 3D (single or multiple camera) tracking algorithms to a range of environments where objects (pedestrians and/or vehicles) can move on multiple non-coplanar planes (e.g. multiple levels, overpasses and stairs).

Fei Yin, Dimitrios Makris, James Orwell, Sergio A. Velastin

Optimal Regions for Linear Model-Based 3D Face Reconstruction

In this paper, we explore region-based 3D representations of the human face. We begin by noting that although they serve as a key ingredient in many state-of-the-art 3D face reconstruction algorithms, very little research has gone into devising strategies for optimally designing them. In fact, the great majority of such models encountered in the literature is based on manual segmentations of the face into subregions. We propose algorithms that are capable of automatically finding the optimal subdivision given a training set and the number of desired regions. The generality of the segmentation approach is demonstrated on examples from the TOSCA database, and a cross-validation experiment on facial data shows that part-based models designed using the proposed algorithms are capable of outperforming alternative segmentations w.r.t. reconstruction accuracy.

Michaël De Smet, Luc Van Gool

Color Kernel Regression for Robust Direct Upsampling from Raw Data of General Color Filter Array

Upsampling that preserves image details is a highly demanded image operation. There are various upsampling algorithms, and many of them focus on gray images. For color images, those algorithms are usually applied to the luminance component only, or independently channel by channel. However, the single image sensor in a common digital camera cannot observe the full-color image. The data observed by the single image sensor is called raw data, and it is converted into the full-color image by demosaicing. Upsampling from raw data therefore requires sequential demosaicing and upsampling. In this paper, we propose direct upsampling from the raw data based on kernel regression. Although kernel regression is known as a powerful denoising and interpolation algorithm, it too has been proposed for gray images. We extend it to a color kernel regression that can generate the full-color image from any kind of raw data. The second key point of the proposed color kernel regression is a local density parameter optimization, or kernel size optimization, based on the stability of the linear system associated with the kernel regression. We also propose a novel iteration framework for the upsampling. The experimental results demonstrate that the proposed color kernel regression outperforms existing sequential approaches, reconstruction approaches, and existing kernel regression.

Masayuki Tanaka, Masatoshi Okutomi

The Large-Scale Crowd Density Estimation Based on Effective Region Feature Extraction Method

This paper proposes an intelligent video surveillance system to estimate the crowd density by effective region feature extraction (ERFE) and learning. First, a motion detection method is used to segment the foreground, and the extremal regions of the foreground are then extracted. Furthermore, a new perspective projection method is proposed to correct the 3D-to-2D distortion of the extracted regions, and the moving cast shadow is eliminated based on the color invariant of the shadow region. Afterwards, a histogram statistic method is applied to extract crowd features from the corrected regions. Finally, the crowd features are classified into a range of density levels using a support vector machine. Experiments on real crowd videos show that the proposed density estimation system has a great advantage in large-scale crowd analysis. More importantly, better performance is achieved even under varying view angles or changing illumination. Thus the video surveillance system is more robust and practical.

Hang Su, Hua Yang, Shibao Zheng

TILT: Transform Invariant Low-Rank Textures

In this paper, we show how to efficiently and effectively extract a rich class of low-rank textures in a 3D scene from 2D images despite significant distortion and warping. The low-rank textures capture geometrically meaningful structures in an image, which encompass conventional local features such as edges and corners as well as all kinds of regular, symmetric patterns ubiquitous in urban environments and man-made objects. Our approach to finding these low-rank textures leverages the recent breakthroughs in convex optimization that enable robust recovery of a high-dimensional low-rank matrix despite gross sparse errors. In the case of planar regions with significant projective deformation, our method can accurately recover both the intrinsic low-rank texture and the precise domain transformation. Extensive experimental results demonstrate that this new technique works effectively for many near-regular patterns or objects that are approximately low-rank, such as human faces and text.

Zhengdong Zhang, Xiao Liang, Arvind Ganesh, Yi Ma

Translation-Symmetry-Based Perceptual Grouping with Applications to Urban Scenes

An important finding in our understanding of the human vision system is perceptual grouping, the mechanism by which visual elements are organized into coherent groups. Though grouping is generally acknowledged to be a crucial component of the mid-level visual system, in computer vision there is a scarcity of mid-level cues due to computational difficulties in constructing feature detectors for such cues. We propose a novel mid-level visual feature detector where the visual elements are grouped based on the 2D translation subgroup of a wallpaper pattern. Different from previous state-of-the-art lattice detection algorithms for near-regular wallpaper patterns, our proposed method can detect multiple, semantically relevant 2D lattices in a scene simultaneously, achieving an effective translation-symmetry-based segmentation. Our experimental results on urban scenes demonstrate the use of translation-symmetry for building facade super-resolution and orientation estimation from a single view.

Minwoo Park, Kyle Brocklehurst, Robert T. Collins, Yanxi Liu

Towards Hypothesis Testing and Lossy Minimum Description Length: A Unified Segmentation Framework

We propose a novel algorithm for unsupervised segmentation of images based on statistical hypothesis testing. We model the distribution of the image texture features as a mixture of Gaussian distributions, so that a multi-normal population hypothesis test can be used as a similarity measure between region features. Our algorithm iteratively merges adjacent regions that are “most similar”, until all pairs of adjacent regions are sufficiently “dissimilar”. At a higher level, we give a hypothesis testing segmentation framework (HT), which allows different definitions of the merging criterion and termination condition. Furthermore, we derive an interesting connection between the HT framework and previous lossy minimum description length (LMDL) segmentation. We prove that under a specific merging criterion and termination condition, LMDL can be unified as a special case under the HT framework. This theoretical result also gives novel insights into and improvements on LMDL based algorithms. We conduct experiments on the Berkeley Segmentation Dataset, and our algorithm achieves superior results compared to other popular methods including LMDL based algorithms.

Mingyang Jiang, Chunxiao Li, Jufu Feng, Liwei Wang
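The merge-until-dissimilar loop described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes one-dimensional feature samples per region, regions arranged in a chain so that only neighbours in the list are adjacent, and it swaps in a Welch t statistic for the multi-normal population test. The names `welch_t`, `merge_regions` and `threshold` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic between two samples (smaller = more similar).

    Assumption: stands in for the multi-normal population test of the paper.
    Each sample needs at least two values so that the variance is defined.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return abs(mean(a) - mean(b)) / se


def merge_regions(regions, threshold):
    """Greedily merge neighbouring regions until every adjacent pair is dissimilar."""
    regions = [list(r) for r in regions]
    while len(regions) > 1:
        # Test statistic for each adjacent pair (i, i + 1).
        stats = [welch_t(regions[i], regions[i + 1])
                 for i in range(len(regions) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # termination: the most similar pair is already dissimilar
        regions[i:i + 2] = [regions[i] + regions[i + 1]]
    return regions
```

With four small regions, two dark and two bright, and a threshold of 3.0, the sketch ends with one dark and one bright region. Changing `welch_t` or the stopping rule corresponds to choosing a different merging criterion or termination condition within the framework.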

A Convex Image Segmentation: Extending Graph Cuts and Closed-Form Matting

Image matting and segmentation are two closely related topics that concern extracting the foreground and background of an image. While the methods based on global optimization are popular in both fields, the cost functions and the optimization methods have been developed independently due to the different interests of the fields: graph cuts optimize combinatorial functions yielding hard segments, and closed-form matting minimizes quadratic functions yielding soft matte.

In this paper, we note that these seemingly different costs can be represented in very similar convex forms, and suggest a generalized framework based on convex optimization, which reveals a new insight. For the optimization, a primal-dual interior point method is adopted. Under the new perspective, two novel formulations are presented showing how we can improve the state-of-the-art segmentation and matting methods. We believe that this will pave the way for more sophisticated formulations in the future.

Youngjin Park, Suk I. Yoo

Linear Solvability in the Viewing Graph

The Viewing Graph [1] represents several views linked by the corresponding fundamental matrices, estimated pairwise. Given a Viewing Graph, the tuples of consistent camera matrices form a family that we call the Solution Set.

This paper provides a theoretical framework that formalizes different properties of the topology, linear solvability and number of solutions of multi-camera systems. We systematically characterize the topology of the Viewing Graph in terms of its solution set by means of the associated algebraic bilinear system. Based on this characterization, we provide conditions about the linearity and the number of solutions and define an inductively constructible set of topologies which admit a unique linear solution. Camera matrices can thus be retrieved efficiently and large viewing graphs can be handled in a recursive fashion. The results apply to problems such as the projective reconstruction from multiple views or the calibration of camera networks.

Alessandro Rudi, Matia Pizzoli, Fiora Pirri

Inference Scene Labeling by Incorporating Object Detection with Explicit Shape Model

In this paper, we incorporate shape detection into contextual scene labeling and make use of shape, texture, and context information in a graphical representation. We propose a candidacy graph whose vertices are two types of recognition candidates, for either a superpixel or a window patch. In the bottom-up steps, the superpixel candidates are generated by a discriminative classifier with textural features, and the window proposals by a learned deformable templates model. The contextual and competitive interactions between graph vertices, in the form of probabilistic connecting edges, are defined by two types of contextual metrics and by the overlap of their image domains, respectively. With this representation, a composite clustering sampling algorithm is proposed to search quickly for the globally optimal solution using Markov Chain Monte Carlo (MCMC). Our approach is applied to both the Lotus Hill Institute (LHI) and MSRC public datasets and achieves state-of-the-art results.

Quan Zhou, Wenyu Liu

Saliency Density Maximization for Object Detection and Localization

Accurate localization of the salient object in an image is a difficult problem when the saliency map is noisy and incomplete. A fast approach to detecting salient objects in images is proposed in this paper. To balance the size of the object against the saliency it contains, salient object detection is first formulated as finding the maximum saliency density on the saliency map. To obtain the globally optimal solution, a branch-and-bound search algorithm is developed to speed up the detection process. Without any prior knowledge, the proposed method can effectively and efficiently detect salient objects in images. Extensive results on different types of saliency maps with a public dataset of five thousand images show the advantages of our approach compared to some state-of-the-art methods.

Ye Luo, Junsong Yuan, Ping Xue, Qi Tian
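As a rough illustration of the saliency density idea in the abstract above, the sketch below scores every axis-aligned window by its saliency sum divided by its area plus a constant. The scoring formula, the constant `c` and the function names are assumptions made for illustration, and the search is a plain exhaustive scan over an integral image rather than the branch-and-bound search of the paper.

```python
def integral_image(sal):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(sal), len(sal[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = sal[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def max_density_window(sal, c=1.0):
    """Exhaustively find the window maximising sum / (area + c).

    Returns ((top, left, bottom, right), score) with exclusive bottom/right.
    The constant c (an assumption) keeps a single bright pixel from winning.
    """
    h, w = len(sal), len(sal[0])
    ii = integral_image(sal)
    best_score, best_box = float("-inf"), None
    for top in range(h):
        for bottom in range(top + 1, h + 1):
            for left in range(w):
                for right in range(left + 1, w + 1):
                    total = (ii[bottom][right] - ii[top][right]
                             - ii[bottom][left] + ii[top][left])
                    score = total / ((bottom - top) * (right - left) + c)
                    if score > best_score:
                        best_score, best_box = score, (top, left, bottom, right)
    return best_box, best_score
```

The exhaustive scan is quartic in the image side length, which is exactly the cost a branch-and-bound search is meant to avoid; the sketch is only practical for small maps.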

Modified Hybrid Bronchoscope Tracking Based on Sequential Monte Carlo Sampler: Dynamic Phantom Validation

This paper presents a new hybrid bronchoscope tracking method that uses an electromagnetic position sensor and a sequential Monte Carlo sampler, together with its evaluation on a dynamic motion phantom. Since airway deformation resulting from patient movement, respiratory motion, and coughing can significantly affect the rigid registration between electromagnetic tracking and computed tomography (CT) coordinate systems, a standard hybrid tracking approach that initializes intensity-based image registration with absolute pose data acquired by electromagnetic tracking fails when the initial camera pose is too far from the actual pose. We propose a new solution that combines electromagnetic tracking and a sequential Monte Carlo sampler to address this problem. In our solution, sequential Monte Carlo sampling is introduced to recursively approximate the posterior probability distributions of the bronchoscope camera motion parameters in accordance with the observation model based on electromagnetic tracking. We constructed a dynamic phantom that simulates airway deformation to evaluate our proposed solution. Experimental results demonstrate that the challenging problem of airway deformation can be robustly modeled and effectively addressed with our proposed approach compared to a previous hybrid method, even when the maximum simulated airway deformation reaches 23 mm.

Xióngbiāo Luó, Tobias Reichl, Marco Feuerstein, Takayuki Kitasaka, Kensaku Mori

Affine Warp Propagation for Fast Simultaneous Modelling and Tracking of Articulated Objects

We propose a new framework that allows simultaneous modelling and tracking of articulated objects in real time. We introduce a non-probabilistic graphical model and a new type of message that propagates explicit motion information for realignment of feature constellations across frames. These messages are weighted according to the rigidity of the relations between the source and destination features. We also present a method for learning these weights as well as the spatial relations between connected feature points, automatically identifying deformable and rigid object parts. Our method is extremely fast and allows simultaneous learning and tracking of nonrigid models containing hundreds of feature points with negligible computational overhead.

Arnaud Declercq, Justus Piater

kPose: A New Representation For Action Recognition

Human action recognition is an important problem in computer vision. Most existing techniques use all the video frames for action representation, which leads to high computational cost. Different from these techniques, we present a novel action recognition approach that describes the action with a few frames of representative poses, namely kPose. Firstly, a set of pose templates corresponding to different pose classes is learned based on a newly proposed Pose-Weighted Distribution Model (PWDM). Then, a local set of kPoses describing an action is extracted by clustering the poses belonging to the action. Thirdly, a further kPose selection is carried out to remove the redundant poses among the different local sets, which leads to a global set of kPoses with the least redundancy. Finally, a sequence of kPoses is obtained to describe the action by searching for the nearest kPose in the global set. The proposed action classification is carried out by comparing the obtained pose sequence with each local set of kPoses. The experimental results validate the proposed method with remarkable recognition accuracy.

Zhuoli Zhou, Mingli Song, Luming Zhang, Dacheng Tao, Jiajun Bu, Chun Chen

Identifying Surprising Events in Videos Using Bayesian Topic Models

Automatic processing of video data is essential in order to allow efficient access to large amounts of video content, a crucial point in such applications as video mining and surveillance. In this paper we focus on the problem of identifying interesting parts of the video. Specifically, we seek to identify atypical video events, which are the events a human user is usually looking for. To this end we employ the notion of Bayesian surprise, as defined in [1,2], in which an event is considered surprising if its occurrence leads to a large change in the probability of the world model. We propose to compute this abstract measure of surprise by first modeling a corpus of video events using the Latent Dirichlet Allocation model. Subsequently, we measure the change in the Dirichlet prior of the LDA model as a result of each video event’s occurrence. This change of the Dirichlet prior leads to a closed form expression for an event’s level of surprise, which can then be inferred directly from the observed data. We tested our algorithm on a real dataset of video data, taken by a camera observing an urban street intersection. The results demonstrate our ability to detect atypical events, such as a car making a U-turn or a person crossing an intersection diagonally.

Avishai Hendel, Daphna Weinshall, Shmuel Peleg
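To make the notion of surprise in the abstract above concrete, the following sketch computes the Kullback-Leibler divergence between two Dirichlet distributions, which has a closed form. It is an illustration under stated assumptions, not the authors' derivation: the prior update simply adds an event's topic counts to the Dirichlet parameters, the divergence is taken from posterior to prior, and `digamma`, `dirichlet_kl` and `surprise` are hypothetical helper names.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_kl(a, b):
    """KL(Dir(a) || Dir(b)) for positive parameter vectors of equal length."""
    a0, b0 = sum(a), sum(b)
    kl = math.lgamma(a0) - math.lgamma(b0)
    for ai, bi in zip(a, b):
        kl += math.lgamma(bi) - math.lgamma(ai)
        kl += (ai - bi) * (digamma(ai) - digamma(a0))
    return kl


def surprise(prior, event_counts):
    """Surprise of an event: divergence of the updated prior from the old one.

    Assumption: the update adds the event's topic counts to the parameters.
    """
    posterior = [p + c for p, c in zip(prior, event_counts)]
    return dirichlet_kl(posterior, prior)
```

For a prior of `[50.0, 1.0]`, an event that falls into the rarely seen second topic scores far higher than one that falls into the common first topic, which matches the intuition that atypical events change the model most.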

Face Detection with Effective Feature Extraction

There is an abundant literature on face detection due to its important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost-based face detector, Haar-like features have been adopted as the method of choice for frontal face detection. In this work, we show that simple features other than Haar-like features can also be applied for training an effective face detector. Since a single feature is not discriminative enough to separate faces from difficult non-faces, we further improve the generalization performance of our simple features by introducing feature co-occurrences. We demonstrate that our proposed features yield a performance improvement compared to Haar-like features. In addition, our findings indicate that features play a crucial role in the ability of the system to generalize.

Sakrapee Paisitkriangkrai, Chunhua Shen, Jian Zhang

Multiple Order Graph Matching

This paper addresses the problem of finding correspondences between two sets of features by using multiple order constraints all together. First, we build a high-order supersymmetric tensor, called multiple order tensor, to incorporate the constraints of different orders (e.g., unary, pairwise, third order, etc.). The multiple order tensor naturally merges multi-granularity geometric affinities, thus it presents stronger descriptive power of consistent constraints than the individual order based methods. Second, to achieve the optimal matching, we present an efficient computational approach for the power iteration of the multiple order tensor. It only needs sparse tensor elements and reduces the sampling size of feature tuples, due to the supersymmetry of the multiple order tensor. The experiments on both synthetic and real image data show that our approach improves the matching performance compared to state-of-the-art algorithms.

Aiping Wang, Sikun Li, Liang Zeng

Abstraction and Generalization of 3D Structure for Recognition in Large Intra-Class Variation

Humans have abstract models for object classes, which help them recognize previously unseen instances despite large intra-class variations. Objects are also grouped into classes based on their purpose. Studies in cognitive science show that humans maintain abstractions and certain specific features from the instances they observe. In this paper, we address the challenging task of creating a system that can learn such canonical models in a uniform manner for different classes. Using just a few examples, the system creates a canonical model (COMPAS) per class, which is used to recognize classes with large intra-class variation (chairs, benches, and sofas all belong to the sitting class). We propose a robust representation and an automatic scheme for abstraction and generalization. We quantitatively demonstrate improved recognition and classification accuracy over state-of-the-art 3D shape matching/classification methods and discuss advantages over rule-based systems.

Gowri Somanath, Chandra Kambhamettu

Exploiting Self-similarities for Single Frame Super-Resolution

We propose a super-resolution method that exploits self-similarities and group structural information of image patches using only a single input frame. The super-resolution problem is posed as learning the mapping between pairs of low-resolution and high-resolution image patches. Instead of relying on an extrinsic set of training images, as often required in example-based super-resolution algorithms, we employ a method that generates image pairs directly from the image pyramid of a single frame. The generated patch pairs are clustered for training a dictionary by enforcing group sparsity constraints underlying the image patches. Super-resolution images are then constructed using the learned dictionary. Experimental results show that the proposed method is able to achieve state-of-the-art performance.

Chih-Yuan Yang, Jia-Bin Huang, Ming-Hsuan Yang

On Feature Combination and Multiple Kernel Learning for Object Tracking

This paper presents a new method for object tracking based on multiple kernel learning (MKL). MKL is used to learn an optimal combination of $\chi^2$ kernels and Gaussian kernels, each type of which captures a different feature. Our features include the color information and a spatial pyramid histogram (SPH) based on global spatial correspondence of the geometric distribution of visual words. We propose a simple and effective way of updating the MKL classifier on-line, where useful tracking objects are automatically selected as support vectors. The algorithm handles target appearance variation and makes better use of history information, which leads to better discrimination between the target and the surrounding background. The experiments on real-world sequences demonstrate that our method can track objects accurately and robustly, especially under partial occlusion and large appearance change.

Huchuan Lu, Wenling Zhang, Yen-Wei Chen
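The kernel combination mentioned in the abstract above can be sketched as a weighted sum of a chi-squared kernel on histograms and a Gaussian kernel on a colour vector. This is only an illustration: the exponential form of the chi-squared kernel, the parameters `gamma` and `sigma`, and the fixed `weights` are assumptions, whereas in MKL the weights would be learned together with the classifier.

```python
import math


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel for non-negative histograms."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * d)


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on real-valued feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def combined_kernel(feats_x, feats_y, weights):
    """Weighted sum of the two base kernels.

    feats_* is a pair (histogram, colour_vector); weights is a pair of
    non-negative numbers, fixed by hand here (assumption).
    """
    k_chi = chi2_kernel(feats_x[0], feats_y[0])
    k_rbf = gaussian_kernel(feats_x[1], feats_y[1])
    return weights[0] * k_chi + weights[1] * k_rbf
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why the combination can be plugged directly into a kernel classifier.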

Correspondence-Free Multi Camera Calibration by Observing a Simple Reference Plane

In the present paper, we propose a multi camera calibration method that estimates both the intrinsic and extrinsic parameters of each camera. Assuming a reference plane has an infinitely repeated pattern, finding corresponding points between cameras is regarded as being equivalent to the estimation of discrete 2-D transformation on the observed reference plane. This means that the proposed method does not require any overlap of the observed region, and bundle adjustment can be performed in the sense of point-to-point correspondence. Our experiment demonstrates that the proposed method is practically admissible and sufficiently useful for building a simple shape measurement system using multiple cameras.

Satoshi Kawabata, Yoshihiro Kawai

Over-Segmentation Based Background Modeling and Foreground Detection with Shadow Removal by Using Hierarchical MRFs

In this paper, we propose a novel over-segmentation based method for the detection of foreground objects from a surveillance video by integrating techniques of background modeling and Markov Random Field classification. Firstly, we introduce a fast affinity propagation clustering algorithm to produce the over-segmentation of a reference image by taking into account the color difference and spatial relationship between pixels. A background model is learned by using Gaussian Mixture Models with color features of the segments to represent the time-varying background scene. Next, each segment is treated as a node in a Markov Random Field and assigned a state of foreground, shadow, or background, which is determined by using hierarchical belief propagation. The relationship between neighboring regions is also considered to ensure spatial coherence of segments. Finally, we demonstrate experimental results on several image sequences to show the effectiveness and robustness of the proposed method.

Te-Feng Su, Yi-Ling Chen, Shang-Hong Lai

MRF-Based Background Initialisation for Improved Foreground Detection in Cluttered Surveillance Videos

Robust foreground object segmentation via background modelling is a difficult problem in cluttered environments, where obtaining a clear view of the background to model is almost impossible. In this paper, we propose a method capable of robustly estimating the background and detecting regions of interest in such environments. In particular, we propose to extend the background initialisation component of a recent patch-based foreground detection algorithm with an elaborate technique based on Markov Random Fields, where the optimal labelling solution is computed using iterated conditional modes. Rather than relying purely on local temporal statistics, the proposed technique takes into account the spatial continuity of the entire background. Experiments with several tracking algorithms on the CAVIAR dataset indicate that the proposed method leads to considerable improvements in object tracking accuracy, when compared to methods based on Gaussian mixture models and feature histograms.

Vikas Reddy, Conrad Sanderson, Andres Sanin, Brian C. Lovell

Adaptive ε LBP for Background Subtraction

Background subtraction plays an important role in many computer vision systems, yet in complex scenes it is still a challenging task, especially under illumination variations. In this work, we develop an efficient texture-based method to tackle this problem. First, we propose a novel adaptive ε LBP operator, in which the threshold is adaptively calculated as a compromise between two criteria, i.e. the description stability and the discriminative ability. Then, the naive Bayesian technique is adopted to effectively model the probability distribution of local patterns at the pixel level, which utilizes only a single ε LBP pattern instead of an ε LBP histogram of a local region. Our approach is evaluated on several video sequences against traditional methods. Experiments show that our method is suitable for various scenes and, in particular, handles illumination variations robustly.

LingFeng Wang, HuaiYu Wu, ChunHong Pan

Continuous Surface-Point Distributions for 3D Object Pose Estimation and Recognition

We present a 3D, probabilistic object-surface model, along with mechanisms for probabilistically integrating unregistered 2.5D views into the model, and for segmenting model instances in cluttered scenes. The object representation is a probabilistic expression of object parts through smooth surface-point distributions obtained by kernel density estimation on 3D point clouds. A multi-part, viewpoint-invariant model is learned incrementally from a set of roughly segmented, unregistered views, by sequentially registering and fusing the views with the incremental model. Registration is conducted by nonparametric inference of maximum-likelihood model parameters, using Metropolis–Hastings MCMC with simulated annealing. The learning of viewpoint-invariant models and the applicability of our method to pose estimation, object detection, and object recognition are demonstrated on 3D-scan data, providing qualitative, quantitative and comparative evaluations.

Renaud Detry, Justus Piater

Efficient Structured Support Vector Regression

Support Vector Regression (SVR) has long been studied in machine learning and has gained popularity in various computer vision tasks. In this paper, we propose a structured support vector regression framework by extending the max-margin principle to incorporate spatial correlations among neighboring pixels. The objective function in our framework considers both label information and pairwise features, helping to achieve better cross-smoothing over neighboring nodes. With the bundle method, we effectively reduce the number of constraints and alleviate the adverse effect of outliers, leading to an efficient and robust learning algorithm. Moreover, we conduct a thorough analysis of the loss function used in structured regression, and provide a principled approach for defining proper loss functions and deriving the corresponding solvers to find the most violated constraint. We demonstrate that our method outperforms state-of-the-art regression approaches on various testbeds of synthetic images and real-world scenes.

Ke Jia, Lei Wang, Nianjun Liu

Cage-Based Tracking for Performance Animation

Full body performance capture is a promising emerging technology that has been intensively studied in Computer Graphics and Computer Vision over the last decade. Highly detailed performance animations are now easier to obtain using existing multi-view platforms, markerless capture, and 3D laser scanners. In this paper, we investigate the feasibility of extracting optimal reduced animation parameters without requiring an underlying rigid kinematic structure. This paper explores the potential of introducing harmonic cage-based linear estimation and deformation as a post-process of current performance capture techniques used in 3D time-varying scene capture technology. We propose the first algorithm for performing cage-based tracking across time for vision and virtual reality applications. The main advantages of our novel approach are its linear single-pass estimation of the desired surface, easy-to-reuse output cage sequences, and reduction in the storage size of animations. Our results show that the estimated parameters allow a sufficient silhouette-consistent generation of the enclosed mesh under sparse frame-to-frame animation constraints and large deformation.

Yann Savoye, Jean-Sébastien Franco

Modeling Dynamic Scenes Recorded with Freely Moving Cameras

Dynamic scene modeling is a challenging problem in computer vision. Many techniques have been developed in the past to address such a problem but most of them focus on achieving accurate reconstructions in controlled environments, where the background and the lighting are known and the cameras are fixed and calibrated. Recent approaches have relaxed these requirements by applying these techniques to outdoor scenarios. The problem however becomes even harder when the cameras are allowed to move during the recording since no background color model can be easily inferred.

In this paper we propose a new approach to model dynamic scenes captured in outdoor environments with moving cameras. A probabilistic framework is proposed to deal with such a scenario and to provide a volumetric reconstruction of all the dynamic elements of the scene.

The proposed algorithm was tested on a publicly available dataset filmed outdoors with six moving cameras. A quantitative evaluation of the method was also performed on synthetic data. The obtained results demonstrated the effectiveness of the approach considering the complexity of the problem.

Aparna Taneja, Luca Ballan, Marc Pollefeys

Learning Image Structures for Optimizing Disparity Estimation

We present a method for optimizing the stereo matching process when it is applied to a series of images with similar depth structures. We observe that there are similar regions with homogeneous colors in many images and propose to use image characteristics to recognize them. We use patterns in the data-dependent triangulations of images to learn characteristics of the scene. As our learning method is based on triangulations rather than segments, the method can be used for diverse types of scenes. An interpolation hypothesis is generated for each type of structure and tested against the ground truth to retain only those that are valid. The information learned is used in finding the solution to the Markov random field associated with a new scene. We modify the graph cuts algorithm to include steps that impose learned disparity patterns on the current scene. We show that our method reduces errors in the disparities and also decreases the number of pixels that have to be subjected to a complete cycle of graph cuts. We train and evaluate our algorithm on the Middlebury stereo dataset and quantitatively show that it produces better disparity than unmodified graph cuts.

M. V. Rohith, Chandra Kambhamettu

Image Reconstruction for High-Sensitivity Imaging by Using Combined Long/Short Exposure Type Single-Chip Image Sensor

We propose an image reconstruction method and a sensor for high-sensitivity imaging using long-term exposed green pixels over several frames. As a result of extending the exposure time of green pixels, motion blur increases. We use motion information detected from high-frame-rate red and blue pixels to remove the motion blur. To implement this method, both long- and short-term exposed pixels are arranged in a checkerboard pattern on a single-chip image sensor. Using the proposed method, we improved the sensitivity of the green pixels fourfold without any motion blur.

Sanzo Ugawa, Takeo Azuma, Taro Imagawa, Yusuke Okada

On the Use of Implicit Shape Models for Recognition of Object Categories in 3D Data

The ability to recognize object categories in 3D data is still an underdeveloped topic. This paper investigates adopting Implicit Shape Models (ISMs) for 3D categorization, which, unlike current approaches, also include information on the geometrical structure of each object category. ISMs were originally proposed for recognition and localization of categories in cluttered images. Modifications that allow a correct deployment on 3D data are discussed. Moreover, we propose modifications to three design points within the structure of a standard ISM to enhance its effectiveness for the categorization of database entries, either 3D or 2D: namely, codebook size and composition, codeword activation strategy, and vote weight strategy. Experimental results on two standard 3D datasets allow us to discuss the positive impact of the proposed modifications as well as to show the recognition accuracy yielded by our approach compared to the state of the art.

Samuele Salti, Federico Tombari, Luigi Di Stefano

Phase Registration of a Single Quasi-Periodic Signal Using Self Dynamic Time Warping

This paper proposes a method for phase registration of a single non-parametric quasi-periodic signal. After a short-term period has been detected for each sample by normalized autocorrelation, Self Dynamic Time Warping (Self DTW) between a quasi-periodic signal and that with multiple-period shifts is applied to obtain corresponding samples of the same phase. A phase sequence is finally estimated by the optimization framework including the data term derived from the correspondences, the regularization term derived from short-term periods, and a monotonic increasing constraint of the phase. Experiments on quasi-periodic signals from both simulated and real data show the effectiveness of the proposed method.

Yasushi Makihara, Ngo Thanh Trung, Hajime Nagahara, Ryusuke Sagawa, Yasuhiro Mukaigawa, Yasushi Yagi
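The period detection step mentioned in the abstract above relies on normalized autocorrelation. The sketch below is a simplified illustration that estimates one global period for the whole signal, whereas the paper detects a short-term period for each sample; the function names and the lag search range are assumptions.

```python
import math


def normalized_autocorrelation(x, lag):
    """Pearson-style correlation between the signal and its copy shifted by lag."""
    a, b = x[:-lag], x[lag:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0


def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation.

    Requires 1 <= min_lag <= max_lag < len(x) - 1.
    """
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_autocorrelation(x, lag))
```

For a sawtooth signal such as `[t % 20 for t in range(200)]`, searching lags 5 to 30 recovers the period 20. Keeping `max_lag` below twice the expected period avoids picking a multiple of the true period.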

Latent Gaussian Mixture Regression for Human Pose Estimation

Discriminative approaches for human pose estimation model the functional mapping, or conditional distribution, between image features and 3D pose. Learning such multi-modal models in high-dimensional spaces, however, is challenging with limited training data, often resulting in over-fitting and poor generalization. To address these issues, latent variable models (LVMs) have been introduced. Shared LVMs attempt to learn a coherent, typically non-linear, latent space shared by image features and 3D poses, the distribution of data in that latent space, and conditional distributions to and from this latent space to carry out inference. Discovering the shared manifold structure can, in itself, however, be challenging. In addition, shared LVMs are most often non-parametric, requiring the model representation to be a function of the training set size. We present a parametric framework that addresses these shortcomings. In particular, we first learn latent spaces, and distributions within them, for image features and 3D poses separately, and then learn a multi-modal conditional density between these two low-dimensional spaces in the form of Gaussian Mixture Regression. Using our model, we can address the issue of over-fitting and generalization, since the data is denser in the learned latent space, as well as avoid the necessity of learning a shared manifold for the data. We quantitatively evaluate and compare the performance of the proposed method to several state-of-the-art alternatives, and show that our method gives competitive performance.

Yan Tian, Leonid Sigal, Hernán Badino, Fernando De la Torre, Yong Liu
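Gaussian Mixture Regression, named in the abstract above, predicts an output from the conditional distribution of a joint Gaussian mixture. The sketch below shows the standard conditional-mean formula for a scalar input and a scalar output; it is a simplification, since the paper regresses between two learned low-dimensional latent spaces, and the tuple layout of `components` is an assumption made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Conditional mean E[y | x] under a joint two-dimensional Gaussian mixture.

    Each component is a tuple (weight, mean_x, mean_y, var_x, cov_xy).
    """
    # Responsibility of each component for the observed input x.
    resp = [w * gaussian_pdf(x, mx, vx) for w, mx, _, vx, _ in components]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("x has zero density under every component")
    y = 0.0
    for r, (_, mx, my, vx, cxy) in zip(resp, components):
        # Per-component linear regression, blended by responsibility.
        y += (r / total) * (my + cxy / vx * (x - mx))
    return y
```

With a single component the prediction reduces to ordinary linear regression. With several components, each one contributes its own linear prediction in proportion to how well it explains the input; sampling or taking per-component predictions instead of the blended mean would preserve the multi-modality that the abstract emphasizes.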

Top-Down Cues for Event Recognition

How to fuse static and dynamic information is a key issue in event analysis. In this paper, we present a novel approach that combines appearance and motion information in a top-down manner for event recognition in real videos. Unlike the conventional bottom-up way, attention can be focused volitionally on top-down signals derived from task demands. A video is represented by a collection of spatio-temporal features, called video words, obtained by quantizing the spatio-temporal interest points (STIPs) extracted from the video. We propose two approaches to build class-specific visual or motion histograms for the corresponding features. One uses the probability of a class given a visual or motion word. High probability means that more attention should be paid to this word. Moreover, in order to incorporate the negative information for each word, we propose to utilize the mutual information between each word and the event label. High mutual information means high relevance between this word and the class label. Both methods can not only characterize two aspects of an event but also select the relevant words, which are all discriminative for the corresponding event. Experimental results on the TRECVID 2005 and HOHA video corpora demonstrate that the mean average precision is improved by using the proposed method.

Li Li, Chunfeng Yuan, Weiming Hu, Bing Li

Robust Photometric Stereo via Low-Rank Matrix Completion and Recovery

We present a new approach to robustly solve photometric stereo problems. We cast the problem of recovering surface normals from multiple lighting conditions as a problem of recovering a low-rank matrix with both missing entries and corrupted entries, which model all types of non-Lambertian effects such as shadows and specularities. Unlike previous approaches that use least-squares or heuristic robust techniques, our method uses advanced convex optimization techniques that are guaranteed to find the correct low-rank matrix by simultaneously fixing its missing and erroneous entries. Extensive experimental results demonstrate that our method achieves unprecedentedly accurate estimates of surface normals in the presence of a significant amount of shadows and specularities. The new technique can be used to improve virtually any photometric stereo method, including uncalibrated photometric stereo.

Lun Wu, Arvind Ganesh, Boxin Shi, Yasuyuki Matsushita, Yongtian Wang, Yi Ma

Robust Auxiliary Particle Filter with an Adaptive Appearance Model for Visual Tracking

The algorithm proposed in this paper is designed to solve two challenging issues in visual tracking: uncertainty in a dynamic motion model and severe object appearance change. To avoid filter drift due to inaccuracies in a dynamic motion model, a sliding window approach is applied to particle filtering by considering a recent set of observations with which internal auxiliary estimates are sequentially calculated, so that the level of uncertainty in the motion model is significantly reduced. With a new auxiliary particle filter, abrupt movements can be effectively handled with a light computational load. Another challenge, severe object appearance change, is adaptively overcome via a modified principal component analysis. By utilizing a recent set of observations, the spatiotemporal piecewise linear subspace of an appearance manifold is incrementally approximated. In addition, distraction in the filtering results is alleviated by using a layered sampling strategy to efficiently determine the best fit particle in the high-dimensional state space. Compared to existing algorithms, the proposed algorithm produces successful results, especially when difficulties are combined.

Du Yong Kim, Ehwa Yang, Moongu Jeon, Vladimir Shin

Sustained Observability for Salient Motion Detection

Detecting the motion of foreground objects against a constantly changing, complex background has always been challenging. The motion of foreground objects, termed salient motion, is marked by its predictability compared to the more complex, unpredictable motion of backgrounds such as fluttering leaves, ripples in water, and smoke-filled environments. We introduce a novel approach to detect this salient motion based on the control theory concept of 'observability' from the outputs, when the video sequence is represented as a linear dynamical system. The resulting algorithm is tested on a set of challenging sequences and compared to state-of-the-art methods to showcase its superior computational efficiency and its capability to detect salient motion.

Viswanath Gopalakrishnan, Yiqun Hu, Deepu Rajan
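
The control theory notion the abstract builds on can be illustrated with the textbook rank test for a linear dynamical system x[t+1] = A x[t], y[t] = C x[t]. The sketch below is only that standard test, not the authors' saliency measure; the helper names and the toy constant-velocity system are assumptions chosen for the example.

```python
import numpy as np


def observability_matrix(a, c):
    """Stack C, CA, CA^2, ..., CA^(n-1) for an n-state system."""
    n = a.shape[0]
    blocks = [c]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ a)
    return np.vstack(blocks)


def is_observable(a, c):
    """The state can be reconstructed from the outputs iff the rank is n."""
    return np.linalg.matrix_rank(observability_matrix(a, c)) == a.shape[0]


# Toy constant-velocity model: state = (position, velocity).
a = np.array([[1.0, 1.0],
              [0.0, 1.0]])
c_position = np.array([[1.0, 0.0]])  # output is the position only
c_velocity = np.array([[0.0, 1.0]])  # output is the velocity only

position_ok = is_observable(a, c_position)
velocity_ok = is_observable(a, c_velocity)
```

Measuring position over time reveals the velocity, so the first system is observable. Measuring only velocity never reveals the absolute position, so the second is not.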

Markerless and Efficient 26-DOF Hand Pose Recovery

We present a novel method that, given a sequence of synchronized views of a human hand, recovers its 3D position, orientation and full articulation parameters. The adopted hand model is based on properly selected and assembled 3D geometric primitives. Hypothesized configurations/poses of the hand model are projected to different camera views, and image features such as edge maps and hand silhouettes are computed. An objective function is then used to quantify the discrepancy between the predicted and the actual, observed features. The recovery of the 3D hand pose amounts to estimating the parameters that minimize this objective function, a minimization performed using Particle Swarm Optimization. All the basic components of the method (feature extraction, objective function evaluation, optimization process) are inherently parallel. Thus, a GPU-based implementation achieves a speedup of two orders of magnitude over CPU processing. Extensive experimental results demonstrate qualitatively and quantitatively that accurate 3D pose recovery of a hand can be achieved robustly at a rate that greatly outperforms the current state of the art.

Iasonas Oikonomidis, Nikolaos Kyriazis, Antonis A. Argyros
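
The optimization step named in the abstract can be sketched with a generic, textbook Particle Swarm Optimization loop. This is not the authors' implementation: the function name, the inertia and acceleration coefficients, and the toy quadratic `discrepancy` standing in for the image-based objective are all assumptions. The per-particle objective evaluations are independent of one another, which is the property that makes such a loop amenable to parallel evaluation.

```python
import numpy as np


def particle_swarm_minimize(objective, lower, upper, n_particles=30,
                            n_iters=200, inertia=0.7, cognitive=1.5,
                            social=1.5, seed=0):
    """Minimize objective(x) over the box [lower, upper] with basic PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()  # best position seen by each particle
    best_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(best_val))
    global_pos, global_val = best_pos[g].copy(), best_val[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Pull each particle toward its own best and the swarm's best.
        vel = (inertia * vel
               + cognitive * r1 * (best_pos - pos)
               + social * r2 * (global_pos - pos))
        pos = np.clip(pos + vel, lower, upper)

        # Each evaluation is independent of the others.
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]

        g = int(np.argmin(best_val))
        if best_val[g] < global_val:
            global_pos, global_val = best_pos[g].copy(), best_val[g]

    return global_pos, float(global_val)


# Hypothetical stand-in for the feature discrepancy: squared distance
# between candidate parameters and a known 3-parameter target.
target = np.array([0.3, -1.2, 0.8])


def discrepancy(params):
    return float(np.sum((params - target) ** 2))


best_params, best_score = particle_swarm_minimize(
    discrepancy, lower=[-2.0] * 3, upper=[2.0] * 3)
```

A real pose estimator would replace `discrepancy` with a comparison of rendered and observed edge maps and silhouettes, and would search over all 26 degrees of freedom rather than three.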

Stick It! Articulated Tracking Using Spatial Rigid Object Priors

Articulated tracking of humans is a well-studied field, but most work has treated humans as independent of their environment. Recently, Kjellström et al. [1] showed how knowledge of interaction with a known rigid object provides constraints that lower the degrees of freedom in the model. While the problem as posed is interesting, the resulting algorithm is computationally too demanding to be of practical use. We present a simple and elegant model for describing this problem. The resulting algorithm is computationally much more efficient and at the same time produces superior results.

Søren Hauberg, Kim Steenstrup Pedersen

A Method for Text Localization and Recognition in Real-World Images

A general method for text localization and recognition in real-world images is presented. The proposed method is novel, as it (i) departs from a strict feed-forward pipeline and replaces it with a hypotheses-verification framework that simultaneously processes multiple text line hypotheses, (ii) uses synthetic fonts to train the algorithm, eliminating the need for time-consuming acquisition and labeling of real-world training data, and (iii) exploits Maximally Stable Extremal Regions (MSERs), which provide robustness to geometric and illumination conditions.

The performance of the method is evaluated on two standard datasets. On the Char74k dataset, a recognition rate of 72% is achieved, 18% higher than the state of the art. The paper is the first to report both text detection and recognition results on the standard and rather challenging ICDAR 2003 dataset. The text localization works for a number of alphabets, and the method is easily adapted to the recognition of other scripts, e.g. Cyrillic.

Lukas Neumann, Jiri Matas

