
2010 | Book

Computer Vision – ECCV 2010

11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part VI

Edited by: Kostas Daniilidis, Petros Maragos, Nikos Paragios

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

The 2010 edition of the European Conference on Computer Vision was held in Heraklion, Crete. The call for papers attracted an absolute record of 1,174 submissions. We describe here the selection of the accepted papers. Thirty-eight area chairs were selected, coming from Europe (18), USA and Canada (16), and Asia (4). Their selection was based on the following criteria: (1) researchers who had served at least twice as Area Chairs within the past two years at major vision conferences were excluded; (2) researchers who served as Area Chairs at the 2010 Conference on Computer Vision and Pattern Recognition were also excluded (exception: ECCV 2012 Program Chairs); (3) overlap introduced by Area Chairs being former students and advisors was minimized; (4) 20% of the Area Chairs had never served before at a major conference; (5) the Area Chair selection process made all possible efforts to achieve a reasonable geographic distribution across countries, thematic areas and trends in computer vision. Each Area Chair was assigned between 28 and 32 papers by the Program Chairs. Based on paper content, the Area Chair recommended up to seven potential reviewers per paper. Such assignment was made using all reviewers in the database, including the conflicting ones. The Program Chairs manually entered the missing conflict domains of approximately 300 reviewers. Based on the recommendations of the Area Chairs, three reviewers were selected per paper (with at least one being among the top three suggestions), with 99.

Table of Contents

Frontmatter

Visual Learning

Constrained Spectral Clustering via Exhaustive and Efficient Constraint Propagation

This paper presents an exhaustive and efficient constraint propagation approach to exploiting pairwise constraints for spectral clustering. Since traditional label propagation techniques cannot be readily generalized to propagate pairwise constraints, we tackle the constraint propagation problem inversely by decomposing it into a set of independent label propagation subproblems, which are further solved in quadratic time using semi-supervised learning based on k-nearest neighbors graphs. Since this time complexity is proportional to the number of all possible pairwise constraints, our approach gives a computationally efficient solution for exhaustively propagating pairwise constraints throughout the entire dataset. The resulting exhaustive set of propagated pairwise constraints is then used to adjust the weight (or similarity) matrix for spectral clustering. It is worth noting that this paper is the first to clearly show how pairwise constraints are propagated independently and then accumulated into a conciliatory closed-form solution. Experimental results on real-life datasets demonstrate that our approach to constrained spectral clustering outperforms the state-of-the-art techniques.

Zhiwu Lu, Horace H. S. Ip
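
To make the propagation and adjustment steps above concrete, here is a minimal sketch in the spirit of the abstract, assuming a symmetric closed-form propagation over a normalized k-NN affinity; the exact operators and normalizations of the paper may differ, and `propagate_constraints` and `adjust_affinity` are illustrative names, not the authors' code.

```python
# Sketch: propagate pairwise constraints via a label-propagation kernel,
# then adjust the affinity matrix used for spectral clustering.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def propagate_constraints(X, Z, k=10, alpha=0.8):
    """X: (n,d) data; Z: (n,n) with +1 must-link, -1 cannot-link, 0 unknown."""
    W = kneighbors_graph(X, k, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T).toarray()                       # symmetrized k-NN graph
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)             # normalized affinity
    P = np.linalg.inv(np.eye(len(X)) - alpha * S)       # propagation kernel
    F = (1 - alpha) ** 2 * P @ Z @ P.T                  # rows, then columns
    return F / (np.abs(F).max() + 1e-12), W

def adjust_affinity(W, F):
    # raise similarity where propagated constraints are positive,
    # lower it where they are negative
    return np.clip(np.where(F >= 0, 1 - (1 - F) * (1 - W), (1 + F) * W), 0, 1)

# usage: F, W = propagate_constraints(X, Z)
# labels = SpectralClustering(n_clusters=3,
#                             affinity='precomputed').fit_predict(adjust_affinity(W, F))
```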
Object Recognition with Hierarchical Stel Models

We propose a new generative model, and a new image similarity kernel based on a linked hierarchy of probabilistic segmentations. The model is used to efficiently segment multiple images into a consistent set of image regions. The segmentations are provided at several levels of granularity and links among them are automatically provided. Model training and inference are faster than most local feature extraction algorithms, and yet the provided image segmentation and the segment matching among images provide a rich backdrop for image recognition, segmentation and registration tasks.

Alessandro Perina, Nebojsa Jojic, Umberto Castellani, Marco Cristani, Vittorio Murino
MIForests: Multiple-Instance Learning with Randomized Trees

Multiple-instance learning (MIL) allows for training classifiers from ambiguously labeled data. In computer vision, this learning paradigm has recently been used in many applications such as object classification, detection and tracking. This paper presents a novel multiple-instance learning algorithm for randomized trees called MIForests. Randomized trees are fast, inherently parallel and multi-class, and are thus increasingly popular in computer vision. MIForests combine the advantages of these classifiers with the flexibility of multiple-instance learning. In order to leverage randomized trees for MIL, we define the hidden class labels inside target bags as random variables. These random variables are optimized by training random forests and using a fast iterative homotopy method for solving the non-convex optimization problem. Additionally, most previously proposed MIL approaches operate in batch or off-line mode and thus assume access to the entire training set. This limits their applicability in scenarios where the data arrives sequentially and in dynamic environments. We show that MIForests are not limited to off-line problems and present an on-line extension of our approach. In the experiments, we evaluate MIForests on standard visual MIL benchmark datasets, where we achieve state-of-the-art results while being faster than previous approaches and being able to inherently solve multi-class problems. The on-line version of MIForests is evaluated on visual object tracking, where we outperform the state-of-the-art method based on boosting.

Christian Leistner, Amir Saffari, Horst Bischof
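
A rough sketch of the alternating optimization the abstract describes, with sklearn's random forest standing in for the paper's randomized trees and a simple geometric cooling schedule standing in for its homotopy method; the bag-constraint handling and the schedule are simplifications, not the authors' algorithm.

```python
# Sketch: treat instance labels inside positive bags as hidden variables
# and optimize them by retraining a forest while a temperature anneals
# toward a hard assignment.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def miforest(bags, bag_labels, iters=10, T0=1.0, cooling=0.5):
    X = np.vstack(bags)
    sizes = [len(b) for b in bags]
    # initialize every instance with its bag label
    y = np.concatenate([np.full(s, l) for s, l in zip(sizes, bag_labels)])
    for it in range(iters):
        rf = RandomForestClassifier(n_estimators=50).fit(X, y)
        p = rf.predict_proba(X)[:, 1]                # P(instance positive)
        T = max(T0 * cooling ** it, 1e-3)            # annealing temperature
        soft = 1 / (1 + np.exp(-(p - 0.5) / T))      # softened posterior
        y_new, ofs = [], 0
        for s, l in zip(sizes, bag_labels):
            yi = (soft[ofs:ofs + s] > 0.5).astype(int) if l == 1 else np.zeros(s, int)
            if l == 1 and yi.sum() == 0:             # bag constraint: >=1 positive
                yi[np.argmax(p[ofs:ofs + s])] = 1
            y_new.append(yi)
            ofs += s
        y = np.concatenate(y_new)
    return rf
```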
Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations

Manifolds are widely used to model non-linearity arising in a range of computer vision applications. This paper treats statistics on manifolds and the loss of accuracy occurring when linearizing the manifold prior to performing statistical operations. Using recent advances in manifold computations, we present a comparison between the non-linear analog of Principal Component Analysis, Principal Geodesic Analysis, in its linearized form and its exact counterpart that uses true intrinsic distances. We give examples of datasets for which the linearized version provides good approximations and for which it does not. Indicators for the differences between the two versions are then developed and applied to two examples of manifold valued data: outlines of vertebrae from a study of vertebral fractures and spatial coordinates of human skeleton end-effectors acquired using a stereo camera and tracking software.

Stefan Sommer, François Lauze, Søren Hauberg, Mads Nielsen
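
The linearization under study can be illustrated on the unit sphere: exact PGA would optimize over true geodesic distances, while the common approximation below performs PCA on log-mapped data in the tangent space at the intrinsic mean. This is a generic sketch of the linearized variant, not the authors' implementation.

```python
# Sketch: linearized PGA on the unit sphere S^{d-1} via tangent-space PCA.
import numpy as np

def log_map(mu, x):                 # tangent vector at mu pointing toward x
    d = np.arccos(np.clip(x @ mu, -1.0, 1.0))
    v = x - (x @ mu) * mu
    n = np.linalg.norm(v)
    return np.zeros_like(mu) if n < 1e-12 else d * v / n

def exp_map(mu, v):                 # geodesic from mu with initial velocity v
    n = np.linalg.norm(v)
    return mu if n < 1e-12 else np.cos(n) * mu + np.sin(n) * v / n

def intrinsic_mean(X, iters=20):    # Karcher mean by gradient iteration
    mu = X.mean(0)
    mu /= np.linalg.norm(mu)
    for _ in range(iters):
        mu = exp_map(mu, np.mean([log_map(mu, x) for x in X], axis=0))
    return mu

def linearized_pga(X):
    mu = intrinsic_mean(X)
    V = np.array([log_map(mu, x) for x in X])    # data mapped to tangent space
    _, s, Wt = np.linalg.svd(V - V.mean(0), full_matrices=False)
    return mu, Wt, s ** 2 / len(X)               # principal directions, variances
```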
Stacked Hierarchical Labeling

In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence, leveraging the stacking approach developed by Cohen et al.

Daniel Munoz, J. Andrew Bagnell, Martial Hebert

Spotlights and Posters R2

Fully Isotropic Fast Marching Methods on Cartesian Grids

The existing fast marching methods used to solve the Eikonal equation use a locally continuous model to estimate the accumulated cost, but a discontinuous (discretized) model for the traveling cost around each grid point. Because the accumulated cost and the traveling (local) cost are treated differently, the estimate of the accumulated cost at any point will vary based on the direction of the arriving front. Instead, we propose to estimate the traveling cost at each grid point based on a locally continuous model, where we interpolate the traveling cost along the direction of the propagating front. We further choose an interpolation scheme that is not biased by the direction of the front, thus making the fast marching process truly isotropic. We show the significance of removing the directional bias in the computation of the cost in certain applications of the fast marching method. We also compare the accuracy and computation times of our proposed methods with the existing state-of-the-art fast marching techniques to demonstrate the superiority of our method.

Vikram Appia, Anthony Yezzi
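
For reference, the standard first-order fast-marching update that such methods build on, at a grid point with upwind neighbor values a = min(U_W, U_E) and b = min(U_N, U_S), grid spacing h and local traveling cost τ, is obtained by solving (U − a)² + (U − b)² = τ²h²; the proposal above would, as we read it, replace the constant τ with a value interpolated along the front's arrival direction:

```latex
U = \begin{cases}
  \min(a,b) + \tau h, & |a-b| \ge \tau h,\\[4pt]
  \dfrac{a + b + \sqrt{2\tau^2 h^2 - (a-b)^2}}{2}, & \text{otherwise.}
\end{cases}
```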
Clustering Complex Data with Group-Dependent Feature Selection

We describe a clustering approach with an emphasis on detecting coherent structures in a complex dataset, and illustrate its effectiveness with computer vision applications. By complex data, we mean that the attribute variations among the data are so extensive that clustering based on a single feature representation/descriptor is insufficient to faithfully divide the data into meaningful groups. The proposed method thus assumes the data are represented with various feature representations, and aims to uncover the underlying cluster structure. To that end, we associate each cluster with a boosting classifier derived from multiple kernel learning, and apply the cluster-specific classifier to feature selection across various descriptors to best separate data of the cluster from the rest. Specifically, we integrate the multiple, correlative training tasks of the cluster-specific classifiers into the clustering procedure, and cast them as a joint constrained optimization problem. Through the optimization iterations, the cluster structure is gradually revealed by these classifiers, while their discriminant power to capture similar data is progressively improved owing to better data labeling.

Yen-Yu Lin, Tyng-Luh Liu, Chiou-Shann Fuh
On Parameter Learning in CRF-Based Approaches to Object Class Image Segmentation

Recent progress in per-pixel object class labeling of natural images can be attributed to the use of multiple types of image features and sound statistical learning approaches. Within the latter, Conditional Random Fields (CRF) are prominently used for their ability to represent interactions between random variables. Despite their popularity in computer vision, parameter learning for CRFs has remained difficult, popular approaches being cross-validation and piecewise training.

In this work, we propose a simple yet expressive tree-structured CRF based on a recent hierarchical image segmentation method. Our model combines and weights multiple image features within a hierarchical representation and allows simple and efficient globally-optimal learning of ≈ 10^5 parameters. The tractability of our model allows us to pose and answer some of the open questions regarding parameter learning for CRF-based approaches. The key findings for learning CRF models are, from the obvious to the surprising: i) multiple image features always help; ii) the limiting dimension with respect to current models is the amount of training data; iii) piecewise training is competitive; iv) current methods for max-margin training fail for models with many parameters.

Sebastian Nowozin, Peter V. Gehler, Christoph H. Lampert
Exploring the Identity Manifold: Constrained Operations in Face Space

In this paper, we constrain faces to points on a manifold within the parameter space of a linear statistical model. The manifold is the subspace of faces which have maximally likely distinctiveness and different points correspond to unique identities. We show how the tools of differential geometry can be used to replace linear operations such as warping and averaging with operations on the surface of this manifold. We use the manifold to develop a new method for fitting a statistical face shape model to data, which is both robust (avoids overfitting) and overcomes model dominance (is not susceptible to local minima close to the mean face). Our method outperforms a generic non-linear optimiser when fitting a dense 3D morphable face model to data.

Ankur Patel, William A. P. Smith
Multi-label Linear Discriminant Analysis

Multi-label problems arise frequently in image and video annotation, and in many other related applications such as multi-topic text categorization, music classification, etc. Like other computer vision tasks, multi-label image and video annotation also suffers from the difficulty of high dimensionality, because images often have a large number of features. Linear discriminant analysis (LDA) is a well-known method for dimensionality reduction. However, classical LDA only works for single-label multi-class classification and cannot be directly applied to multi-label multi-class classification. It is desirable to naturally generalize classical LDA to multi-label formulations. At the same time, multi-label data present a new opportunity to improve classification accuracy through label correlations, which are absent in single-label data. In this work, we propose a novel Multi-label Linear Discriminant Analysis (MLDA) method to take advantage of label correlations and exploit the powerful classification capability of classical LDA to deal with multi-label multi-class problems. Extensive experimental evaluations on five public multi-label data sets demonstrate excellent performance of our method.

Hua Wang, Chris Ding, Heng Huang
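
One plausible way to realize multi-label scatter matrices of this kind is sketched below: each sample contributes to class k with a label weight, optionally smoothed by label correlations, generalizing the single-label between/within-class scatter. The correlation smoothing `Y @ C` is an illustrative choice under our reading of the abstract, not necessarily the paper's exact construction.

```python
# Sketch: multi-label LDA with correlation-weighted class memberships.
import numpy as np
from scipy.linalg import eigh

def mlda(X, Y, n_components):
    """X: (n,d) features; Y: (n,K) binary label matrix."""
    norms = np.linalg.norm(Y, axis=0, keepdims=True)        # (1,K)
    C = (Y.T @ Y) / (norms.T @ norms + 1e-12)               # label correlations
    Z = Y @ C                                               # smoothed memberships
    mu = X.mean(0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    Sw = np.zeros_like(Sb)
    for k in range(Y.shape[1]):
        w = Z[:, k]
        nk = w.sum()
        mk = (w[:, None] * X).sum(0) / (nk + 1e-12)         # weighted class mean
        Sb += nk * np.outer(mk - mu, mk - mu)               # between-class scatter
        D = X - mk
        Sw += (w[:, None] * D).T @ D                        # within-class scatter
    # generalized eigenproblem Sb v = lambda Sw v (ridge for stability)
    evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(len(Sw)))
    return evecs[:, np.argsort(evals)[::-1][:n_components]]
```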
Convolutional Learning of Spatio-temporal Features

We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent “flow fields” which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

Graham W. Taylor, Rob Fergus, Yann LeCun, Christoph Bregler
Learning Pre-attentive Driving Behaviour from Holistic Visual Features

The aim of this paper is to learn driving behaviour by associating the actions recorded from a human driver with pre-attentive visual input, implemented using holistic image features (GIST). All images are labelled according to a number of driving-relevant contextual classes (e.g., road type, junction) and the driver's actions (e.g., braking, accelerating, steering) are recorded. The association between visual context and the driving data is learnt by boosting decision stumps, which serve as input dimension selectors. Moreover, we propose a novel formulation of GIST features that leads to improved performance for action prediction. The areas of the visual scenes that contribute to activation or inhibition of the predictors are shown by drawing activation maps for all learnt actions. We show good performance not only for detecting driving-relevant contextual labels, but also for predicting the driver's actions. The classifier's false positives and the associated activation maps can be used to focus attention and further learning on the uncommon and difficult situations.

Nicolas Pugeault, Richard Bowden
Detecting People Using Mutually Consistent Poselet Activations

Bourdev and Malik (ICCV 09) introduced a new notion of parts, poselets, constructed to be tightly clustered both in the configuration space of keypoints and in the appearance space of image patches. In this paper we develop a new algorithm for detecting people using poselets. Unlike that work, which used 3D annotations of keypoints, we use only 2D annotations, which are much easier for naive human annotators. The main algorithmic contribution is in how we use the pattern of poselet activations. Individual poselet activations are noisy, but considering the spatial context of each can provide vital disambiguating information, just as object detection can be improved by considering the detection scores of nearby objects in the scene. This can be done by training a two-layer feed-forward network with weights set using a max-margin technique. The refined poselet activations are then clustered into mutually consistent hypotheses, where consistency is based on empirically determined spatial keypoint distributions. Finally, bounding boxes are predicted for each person hypothesis and shape masks are aligned to edges in the image to provide a segmentation. To the best of our knowledge, the resulting system is the current best performer on the task of people detection and segmentation, with an average precision of 47.8% and 40.5% respectively on PASCAL VOC 2009.

Lubomir Bourdev, Subhransu Maji, Thomas Brox, Jitendra Malik
Disparity Statistics for Pedestrian Detection: Combining Appearance, Motion and Stereo

Pedestrian detection is an important problem in computer vision due to its importance for applications such as visual surveillance, robotics, and automotive safety. This paper pushes the state-of-the-art of pedestrian detection in two ways. First, we propose a simple yet highly effective novel feature based on binocular disparity, outperforming previously proposed stereo features. Second, we show that the combination of different classifiers often improves performance even when classifiers are based on the same feature or feature combination. These two extensions result in significantly improved performance over the state-of-the-art on two challenging datasets.

Stefan Walk, Konrad Schindler, Bernt Schiele
Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos

Many works address the problem of object detection by means of machine learning with boosted classifiers. They exploit sliding window search, spanning the whole image: the patches, at all possible positions and sizes, are sent to the classifier. Several methods have been proposed to speed up the search (adding complementary features or using specialized hardware). In this paper we propose a statistics-based search approach for object detection which uses a Monte Carlo sampling approach for estimating the likelihood density function with Gaussian kernels. The estimation relies on a multi-stage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifier (i.e., its response). For videos, this approach is plugged into a Bayesian-recursive framework which exploits the temporal coherency of the pedestrians. Several tests on both still images and videos on common datasets are provided in order to demonstrate the relevant speedup and the increased localization accuracy with respect to the sliding window strategy, using a pedestrian classifier based on covariance descriptors and a cascade of LogitBoost classifiers.

Giovanni Gualdi, Andrea Prati, Rita Cucchiara
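
A minimal sketch of such a multi-stage sampling loop, assuming a placeholder `score_window(x, y)` classifier response and a Gaussian perturbation kernel whose bandwidth shrinks per stage; window scale and the Bayesian-recursive video extension are omitted.

```python
# Sketch: refine a sampling proposal over window centers using classifier
# feedback, concentrating later stages on high-response regions.
import numpy as np

def multistage_search(score_window, img_shape, n_stages=3, n_samples=200,
                      sigma0=20.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = img_shape
    pts = np.column_stack([rng.uniform(0, w, n_samples),
                           rng.uniform(0, h, n_samples)])  # (x, y) centers
    for stage in range(n_stages):
        scores = np.array([score_window(x, y) for x, y in pts])
        wts = np.maximum(scores, 0.0)
        p = wts / wts.sum() if wts.sum() > 0 else np.full(len(pts), 1 / len(pts))
        # resample around high-response locations, then perturb with a
        # Gaussian kernel whose bandwidth shrinks stage by stage
        pts = pts[rng.choice(len(pts), size=n_samples, p=p)]
        pts += rng.normal(0.0, sigma0 / (stage + 1), pts.shape)
        pts[:, 0] = pts[:, 0].clip(0, w - 1)
        pts[:, 1] = pts[:, 1].clip(0, h - 1)
    return pts[np.argmax([score_window(x, y) for x, y in pts])]
```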
Learning to Detect Roads in High-Resolution Aerial Images

Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-road objects. Despite 30 years of work on automatic road detection, no automatic or semi-automatic road detection system is currently on the market and no published method has been shown to work reliably on large datasets of urban imagery. We propose detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task. The network is trained on massive amounts of data using a consumer GPU. We demonstrate that predictive performance can be substantially improved by initializing the feature detectors using recently developed unsupervised learning methods as well as by taking advantage of the local spatial coherence of the output labels. We show that our method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches.

Volodymyr Mnih, Geoffrey E. Hinton
Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry

In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure, can be used to produce a detector for that object from a single image. Using perspective cues from the global scene geometry, we first develop a 3D based object detector. This detector is competitive with an image based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by spatial layout – the locations of walls and floor in the image – to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and use constraints such as that objects are supported by the floor and cannot stick through walls. The resulting detector (a) has significantly improved accuracy when compared to state-of-the-art 2D detectors and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.

Varsha Hedau, Derek Hoiem, David Forsyth
A Structural Filter Approach to Human Detection

Occlusions and articulated poses make human detection much more difficult than the detection of more rigid objects such as faces or cars. In this paper, a Structural Filter (SF) approach to human detection is presented in order to deal with occlusions and articulated poses. A three-level hierarchical object structure consisting of words, sentences and paragraphs, in analogy to text grammar, is proposed, and correspondingly each level is associated with a kind of SF: a Word Structural Filter (WSF), a Sentence Structural Filter (SSF) and a Paragraph Structural Filter (PSF). An SF is a set of detectors which is able to infer what structures a test window possesses; specifically, WSF is composed of all detectors for words, SSF of all detectors for sentences, and likewise PSF for paragraphs. WSF works on the most basic units of an object. SSF deals with meaningful substructures of an object. Visible parts of humans in crowded scenes can be head-shoulder, left part, right part, upper body or whole body, and articulated humans vary greatly in pose, especially when doing sports. Visible parts and different poses are the appearance statuses of detected humans handled by PSF. The three levels of SFs, WSF, SSF and PSF, are integrated in an embedded structure to form a powerful classifier, named the Integrated Structural Filter (ISF). Detection experiments on pedestrians in highly crowded scenes and on articulated humans show the effectiveness and efficiency of our approach.

Genquan Duan, Haizhou Ai, Shihong Lao
Geometric Constraints for Human Detection in Aerial Imagery

In this paper, we propose a method for detecting humans in imagery taken from a UAV. This is a challenging problem due to the small number of pixels on target, which makes it more difficult to distinguish people from background clutter and results in a much larger search space. We propose a method for human detection based on a number of geometric constraints obtained from the metadata. Specifically, we obtain the orientation of the ground-plane normal, the orientation of shadows cast by humans in the scene, and the relationship between human heights and the size of their corresponding shadows. In cases where metadata is not available, we propose a method for automatically estimating shadow orientation from image data. We utilize the above information in a geometry-based shadow and human blob detector, which provides an initial estimate of the locations of humans in the scene. These candidate locations are then classified as either human or clutter using a combination of wavelet features and a Support Vector Machine. Our method works on a single frame and, unlike motion-detection-based methods, bypasses the global motion compensation process and allows for detection of stationary and slow-moving humans, while avoiding the search across the entire image, which makes it more accurate and very fast. We show impressive results on sequences from the VIVID dataset and our own data, and provide comparative analysis.

Vladimir Reilly, Berkan Solmaz, Mubarak Shah
Handling Urban Location Recognition as a 2D Homothetic Problem

We address the problem of large-scale place-of-interest recognition in cell phone images of urban scenarios. Here, we go beyond earlier approaches by exploiting the nowadays often available 3D building information (e.g., from extruded floor plans) and massive street-view-like image data for database creation. Exploiting vanishing points in query images, and thus fully removing 3D rotation from the recognition problem, then allows us to simplify the feature invariance to a pure homothetic problem, which we show leaves more discriminative power in feature descriptors than classical SIFT. We rerank visual-word-based document queries using a fast stratified homothetic verification that is tailored for repetitive patterns like window grids on facades and in most cases boosts the correct document to the top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real-world coordinates, ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches in city-scale experiments on different sources of street-view-like image data and a challenging set of cell phone images.

Georges Baatz, Kevin Köser, David Chen, Radek Grzeszczuk, Marc Pollefeys
Recursive Coarse-to-Fine Localization for Fast Object Detection

Cascading techniques are commonly used to speed up the scan of an image for object detection. However, cascades of detectors are slow to train due to the high number of detectors and corresponding thresholds to learn. Furthermore, they do not use any prior knowledge about the scene structure to decide where to focus the search. To handle these problems, we propose a new way to scan an image, where we couple a recursive coarse-to-fine refinement with spatial constraints on the object location. To do so, we split an image into a set of uniformly distributed neighborhood regions, and for each of these we apply a local greedy search over feature resolutions. The neighborhood is defined as a scanning region that only one object can occupy. Therefore the best hypothesis is obtained as the location with maximum score, and no thresholds are needed. We present an implementation of our method using a pyramid of HOG features and evaluate it on two standard databases, VOC2007 and the INRIA dataset. Results show that the Recursive Coarse-to-Fine Localization (RCFL) achieves a 12x speed-up compared to standard sliding windows. Compared with a multi-resolution cascade approach, our method has slightly better performance in speed and average precision. Furthermore, in contrast to cascading approaches, the speed-up is independent of image conditions, the number of detected objects and clutter.

Marco Pedersoli, Jordi Gonzàlez, Andrew D. Bagdanov, Juan J. Villanueva
A Local Bag-of-Features Model for Large-Scale Object Retrieval

The so-called bag-of-features (BoF) representation for images is by now well-established in the context of large-scale image and video retrieval. The BoF framework typically ranks database images according to a metric on the global histograms of the query and database images, respectively. Ranking based on global histograms has the advantage of being scalable with respect to the number of database images, but at the cost of reduced retrieval precision when the object of interest is small. Additionally, computationally intensive post-processing (such as RANSAC) is typically required to locate the object of interest in the retrieved images. To address these shortcomings, we propose a generalization of the global BoF framework to support scalable local matching. Specifically, we propose an efficient and accurate algorithm to accomplish local histogram matching and object localization simultaneously. The generalization is to represent each database image as a family of histograms that depend functionally on a bounding rectangle. Integral with the image retrieval process, we identify bounding rectangles whose histograms optimize query relevance, and rank the images accordingly. Through this localization scheme, we impose a weak spatial consistency constraint with low computational overhead. We validate our approach on two public image retrieval benchmarks: the University of Kentucky dataset and the Oxford Buildings dataset. Experiments show that our approach significantly improves on BoF-based retrieval, without requiring computationally expensive post-processing.

Zhe Lin, Jonathan Brandt
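
The data structure that makes rectangle-dependent histograms cheap is an integral histogram, sketched below: once built, the BoF histogram of any bounding rectangle costs O(K) to read out. This is a generic illustration, not the paper's rectangle-optimization algorithm, and memory grows as H·W·K, so it suits small word maps.

```python
# Sketch: integral histogram of visual-word counts; the histogram of any
# rectangle is then a four-corner lookup.
import numpy as np

def integral_histogram(word_map, n_words):
    """word_map: (H,W) int array of visual-word ids per feature location."""
    H, W = word_map.shape
    ih = np.zeros((H + 1, W + 1, n_words))
    for y in range(H):
        for x in range(W):
            # standard integral-image recurrence, per word channel
            ih[y + 1, x + 1] = ih[y, x + 1] + ih[y + 1, x] - ih[y, x]
            ih[y + 1, x + 1, word_map[y, x]] += 1
    return ih

def rect_histogram(ih, y0, x0, y1, x1):
    # word counts inside [y0:y1, x0:x1), O(n_words) per query
    return ih[y1, x1] - ih[y0, x1] - ih[y1, x0] + ih[y0, x0]
```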
Velocity-Dependent Shutter Sequences for Motion Deblurring

We address the problem of high-quality image capture of fast-moving objects in moderate light environments. In such cases, the use of a traditional shutter is known to yield non-invertible motion blur due to the loss of certain spatial frequencies. We extend the flutter shutter method of Raskar et al. to fast-moving objects by first demonstrating that no coded exposure sequence yields an invertible point spread function for all velocities. Based on this, we argue that the shutter sequence must be dependent on object velocity, and propose a method for computing such velocity-dependent sequences. We demonstrate improved image quality from velocity-dependent sequences on fast-moving objects, as compared to sequences found using the existing sampling method.

Scott McCloskey
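
The velocity dependence can be checked numerically: the effective PSF of a coded exposure is, to first approximation, the binary shutter code smeared by the object motion, and invertibility requires its spectrum to stay away from zero. The sketch below, with a hypothetical 8-chip code, illustrates how the minimum DFT magnitude of the velocity-resampled code varies with speed; it is a toy model, not the paper's sequence-design method.

```python
# Sketch: the PSF of a fluttered shutter depends on object velocity, and
# a code invertible at one speed can lose frequencies at another.
import numpy as np

def psf_for_velocity(code, pixels_per_chip):
    # each open/closed chip of the code smears over `pixels_per_chip` pixels
    reps = max(int(round(pixels_per_chip)), 1)
    psf = np.repeat(np.asarray(code, float), reps)
    return psf / psf.sum()

def min_spectrum(code, pixels_per_chip, n=512):
    # minimum DFT magnitude: near-zero means non-invertible blur
    return np.abs(np.fft.rfft(psf_for_velocity(code, pixels_per_chip), n)).min()

# a code well-conditioned at one velocity may approach zero at another:
# for v in (1, 2, 3): print(v, min_spectrum([1, 0, 1, 1, 0, 1, 0, 1], v))
```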
Colorization for Single Image Super Resolution

This paper introduces a new procedure to handle color in single image super resolution (SR). Most existing SR techniques focus primarily on enforcing image priors or synthesizing image details; less attention is paid to the final color assignment. As a result, many existing SR techniques exhibit some form of color aberration in the final upsampled image. In this paper, we outline a procedure based on image colorization and back-projection to perform color assignment guided by the super-resolution luminance channel. We have found that our procedure produces better results both quantitatively and qualitatively than existing approaches. In addition, our approach is generic and can be incorporated into any existing SR techniques.

Shuaicheng Liu, Michael S. Brown, Seon Joo Kim, Yu-Wing Tai
Programmable Aperture Camera Using LCoS

Since the 1960s, aperture patterns have been studied extensively and a variety of coded apertures have been proposed for various applications, including extended depth of field, defocus deblurring, depth from defocus, light field acquisition, etc. Research has shown that optimal aperture patterns can be quite different depending on the application, imaging conditions, or scene contents. In addition, many coded aperture techniques require aperture patterns to be temporally changed during capture. As a result, it is often necessary to have a programmable aperture camera whose aperture pattern can be dynamically changed as needed in order to capture more useful information.

In this paper, we propose a programmable aperture camera using a Liquid Crystal on Silicon (LCoS) device. This design affords a high-brightness-contrast and high-resolution aperture with relatively low light loss, and enables one to change the pattern at a reasonably high frame rate. We build a prototype camera and evaluate its features and drawbacks comprehensively through experiments. We also demonstrate two coded aperture applications in light field acquisition and defocus deblurring.

Hajime Nagahara, Changyin Zhou, Takuya Watanabe, Hiroshi Ishiguro, Shree K. Nayar
A New Algorithmic Approach for Contrast Enhancement

A novel algorithmic approach for optimal contrast enhancement is proposed. A measure of expected contrast and a sister measure of tone subtlety are defined for gray-level transform functions. These definitions allow us to depart from the current practice of histogram equalization and formulate contrast enhancement as a problem of maximizing the expected contrast measure subject to a limit on tone distortion and possibly other constraints that suppress artifacts. The resulting contrast-tone optimization problem can be solved efficiently by linear programming. The proposed constrained optimization framework for contrast enhancement is general, and the user can add and fine-tune the constraints to achieve desired visual effects. Experimental results demonstrate clearly superior performance of the new technique over histogram equalization.

Xiaolin Wu, Yong Zhao
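
A toy instance of the linear-programming view, assuming a simplified objective (the histogram-weighted sum of the per-level increments of a monotone gray-level transform) and a per-step cap standing in for the paper's tone-distortion constraint; it shows the LP structure only, not the paper's exact measures.

```python
# Sketch: contrast enhancement as an LP over the increments s_j of a
# monotone gray-level transform.
import numpy as np
from scipy.optimize import linprog

def contrast_tone_lp(hist, levels=256, max_step=4.0):
    p = hist / hist.sum()                      # gray-level probabilities
    c = -p[:-1]                                # maximize sum_j p_j * s_j
    A_ub = np.ones((1, levels - 1))            # total stretch fits output range
    b_ub = [levels - 1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, max_step)] * (levels - 1),  # tone-distortion cap
                  method="highs")
    transform = np.concatenate([[0.0], np.cumsum(res.x)])   # monotone by design
    return np.round(transform).astype(int)     # lookup table: old -> new level

# usage: lut = contrast_tone_lp(np.bincount(img.ravel(), minlength=256))
#        enhanced = lut[img]
```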
Seeing through Obscure Glass

Obscure glass is textured glass designed to separate spaces and “obscure” visibility between the spaces. Such glass is used to provide privacy while still allowing light to flow into a space, and is often found in homes and offices. We propose and explore the challenge of “seeing through” obscure glass, using both optical and digital techniques. In some cases – such as when the textured surface is on the side of the observer – we find that simple household substances and cameras with small apertures enable a surprising level of visibility through the obscure glass. In other cases, where optical techniques are not usable, we find that we can model the action of obscure glass as convolution of spatially varying kernels and reconstruct an image of the scene on the opposite side of the obscure glass with surprising detail.

Qi Shan, Brian Curless, Tadayoshi Kohno
A Continuous Max-Flow Approach to Potts Model

We address the continuous problem of assigning multiple (unordered) labels with minimum perimeter. The corresponding discrete Potts model is typically addressed with α-expansion, which can generate metrication artifacts. Existing convex continuous formulations of the Potts model use TV-based functionals directly encoding perimeter costs. Such formulations are analogous to ’min-cut’ problems on graphs. We propose a novel convex formulation with a continuous ’max-flow’ functional. This approach is dual to the standard TV-based formulations of the Potts model. Our continuous max-flow approach has significant numerical advantages: it avoids the extra computational load of enforcing the simplex constraints and naturally allows parallel computations over different labels. Numerical experiments show competitive performance in terms of quality and a significantly reduced number of iterations compared to the previous state-of-the-art convex methods for the continuous Potts model.

Jing Yuan, Egil Bae, Xue-Cheng Tai, Yuri Boykov
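
As we understand the formulation, the continuous max-flow functional maximizes the total source flow subject to label-wise sink capacities given by the data costs ρ(ℓ_i, x), spatial flow capacities given by the perimeter weight α, and exact flow conservation; the notation below is our paraphrase of the construction, not a verbatim reproduction:

```latex
\max_{p_s,\;p_i,\;q_i} \int_\Omega p_s(x)\,dx
\quad\text{s.t.}\quad
p_i(x) \le \rho(\ell_i, x),\quad
|q_i(x)| \le \alpha,\quad
\operatorname{div} q_i(x) - p_s(x) + p_i(x) = 0,\quad i = 1,\dots,n.
```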
Hybrid Compressive Sampling via a New Total Variation TVL1

Compressive sampling (CS) is aimed at acquiring a signal or image from data deemed insufficient by the Nyquist/Shannon sampling theorem. Its main idea is to recover a signal from limited measurements by exploiting the prior knowledge that the signal is sparse or compressible in some domain. In this paper, we propose a CS approach using a new total-variation measure TVL1, or equivalently TV$_{\ell_1}$, which enforces sparsity and directional continuity in the gradient domain. Our TV$_{\ell_1}$-based CS is characterized by the following attributes. First, by minimizing the $\ell_1$-norm of partial gradients, it can achieve greater accuracy than the widely-used TV$_{\ell_1\ell_2}$-based CS. Second, named hybrid CS, it combines low-resolution sampling (LRS) and random sampling (RS), motivated by our finding that these two sampling methods are complementary. Finally, our theoretical and experimental results demonstrate that our hybrid CS using TV$_{\ell_1}$ yields sharper and more accurate images.

Xianbiao Shu, Narendra Ahuja
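
The reconstruction model implied by the abstract can be written as follows, with $A$ the hybrid (low-resolution plus random) sampling operator and $b$ the measurements; TV$_{\ell_1}$ penalizes the $\ell_1$-norms of the partial gradients separately, whereas the TV$_{\ell_1\ell_2}$ baseline sums $\sqrt{(D_x u)^2 + (D_y u)^2}$ pixelwise. The notation here is generic, not the paper's:

```latex
\min_{u}\ \|D_x u\|_1 + \|D_y u\|_1
\qquad\text{s.t.}\qquad A u = b .
```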
Perspective Imaging under Structured Light

Traditionally, “Structured Light” has been used to recover surface topology and estimate depth maps. A more recent development is the use of “Structured Light” in surpassing the fundamental limit on spatial resolution imposed by diffraction. But its use in surpassing the diffraction limit remains confined to microscopy, due to issues that arise in macroscopic imaging: perspective foreshortening, aliasing and the need for calibration. Also, no formal attempt has been made to unify the above embodiments, despite their common reliance on “Structured Light”.

An original contribution of this work is the use of “Structured Light” in surpassing the diffraction limit of macroscopic imaging systems. Other contributions include unifying the “Structured Light” embodiments in a single framework, and realizing OSR and depth estimation in a single uncalibrated setup when the image planes of the imaging and illumination systems are parallel. Potential applications include bar code scanning and surveillance.

Prasanna Rangarajan, Vikrant Bhakta, Marc Christensen, Panos Papamichalis
Lighting and Pose Robust Face Sketch Synthesis

Automatic face sketch synthesis has important applications in law enforcement and digital entertainment. Although great progress has been made in recent years, previous methods only work under well controlled conditions and often fail when there are variations of lighting and pose. In this paper, we propose a robust algorithm for synthesizing a face sketch from a face photo taken under a different lighting condition and in a different pose than the training set. It synthesizes local sketch patches using a multiscale Markov Random Field (MRF) model. The robustness to lighting and pose variations is achieved in three steps. Firstly, shape priors specific to facial components are introduced to reduce artifacts and distortions caused by variations of lighting and pose. Secondly, new patch descriptors and metrics which are more robust to lighting variations are used to find candidates of sketch patches given a photo patch. Lastly, a smoothing term measuring both intensity compatibility and gradient compatibility is used to match neighboring sketch patches on the MRF network more effectively. The proposed approach significantly improves the performance of the state-of-the-art method. Its effectiveness is shown through experiments on the CUHK face sketch database and celebrity photos collected from the web.

Wei Zhang, Xiaogang Wang, Xiaoou Tang
Predicting Facial Beauty without Landmarks

A fundamental task in artificial intelligence and computer vision is to build machines that can behave like a human in recognizing a broad range of visual concepts. This paper aims to investigate and develop intelligent systems for learning the concept of female facial beauty and producing human-like predictors. Artists and social scientists have long been fascinated by the notion of facial beauty, but study by computer scientists has only begun in the last few years. Our work is notably different from and goes beyond previous works in several aspects: 1) we focus on fully-automatic learning approaches that do not require costly manual annotation of landmark facial features but simply take the raw pixels as inputs; 2) our study is based on a collection of data that is an order of magnitude larger than that of any previous study; 3) we imposed no restrictions in terms of pose, lighting, background, expression, age, and ethnicity on the face images used for training and testing. These factors significantly increase the difficulty of the learning task. We show that a biologically-inspired model with multiple layers of trainable feature extractors can produce results that are much more human-like than the previously used eigenface approach. Finally, we develop a novel visualization method to interpret the learned model, revealing the existence of several beautiful features that go beyond the current averageness and symmetry hypotheses.

Douglas Gray, Kai Yu, Wei Xu, Yihong Gong
Gabor Feature Based Sparse Representation for Face Recognition with Gabor Occlusion Dictionary

By coding the input testing image as a sparse linear combination of the training samples via $\ell_1$-norm minimization, sparse representation based classification (SRC) has recently been successfully used for face recognition (FR). In particular, by introducing an identity occlusion dictionary to sparsely code the occluded portions in face images, SRC can lead to robust FR results against occlusion. However, the large number of atoms in the occlusion dictionary makes the sparse coding computationally very expensive. In this paper, image Gabor features are used for SRC. The use of Gabor kernels makes the occlusion dictionary compressible, and a Gabor occlusion dictionary computing algorithm is presented. The number of atoms is significantly reduced in the computed Gabor occlusion dictionary, which greatly reduces the computational cost of coding occluded face images while greatly improving SRC accuracy. Experiments on representative face databases with variations of lighting, expression, pose and occlusion demonstrate the effectiveness of the proposed Gabor-feature based SRC (GSRC) scheme.

Meng Yang, Lei Zhang
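
The SRC step that GSRC accelerates can be sketched as follows: code a test sample over a column-normalized training dictionary by $\ell_1$-minimization (here a plain ISTA solver on the Lagrangian form, an illustrative substitute for whatever solver the paper uses) and assign the class whose atoms best reconstruct it; the Gabor feature extraction and the compressed occlusion dictionary are omitted.

```python
# Sketch: sparse-representation classification via ISTA.
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    # solves min_x 0.5*||Ax - y||^2 + lam*||x||_1 by proximal gradient
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L           # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)  # soft-threshold
    return x

def src_classify(A, labels, y):
    """A: (d,m) column-normalized training dictionary; labels: (m,) class ids."""
    x = ista_l1(A, y)
    residuals = {c: np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)      # class with smallest residual
```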
Motion Profiles for Deception Detection Using Visual Cues

We propose a data-driven, unobtrusive and covert method for automatic deception detection in interrogation interviews from visual cues only. Using skin blob analysis together with Active Shape Modeling, we continuously track and analyze the motion of the hands and head as a subject is responding to interview questions, as well as their facial micro expressions, thus extracting motion profiles, which we aggregate over each interview response. Our novelty lies in the representation of the motion profile distribution for each response. In particular, we use a kernel density estimator with uniform bins in log feature space. This scheme allows the representation of relatively over-controlled and relatively agitated behaviors of interviewed subjects, thus aiding in the discrimination of truthful and deceptive responses.

Nicholas Michael, Mark Dilsizian, Dimitris Metaxas, Judee K. Burgoon
A Robust and Scalable Approach to Face Identification

The problem of face identification has received significant attention over the years. For a given probe face, the goal of face identification is to match this unknown face against a gallery of known people. Due to the availability of large amounts of data acquired in a variety of conditions, techniques that are both robust to uncontrolled acquisition conditions and scalable to large gallery sizes, which may need to be built incrementally, are needed. In this work we tackle both problems. First, we propose a novel approach to robust face identification based on Partial Least Squares (PLS) to perform multi-channel feature weighting. Then, we extend the method to a tree-based discriminative structure aimed at reducing the time required to evaluate novel probe samples. The method is evaluated through experiments on the FERET and FRGC datasets. In most of the comparisons our method outperforms state-of-the-art face identification techniques. Furthermore, our method scales well to large datasets.

William Robson Schwartz, Huimin Guo, Larry S. Davis
Emotion Recognition from Arbitrary View Facial Images

Emotion recognition from facial images is a very active research topic in human-computer interaction (HCI). However, most previous approaches focus only on frontal or nearly frontal view facial images. In contrast to frontal/nearly-frontal view images, emotion recognition from non-frontal or even arbitrary view facial images is much more difficult, yet of more practical utility. To handle the emotion recognition problem from arbitrary view facial images, in this paper we propose a novel method based on the regional covariance matrix (RCM) representation of facial images. We also develop a new discriminant analysis theory, aiming at reducing the dimensionality of the facial feature vectors while preserving the most discriminative information, by minimizing an estimated multiclass Bayes error derived under the Gaussian mixture model (GMM). We further propose an efficient algorithm to solve for the optimal discriminant vectors of the proposed discriminant analysis method. We render thousands of multi-view 2D facial images from the BU-3DFE database and conduct extensive experiments on the generated database to demonstrate the effectiveness of the proposed method. It is worth noting that our method does not require face alignment or facial landmark localization, which makes it very attractive.

Wenming Zheng, Hao Tang, Zhouchen Lin, Thomas S. Huang
Face Liveness Detection from a Single Image with Sparse Low Rank Bilinear Discriminative Model

Spoofing with a photograph or video is one of the most common ways to circumvent a face recognition system. In this paper, we present a real-time and non-intrusive method to address this, based on individual images from a generic webcamera. The task is formulated as a binary classification problem, in which, however, the distributions of positive and negative samples largely overlap in the input space, and a suitable representation space is hence of importance. Using the Lambertian model, we propose two strategies to extract the essential information about the different surface properties of a live human face or a photograph, in terms of latent samples. Based on these, we develop two new extensions to the sparse logistic regression model which allow quick and accurate spoof detection. Preliminary experiments on a large photo-impostor database show that the proposed method gives preferable detection performance compared to others.

Xiaoyang Tan, Yi Li, Jun Liu, Lin Jiang
Robust Head Pose Estimation Using Supervised Manifold Learning

We address the problem of fine-grain head pose angle estimation from a single 2D face image as a continuous regression problem. The current state of the art, and a promising line of research, on head pose estimation is that of nonlinear manifold embedding techniques, which learn an "optimal" low-dimensional manifold that models the nonlinear and continuous variation of face appearance with pose angle. Furthermore, supervised manifold learning techniques attempt to achieve this robustly in the presence of latent variables in the training set (especially identity, illumination, and facial expression), by incorporating the head pose angle information accompanying the training samples. Most of these techniques are designed with the classification scenario in mind, however, and are not directly applicable to the regression scenario, where continuous numeric values (pose angles), rather than class labels (discrete poses), are available. In this paper, we propose to deal with the regression case in a principled way. We present a taxonomy of methods for incorporating continuous pose angle information into one or more stages of the manifold learning process, and discuss its implementation for Neighborhood Preserving Embedding (NPE) and Locality Preserving Projection (LPP). Experiments are carried out on a face dataset containing significant identity and illumination variations, and the results show that our regression-based approach far outperforms previous supervised manifold learning methods for head pose estimation.

Chiraz BenAbdelkader
Knowledge Based Activity Recognition with Dynamic Bayesian Network

In this paper, we propose solutions for learning a dynamic Bayesian network (DBN) with domain knowledge for human activity recognition. Different types of domain knowledge, in the form of first-order probabilistic logics (FOPLs), are exploited to guide the DBN learning process. The FOPLs are transformed into two types of model priors: structure priors and parameter constraints. We present a structure learning algorithm, constrained structural EM (CSEM), for learning the model structures by combining the training data with these priors. Our method successfully alleviates the common problem of a lack of sufficient training data in activity recognition. The experimental results demonstrate that simple logic knowledge can compensate effectively for the shortage of training data and therefore reduce our dependence on training data.

Zhi Zeng, Qiang Ji
View and Style-Independent Action Manifolds for Human Activity Recognition

We introduce a novel approach to automatically learn intuitive and compact descriptors of human body motions for activity recognition. Each action descriptor is produced, first, by applying Temporal Laplacian Eigenmaps to view-dependent videos in order to produce a style-invariant embedded manifold for each view separately. Then, all view-dependent manifolds are automatically combined to discover a unified representation which models an action in a single three-dimensional space, independently of style and viewpoint. In addition, a bidirectional nonlinear mapping function is incorporated to allow projecting actions between the original and embedded spaces. The proposed framework is evaluated on a real and challenging dataset (IXMAS), which is composed of a variety of actions seen from arbitrary viewpoints. Experimental results demonstrate robustness against style and view variation and match the most accurate action recognition methods.

Michał Lewandowski, Dimitrios Makris, Jean-Christophe Nebel
Figure-Ground Image Segmentation Helps Weakly-Supervised Learning of Objects

Given a collection of images containing a common object, we seek to learn a model for the object without the use of bounding boxes or segmentation masks. In linguistics, a single document provides no information about the location of the topics it contains. On the contrary, an image has a lot to tell us about where foreground and background topics lie. Extensive literature on modelling bottom-up saliency and pop-out aims at predicting eye fixations and the allocation of visual attention in a single image, prior to any recognition of content. The most salient image parts are likely to capture the image foreground. We propose a novel probabilistic model, the shape and figure-ground aware model (sFG model), that exploits bottom-up image saliency to compute an informative prior on segment topic assignments: bottom-up saliency combined with co-occurrence gives strong hints about figure/ground that can help guide topic discovery from partially segmented data. Our model exploits both figure-ground organization in each image separately, and feature re-occurrence across the image collection. Since we use an image-dependent topic prior, during model learning we optimize a conditional likelihood of the image collection given the image bottom-up saliency information. Our discriminative framework can tolerate larger intraclass variability of objects with less training data. We iterate between bottom-up figure-ground image organization and model parameter learning by accumulating image statistics from the entire image collection. The learned model influences later image figure-ground labelling. We present results of our approach on diverse datasets showing great improvement over generative probabilistic models that do not exploit image saliency, indicating the suitability of our model for weakly-supervised visual organization.

Katerina Fragkiadaki, Jianbo Shi
Enhancing Interactive Image Segmentation with Automatic Label Set Augmentation

We address the problem of having insufficient labels in an interactive image segmentation framework, for which most current methods would fail without further user interaction. To minimize user interaction, we use appearance and boundary information synergistically. Specifically, we perform distribution propagation on an image graph constructed with color features to derive an initial estimate of the segment labels. Following that, we include automatically estimated segment distributions at “critical pixels” with uncertain labels to improve the segmentation performance. Such estimation is realized by incorporating boundary information using a non-parametric Dirichlet process for modeling diffusion signatures derived from the salient boundaries. Our main contribution is the fusion of image appearance with probabilistic modeling of boundary information to segment the whole object with a limited number of labeled pixels. Our proposed framework is extensively tested on a standard dataset, and is shown to achieve promising results both quantitatively and qualitatively.

Lei Ding, Alper Yilmaz
Hough Transform and 3D SURF for Robust Three Dimensional Classification

Most methods for the recognition of shape classes from 3D datasets focus on classifying clean, often manually generated models. However, 3D shapes obtained through acquisition techniques such as Structure-from-Motion or LIDAR scanning are noisy and exhibit clutter and holes. In that case global shape features—still dominating the 3D shape class recognition literature—are less appropriate. Inspired by 2D methods, researchers have recently started to work with local features. In keeping with this strand, we propose a new robust 3D shape classification method. It contains two main contributions. First, we extend a robust 2D feature descriptor, SURF, to be used in the context of 3D shapes. Second, we show how 3D shape class recognition can be improved by probabilistic Hough transform based methods, already popular in 2D. Through our experiments on partial shape retrieval, we show the power of the proposed 3D features. Their combination with the Hough transform yields superior results for class recognition on standard datasets. The potential applicability of such a method to classifying 3D data obtained from Structure-from-Motion is promising, as we show in some initial experiments.

Jan Knopp, Mukta Prasad, Geert Willems, Radu Timofte, Luc Van Gool
Backmatter
Metadata
Title
Computer Vision – ECCV 2010
Edited by
Kostas Daniilidis
Petros Maragos
Nikos Paragios
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-15567-3
Print ISBN
978-3-642-15566-6
DOI
https://doi.org/10.1007/978-3-642-15567-3