
2012 | Book


Computer Vision – ECCV 2012

12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VII

Edited by: Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, Cordelia Schmid

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

The seven-volume set comprising LNCS volumes 7572-7578 constitutes the refereed proceedings of the 12th European Conference on Computer Vision, ECCV 2012, held in Florence, Italy, in October 2012. The 408 revised papers presented were carefully reviewed and selected from 1437 submissions. The papers are organized in topical sections on geometry, 2D and 3D shapes, 3D reconstruction, visual recognition and classification, visual features and image matching, visual monitoring: action and activities, models, optimisation, learning, visual tracking and image registration, photometry: lighting and colour, and image segmentation.

Table of contents

Frontmatter

Poster Session 8

Local Higher-Order Statistics (LHS) for Texture Categorization and Facial Analysis

This paper proposes a new image representation for texture categorization and facial analysis, relying on the use of higher-order local differential statistics as features. In contrast with models based on the global structure of textures and faces, it has been shown recently that small local pixel pattern distributions can be highly discriminative. Motivated by such works, the proposed model employs higher-order statistics of local non-binarized pixel patterns for the image description. Hence, in addition to being remarkably simple, it requires neither any user-specified quantization of the space (of pixel patterns) nor any heuristics for discarding low-occupancy volumes of the space. This leads to a more expressive representation which, when combined with a discriminative SVM classifier, consistently achieves state-of-the-art performance on challenging texture and facial analysis datasets, outperforming contemporary methods (with similarly powerful classifiers).

Gaurav Sharma, Sibt ul Hussain, Frédéric Jurie

SEEDS: Superpixels Extracted via Energy-Driven Sampling

Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color homogeneity. The optimization is accomplished by sophisticated methods that progressively build the superpixels, typically by adding cuts or growing superpixels. As a result, they are computationally too expensive for real-time applications. We introduce a new approach based on a simple hill-climbing optimization. Starting from an initial superpixel partitioning, it continuously refines the superpixels by modifying the boundaries. We define a robust and fast to evaluate energy function, based on enforcing color similarity between the boundaries and the superpixel color histogram. In a series of experiments, we show that we achieve an excellent compromise between accuracy and efficiency. We are able to achieve a performance comparable to the state-of-the-art, but in real-time on a single Intel i7 CPU at 2.8GHz.

Michael Van den Bergh, Xavier Boix, Gemma Roig, Benjamin de Capitani, Luc Van Gool

Recording and Playback of Camera Shake: Benchmarking Blind Deconvolution with a Real-World Database

Motion blur due to camera shake is one of the predominant sources of degradation in handheld photography. Single image blind deconvolution (BD) or motion deblurring aims at restoring a sharp latent image from the blurred recorded picture without knowing the camera motion that took place during the exposure. BD is a long-standing problem, but has attracted much attention recently, culminating in several algorithms able to restore photos degraded by real camera motion in high quality. In this paper, we present a benchmark dataset for motion deblurring that allows quantitative performance evaluation and comparison of recent approaches featuring non-uniform blur models. To this end, we record and analyse real camera motion, which is played back on a robot platform such that we can record a sequence of sharp images sampling the six-dimensional camera motion trajectory. The goal of deblurring is to recover one of these sharp images, and our dataset contains all information to assess how closely various algorithms approximate that goal. In a comprehensive comparison, we evaluate state-of-the-art single image BD algorithms incorporating uniform and non-uniform blur models.

Rolf Köhler, Michael Hirsch, Betty Mohler, Bernhard Schölkopf, Stefan Harmeling

Learning-Based Symmetry Detection in Natural Images

In this work we propose a learning-based approach to symmetry detection in natural images. We focus on ribbon-like structures, i.e. contours marking local and approximate reflection symmetry and make three contributions to improve their detection. First, we create and make publicly available a ground-truth dataset for this task by building on the Berkeley Segmentation Dataset. Second, we extract features representing multiple complementary cues, such as grayscale structure, color, texture, and spectral clustering information. Third, we use supervised learning to learn how to combine these cues, and employ MIL to accommodate the unknown scale and orientation of the symmetric structures. We systematically evaluate the performance contribution of each individual component in our pipeline, and demonstrate that overall we consistently improve upon results obtained using existing alternatives.

Stavros Tsogkas, Iasonas Kokkinos

Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification

We present a novel algorithm for weakly supervised action classification in videos. We assume we are given training videos annotated only with action class labels. We learn a model that can classify unseen test videos, as well as localize a region of interest in the video that captures the discriminative essence of the action class. A novel Similarity Constrained Latent Support Vector Machine model is developed to operationalize this goal. This model specifies that videos should be classified correctly, and that the latent regions of interest chosen should be coherent over videos of an action class. The resulting learning problem is challenging, and we show how dual decomposition can be employed to render it tractable. Experimental results demonstrate the efficacy of the method.

Nataliya Shapovalova, Arash Vahdat, Kevin Cannons, Tian Lan, Greg Mori

Team Activity Recognition in Sports

We introduce a novel approach for team activity recognition in sports. Given the positions of team players from a plan view of the playing field at any given time, we solve a particular Poisson equation to generate a smooth distribution defined on whole playground, termed the position distribution of the team. Computing the position distribution for each frame provides a sequence of distributions, which we process to extract motion features for team activity recognition. The motion features are obtained at each frame using frame differencing and optical flow. We investigate the use of the proposed motion descriptors with Support Vector Machines (SVM) classification, and evaluate on a publicly available European handball dataset. Results show that our approach can classify six different team activities and performs better than a method that extracts features from the explicitly defined positions. Our method is new and different from other trajectory-based methods. These methods extract activity features using the explicitly defined trajectories, where the players have specific positions at any given time, and ignore the rest of the playground. In our work, on the other hand, given the specific positions of the team players at a frame, we construct a position distribution for the team on the whole playground and process the sequence of position distribution images to extract motion features for activity recognition. Results show that our approach is effective.

Cem Direkoǧlu, Noel E. O’Connor

Space-Variant Descriptor Sampling for Action Recognition Based on Saliency and Eye Movements

Algorithms using “bag of features”-style video representations currently achieve state-of-the-art performance on action recognition tasks, such as the challenging Hollywood2 benchmark [1,2,3]. These algorithms are based on local spatiotemporal descriptors that can be extracted either sparsely (at interest points) or densely (on regular grids), with dense sampling typically leading to the best performance [1]. Here, we investigate the benefit of space-variant processing of inputs, inspired by attentional mechanisms in the human visual system. We employ saliency-mapping algorithms to find informative regions and descriptors corresponding to these regions are either used exclusively, or are given greater representational weight (additional codebook vectors). This approach is evaluated with three state-of-the-art action recognition algorithms [1,2,3], and using several saliency algorithms. We also use saliency maps derived from human eye movements to probe the limits of the approach. Saliency-based pruning allows up to 70% of descriptors to be discarded, while maintaining high performance on Hollywood2. Meanwhile, pruning of 20-50% (depending on model) can even improve recognition. Further improvements can be obtained by combining representations learned separately on salience-pruned and unpruned descriptor sets. Not surprisingly, using the human eye movement data gives the best mean Average Precision (mAP; 61.9%), providing an upper bound on what is possible with a high-quality saliency map. Even without such external data, the Dense Trajectories model [1] enhanced by automated saliency-based descriptor sampling achieves the best mAP (60.0%) reported on Hollywood2 to date.

Eleonora Vig, Michael Dorr, David Cox

Dynamic Probabilistic CCA for Analysis of Affective Behaviour

Fusing multiple continuous expert annotations is a crucial problem in machine learning and computer vision, particularly when dealing with uncertain and subjective tasks related to affective behaviour. Inspired by the concept of inferring shared and individual latent spaces in probabilistic CCA (PCCA), we firstly propose a novel, generative model which discovers temporal dependencies on the shared/individual spaces (DPCCA). In order to accommodate for temporal lags which are prominent amongst continuous annotations, we further introduce a latent warping process. We show that the resulting model (DPCTW) (i) can be used as a unifying framework for solving the problems of temporal alignment and fusion of multiple annotations in time, and (ii) that by incorporating dynamics, modelling annotation/sequence specific biases, noise estimation and time warping, DPCTW outperforms state-of-the-art methods for both the aggregation of multiple, yet imperfect expert annotations as well as the alignment of affective behavior.

Mihalis A. Nicolaou, Vladimir Pavlovic, Maja Pantic

Loss-Specific Training of Non-Parametric Image Restoration Models: A New State of the Art

After a decade of rapid progress in image denoising, recent methods seem to have reached a performance limit. Nonetheless, we find that state-of-the-art denoising methods are visually clearly distinguishable and possess complementary strengths and failure modes. Motivated by this observation, we introduce a powerful non-parametric image restoration framework based on Regression Tree Fields (RTF). Our restoration model is a densely-connected tractable conditional random field that leverages existing methods to produce an image-dependent, globally consistent prediction. We estimate the conditional structure and parameters of our model from training data so as to directly optimize for popular performance measures. In terms of peak signal-to-noise-ratio (PSNR), our model improves on the best published denoising method by at least 0.26dB across a range of noise levels. Our most practical variant still yields statistically significant improvements, yet is over 20× faster than the strongest competitor. Our approach is well-suited for many more image restoration and low-level vision problems, as evidenced by substantial gains in tasks such as removal of JPEG blocking artefacts.

Jeremy Jancsary, Sebastian Nowozin, Carsten Rother

A Probabilistic Approach to Robust Matrix Factorization

Matrix factorization underlies a large variety of computer vision applications. It is a particularly challenging problem for large-scale applications and when there exist outliers and missing data. In this paper, we propose a novel probabilistic model called Probabilistic Robust Matrix Factorization (PRMF) to solve this problem. In particular, PRMF is formulated with a Laplace error and a Gaussian prior which correspond to an ℓ₁ loss and an ℓ₂ regularizer, respectively. For model learning, we devise a parallelizable expectation-maximization (EM) algorithm which can potentially be applied to large-scale applications. We also propose an online extension of the algorithm for sequential data to offer further scalability. Experiments conducted on both synthetic data and some practical computer vision applications show that PRMF is comparable to other state-of-the-art robust matrix factorization methods in terms of accuracy and outperforms them particularly for large data matrices.

Naiyan Wang, Tiansheng Yao, Jingdong Wang, Dit-Yan Yeung
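
The correspondence between the Laplace error and an ℓ₁ loss, and between the Gaussian prior and an ℓ₂ regularizer, can be written out by taking negative log-densities. The following is only a sketch under assumed notation that does not appear in the abstract: observed entries y_ij indexed by a set Ω, factor rows u_i and v_j, a Laplace scale b, and a shared Gaussian standard deviation σ.

```latex
% Assumed notation (illustrative, not taken from the paper):
%   y_{ij}: observed entries, (i,j) \in \Omega;  u_i, v_j: rows of the factors U, V
%   b: Laplace scale;  \sigma: Gaussian prior standard deviation
\begin{align*}
-\log p(y_{ij} \mid u_i, v_j) &= \frac{1}{b}\,\bigl\lvert y_{ij} - u_i^{\top} v_j \bigr\rvert + \text{const}, \\
-\log p(u_i) &= \frac{1}{2\sigma^{2}}\,\lVert u_i \rVert_2^{2} + \text{const}
  \qquad \text{(and likewise for } v_j\text{)} .
\end{align*}
% Summing these terms and multiplying by b gives the MAP problem
\begin{equation*}
\min_{U,V}\; \sum_{(i,j)\in\Omega} \bigl\lvert y_{ij} - u_i^{\top} v_j \bigr\rvert
  \;+\; \frac{\lambda}{2}\Bigl( \lVert U \rVert_F^{2} + \lVert V \rVert_F^{2} \Bigr),
\qquad \lambda = \frac{b}{\sigma^{2}} .
\end{equation*}
```

Under these assumptions, the absolute-value term is what makes the fit tolerant of outliers, while the squared Frobenius norms keep the factors bounded.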

Fast Parameter Sensitivity Analysis of PDE-Based Image Processing Methods

We present a fast parameter sensitivity analysis by combining recent developments from uncertainty quantification with image processing operators. The approach is not based on a sampling strategy, instead we combine the polynomial chaos expansion and stochastic finite elements with PDE-based image processing operators. With our approach and a moderate number of parameters in the models the full sensitivity analysis is obtained at the cost of a few Monte Carlo runs. To demonstrate the efficiency and simplicity of the approach we show a parameter sensitivity analysis for Perona-Malik diffusion, random walker and Ambrosio-Tortorelli segmentation, and discontinuity-preserving optical flow computation.

Torben Pätz, Tobias Preusser

The Lazy Flipper: Efficient Depth-Limited Exhaustive Search in Discrete Graphical Models

We propose a new exhaustive search algorithm for optimization in discrete graphical models. When pursued to the full search depth (typically intractable), it is guaranteed to converge to a global optimum, passing through a series of monotonously improving local optima that are guaranteed to be optimal within a given and increasing Hamming distance. For a search depth of 1, it specializes to ICM. Between these extremes, a tradeoff between approximation quality and runtime is established. We show this experimentally by improving approximations for the non-submodular models in the MRF benchmark [1] and Decision Tree Fields [2].

Bjoern Andres, Jörg H. Kappes, Thorsten Beier, Ullrich Köthe, Fred A. Hamprecht
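
The depth-1 special case named in the abstract, ICM (iterated conditional modes), can be sketched in a few lines. This is a minimal illustration and not the authors' Lazy Flipper implementation: the function names, the toy chain model, and the Potts pairwise cost are all assumptions chosen for the example, and the pairwise cost is assumed to be symmetric.

```python
def energy(labels, unary, edges, pairwise):
    """Total energy: unary costs plus pairwise costs summed over all edges."""
    total = sum(unary[i][labels[i]] for i in range(len(labels)))
    total += sum(pairwise(labels[i], labels[j]) for i, j in edges)
    return total


def icm(labels, unary, edges, pairwise):
    """Flip one variable at a time as long as a flip strictly lowers the energy.

    Assumes `pairwise` is symmetric, so the local cost of a variable only
    depends on the current labels of its neighbours.
    """
    labels = list(labels)
    neighbors = {i: [] for i in range(len(labels))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            def local_cost(label):
                return unary[i][label] + sum(
                    pairwise(label, labels[j]) for j in neighbors[i]
                )

            best = min(range(len(unary[i])), key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                improved = True
    return labels


# Hypothetical toy model: a 4-node chain with two labels and a Potts penalty.
unary = [[0, 2], [1, 1], [2, 0], [2, 0]]
edges = [(0, 1), (1, 2), (2, 3)]
potts = lambda a, b: 0 if a == b else 1

start = [0, 0, 0, 0]
result = icm(start, unary, edges, potts)
```

In this sketch every accepted flip strictly lowers the energy, so the loop stops at a labeling that no single-variable change can improve, that is, one that is optimal within Hamming distance 1. Larger search depths, as described in the abstract, would consider flipping connected subsets of more than one variable.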

Face Association across Unconstrained Video Frames Using Conditional Random Fields

Automatic face association across unconstrained video frames has many practical applications. Recent advances in the area of object detection have made it possible to replace the traditional tracking-based association approaches with the more robust detection-based ones. However, it is still a very challenging task for real-world unconstrained videos, especially if the subjects are in a moving platform and at distances exceeding several tens of meters. In this paper, we present a novel solution based on a Conditional Random Field (CRF) framework. The CRF approach not only gives a probabilistic and systematic treatment of the problem, but also elegantly combines global and local features. When ambiguities in labels cannot be solved by using the face appearance alone, our method relies on multiple contextual features to provide further evidence for association. Our algorithm works in an on-line mode and is able to reliably handle real-world videos. Results of experiments using challenging video data and comparisons with other methods are provided to demonstrate the effectiveness of our method.

Ming Du, Rama Chellappa

Contraction Moves for Geometric Model Fitting

This paper presents a new class of moves, called α-expansion-contraction, which generalizes α-expansion graph cuts for multi-label energy minimization problems. The new moves are particularly useful for optimizing the assignments in model fitting frameworks whose energies include Label Cost (LC), as well as Markov Random Field (MRF) terms. These problems benefit from the contraction moves’ greater scope for removing instances from the model, reducing label costs. We demonstrate this effect on the problem of fitting sets of geometric primitives to point cloud data, including real-world point clouds containing millions of points, obtained by multi-view reconstruction.

Oliver J. Woodford, Minh-Tri Pham, Atsuto Maki, Riccardo Gherardi, Frank Perbet, Björn Stenger

General and Nested Wiberg Minimization: L₂ and Maximum Likelihood

Wiberg matrix factorization breaks a matrix Y into low-rank factors U and V by solving for V in closed form given U, linearizing V(U) about U, and iteratively minimizing ||Y − UV(U)||₂ with respect to U only. This approach factors the matrix while effectively removing V from the minimization. We generalize the Wiberg approach beyond factorization to minimize an arbitrary function that is nonlinear in each of two sets of variables. In this paper we focus on the case of L₂ minimization and maximum likelihood estimation (MLE), presenting an L₂ Wiberg bundle adjustment algorithm and a Wiberg MLE algorithm for Poisson matrix factorization. We also show that one Wiberg minimization can be nested inside another, effectively removing two of three sets of variables from a minimization. We demonstrate this idea with a nested Wiberg algorithm for L₂ projective bundle adjustment, solving for camera matrices, points, and projective depths.

Dennis Strelow

Nonmetric Priors for Continuous Multilabel Optimization

We propose a novel convex prior for multilabel optimization which allows imposing arbitrary distances between labels. Only symmetry, d(i, j) ≥ 0 and d(i, i) = 0 are required. In contrast to previous grid-based approaches for the nonmetric case, the proposed prior is formulated in the continuous setting, avoiding grid artifacts. In particular, the model is easy to implement, provides a convex relaxation for the Mumford-Shah functional and yields comparable or superior results on the MSRC segmentation database compared to metric or grid-based approaches.

Evgeny Strekalovskiy, Claudia Nieuwenhuis, Daniel Cremers

Real-Time Camera Tracking: When is High Frame-Rate Best?

Higher frame-rates promise better tracking of rapid motion, but advanced real-time vision systems rarely exceed the standard 10–60Hz range, arguing that the computation required would be too great. Actually, increasing frame-rate is mitigated by reduced computational cost per frame in trackers which take advantage of prediction. Additionally, when we consider the physics of image formation, high frame-rate implies that the upper bound on shutter time is reduced, leading to less motion blur but more noise. So, putting these factors together, how are application-dependent performance requirements of accuracy, robustness and computational cost optimised as frame-rate varies? Using 3D camera tracking as our test problem, and analysing a fundamental dense whole image alignment approach, we open up a route to a systematic investigation via the careful synthesis of photorealistic video using ray-tracing of a detailed 3D scene, experimentally obtained photometric response and noise models, and rapid camera motions. Our multi-frame-rate, multi-resolution, multi-light-level dataset is based on tens of thousands of hours of CPU rendering time. Our experiments lead to quantitative conclusions about frame-rate selection and highlight the crucial role of full consideration of physical image formation in pushing tracking performance.

Ankur Handa, Richard A. Newcombe, Adrien Angeli, Andrew J. Davison

A Bayesian Approach to Alignment-Based Image Hallucination

In most image hallucination work, a strong assumption is held that images can be aligned to a template on which the prior of high-res images is formulated and learned. Realizing that one template can hardly generalize to all images of an object such as faces due to pose and viewpoint variation as well as occlusion, we propose an example-based prior distribution via dense image correspondences. We introduce a Bayesian formulation based on an image prior that can implement different effective behaviors based on the value of a single parameter. Using faces as examples, we show that our system outperforms the prior state of art.

Marshall F. Tappen, Ce Liu

Continuous Regression for Non-rigid Image Alignment

Parameterized Appearance Models (PAMs) such as Active Appearance Models (AAMs), Morphable Models and Boosted Appearance Models have been extensively used for face alignment. Broadly speaking, PAMs methods can be classified into generative and discriminative. Discriminative methods learn a mapping between appearance features and motion parameters (rigid and non-rigid). While discriminative approaches have some advantages (e.g., feature weighting, improved generalization), they suffer from two major drawbacks: (1) they need large amounts of perturbed samples to train a regressor or classifier, making the training process computationally expensive in space and time. (2) It is not practical to uniformly sample the space of motion parameters. In practice, there are regions of the motion space that are more densely sampled than others, resulting in biased models and lack of generalization. To solve these problems, this paper proposes a computationally efficient continuous regressor that does not require the sampling stage. Experiments on real data show the improvement in memory and time requirements to train a discriminative appearance model, as well as improved generalization.

Enrique Sánchez-Lozano, Fernando De la Torre, Daniel González-Jiménez

Non-rigid Shape Registration: A Single Linear Least Squares Framework

This paper proposes a non-rigid registration formulation capturing both global and local deformations in a single framework. This formulation is based on a quadratic estimation of the registration distance together with a quadratic regularization term. Hence, the optimal transformation parameters are easily obtained by solving a linear system of equations, which guarantees fast convergence. Experimental results with challenging 2D and 3D shapes are presented to show the validity of the proposed framework. Furthermore, comparisons with the most relevant approaches are provided.

Mohammad Rouhani, Angel D. Sappa

Robust and Accurate Shape Model Fitting Using Random Forest Regression Voting

A widely used approach for locating points on deformable objects is to generate feature response images for each point, then to fit a shape model to the response images. We demonstrate that Random Forest regression can be used to generate high quality response images quickly. Rather than using a generative or a discriminative model to evaluate each pixel, a regressor is used to cast votes for the optimal position. We show this leads to fast and accurate matching when combined with a statistical shape model. We evaluate the technique in detail, and compare with a range of commonly used alternatives on several different datasets. We show that the random forest regression method is significantly faster and more accurate than equivalent discriminative, or boosted regression based methods trained on the same data.

Tim F. Cootes, Mircea C. Ionita, Claudia Lindner, Patrick Sauer

Shape from Fluorescence

Beyond day glow highlighters and psychedelic black light posters, it has been estimated that fluorescence is a property exhibited by 20% of objects. When a fluorescent material is illuminated with a short wavelength light, it re-emits light at a longer wavelength isotropically in a similar manner as a Lambertian surface reflects light. This hitherto neglected property opens the doors to using fluorescence to reconstruct 3D shape with some of the same techniques as for Lambertian surfaces – even when the surface’s reflectance is highly non-Lambertian. Thus, performing reconstruction using fluorescence has advantages over purely Lambertian surfaces. Single image shape-from-shading and calibrated Lambertian photometric stereo can be applied to fluorescence images to reveal 3D shape. When performing uncalibrated photometric stereo, both fluorescence and reflectance can be used to recover Euclidean shape and resolve the generalized bas relief ambiguity. Finally for objects that fluoresce in wavelengths distinct from their reflectance (such as plants and vegetables), reconstructions do not suffer from problems due to inter-reflections. We validate these claims through experiments.

Tali Treibitz, Zak Murez, B. Greg Mitchell, David Kriegman

Separability Oriented Preprocessing for Illumination-Insensitive Face Recognition

In the last decade, some illumination preprocessing approaches were proposed to eliminate the lighting variation in face images for lighting-invariant face recognition. However, we find surprisingly that existing preprocessing methods were seldom modeled to directly enhance the separability of different faces, which should have been the essential goal. To address the issue, we propose to explicitly exploit maximizing separability of different subjects’ faces as the preprocessing objective. With this in mind, a novel approach, named by us Separability Oriented Preprocessing (SOP), is proposed to enhance face images by maximizing the Fisher separability criterion in scale-space. Extensive experiments on both laboratory-controlled and real-world face databases using different recognition methods show the effectiveness of the proposed approach.

Hu Han, Shiguang Shan, Xilin Chen, Shihong Lao, Wen Gao

Saliency Modeling from Image Histograms

We propose a computational visual saliency modeling technique. The proposed technique makes use of a color co-occurrence histogram (CCH) that captures not only “how many” but also “where and how” image pixels are composed into a visually perceivable image. Hence the CCH encodes image saliency information that is usually perceived as the discontinuity between an image region or object and its surroundings. The proposed technique has a number of distinctive characteristics: it is fast, discriminative, tolerant to image scale variation, and involves minimal parameter tuning. Experiments over benchmarking datasets show that it predicts fixational eye tracking points accurately and a superior AUC of 71.25 is obtained.

Shijian Lu, Joo-Hwee Lim

A Theoretical Analysis of Camera Response Functions in Image Deblurring

Motion deblurring is a long standing problem in computer vision and image processing. In most previous approaches, the blurred image is modeled as the convolution of a latent intensity image with a blur kernel. However, for images captured by a real camera, the blur convolution should be applied to scene irradiance instead of image intensity and the blurred results need to be mapped back to image intensity via the camera’s response function (CRF). In this paper, we present a comprehensive study to analyze the effects of CRFs on motion deblurring. We prove that the intensity-based model closely approximates the irradiance model at low frequency regions. However, at high frequency regions such as edges, the intensity-based approximation introduces large errors and directly applying deconvolution on the intensity image will produce strong ringing artifacts even if the blur kernel is invertible. Based on the approximation error analysis, we further develop a dual-image based solution that captures a pair of sharp/blurred images for both CRF estimation and motion deblurring. Experiments on synthetic and real images validate our theories and demonstrate the robustness and accuracy of our approach.

Xiaogang Chen, Feng Li, Jie Yang, Jingyi Yu

Robust and Efficient Subspace Segmentation via Least Squares Regression

This paper studies the subspace segmentation problem, which aims to segment data drawn from a union of multiple linear subspaces. Recent works using sparse representation, low rank representation and their extensions have attracted much attention. If the subspaces from which the data are drawn are independent or orthogonal, these methods are able to obtain a block diagonal affinity matrix, which usually leads to a correct segmentation. The main differences among them are their objective functions. We theoretically show that if the objective function satisfies some conditions, and the data are sufficiently drawn from independent subspaces, the obtained affinity matrix is always block diagonal. Furthermore, the data sampling can be insufficient if the subspaces are orthogonal. Some existing methods are all special cases. Then we present the Least Squares Regression (LSR) method for subspace segmentation. It takes advantage of data correlation, which is common in real data. LSR encourages a grouping effect which tends to group highly correlated data together. Experimental results on the Hopkins 155 database and Extended Yale Database B show that our method significantly outperforms state-of-the-art methods. Beyond segmentation accuracy, all experiments demonstrate that LSR is much more efficient.

Can-Yi Lu, Hai Min, Zhong-Qiu Zhao, Lin Zhu, De-Shuang Huang, Shuicheng Yan

Local Label Descriptor for Example Based Semantic Image Labeling

In this paper we introduce the concept of local label descriptor, which is a concatenation of label histograms for each cell in a patch. Local label descriptors alleviate the label patch misalignment issue in combining structured label predictions for semantic image labeling. Given an input image, we solve for a label map whose local label descriptors can be approximated as a sparse convex combination of exemplar label descriptors in the training data, where the sparsity is regularized by the similarity measure between the local feature descriptor of the input image and that of the exemplars in the training data set. Low-level image over-segmentation can be incorporated into our formulation to improve efficiency. Our formulation and algorithm compare favorably with the baseline method on the CamVid and Barcelona datasets.

Yiqing Yang, Zhouyuan Li, Li Zhang, Christopher Murphy, Jim Ver Hoeve, Hongrui Jiang

Road Scene Segmentation from a Single Image

Road scene segmentation is important in computer vision for different applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides relevant contextual information to improve their understanding.

In this paper, we use a convolutional neural network based algorithm to learn features from noisy labels to recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion to obtain maximal uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images.

From quantitative and qualitative experiments conducted on publicly available datasets, it is concluded that convolutional neural networks are suitable for learning 3D scene layout from noisy labels and provide a relative improvement of 7% compared to the baseline. Furthermore, combining color planes provides a statistical description of road areas that exhibits maximal uniformity and provides a relative improvement of 8% compared to the baseline. Finally, the improvement is even bigger when acquired and current information from a single image are combined.

Jose M. Alvarez, Theo Gevers, Yann LeCun, Antonio M. Lopez

Efficient Recursive Algorithms for Computing the Mean Diffusion Tensor and Applications to DTI Segmentation

Computation of the mean of a collection of symmetric positive definite (SPD) matrices is a fundamental ingredient of many algorithms in diffusion tensor image (DTI) processing, for instance in DTI segmentation and clustering. In this paper, we present novel recursive algorithms for computing the mean of a set of diffusion tensors using several distance/divergence measures commonly used in DTI segmentation and clustering, such as the Riemannian distance and the symmetrized Kullback-Leibler divergence. To the best of our knowledge, to date, there are no recursive algorithms for computing the mean using these measures in the literature. Recursive algorithms lead to a gain in computation time of several orders of magnitude over existing non-recursive algorithms. The key contributions of this paper are: (i) we present novel theoretical results on a recursive estimator for the Karcher expectation in the space of SPD matrices, which in effect is a proof of the law of large numbers (with some restrictions) for the manifold of SPD matrices; (ii) we present a recursive version of the symmetrized KL-divergence for computing the mean of a collection of SPD matrices; (iii) we present comparative timing results for computing the mean of a group of SPD matrices (diffusion tensors), depicting the gains in compute time using the proposed recursive algorithms over existing non-recursive counterparts. Finally, we also show results on gains in compute time obtained by applying these recursive algorithms to the task of DTI segmentation.

Guang Cheng, Hesamoddin Salehian, Baba C. Vemuri
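To make the idea of a recursive mean on SPD matrices concrete, here is a minimal sketch of one geodesic-step estimator under the affine-invariant Riemannian metric: the running mean M_k moves toward each new sample X_{k+1} by a fraction 1/(k+1) along the geodesic joining them. Whether this matches the paper's exact estimator is an assumption, and the helper names `spd_power` and `recursive_spd_mean` are hypothetical.

```python
import numpy as np

def spd_power(A, t):
    """Real power A**t of a symmetric positive definite matrix via eigh."""
    w, V = np.linalg.eigh(A)
    return (V * w**t) @ V.T          # V diag(w**t) V^T

def recursive_spd_mean(mats):
    """One-pass mean estimate: each sample is visited exactly once.

    Update (assumed form):
      M_{k+1} = M_k^{1/2} (M_k^{-1/2} X_{k+1} M_k^{-1/2})^{1/(k+1)} M_k^{1/2}
    """
    it = iter(mats)
    M = np.asarray(next(it), dtype=float)
    for k, X in enumerate(it, start=1):   # M currently averages k samples
        M_half = spd_power(M, 0.5)
        M_inv_half = spd_power(M, -0.5)
        inner = M_inv_half @ np.asarray(X, dtype=float) @ M_inv_half
        inner = 0.5 * (inner + inner.T)   # remove round-off asymmetry
        M = M_half @ spd_power(inner, 1.0 / (k + 1)) @ M_half
    return M

# For commuting (here diagonal) inputs the result is the
# element-wise geometric mean of the diagonals.
mean2 = recursive_spd_mean([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
mean3 = recursive_spd_mean([np.eye(2), np.diag([8.0, 1.0]), np.diag([1.0, 8.0])])
```

Each update costs a fixed number of small eigendecompositions, whereas a batch Karcher mean is typically found by iterating over all samples repeatedly until convergence.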

Semi-Nonnegative Matrix Factorization for Motion Segmentation with Missing Data

Motion segmentation is an old problem that is receiving renewed interest because of its role in video analysis. In this paper, we present a Semi-Nonnegative Matrix Factorization (SNMF) method that models dense point tracks in terms of their optical flow, and decomposes sets of point tracks into semantically meaningful motion components. We show that this formulation of SNMF with missing values outperforms the state-of-the-art algorithm of Brox and Malik in terms of accuracy on 10-frame video segments from the Berkeley test set, while being over 100 times faster. We then show how SNMF can be applied to longer videos using sliding windows. The result is competitive in terms of accuracy with Brox and Malik’s algorithm, while still being two orders of magnitude faster.

Quanyi Mo, Bruce A. Draper

Oral Session 8: Semantic segmentation

A Three-Layered Approach to Facade Parsing

We propose a novel three-layered approach for semantic segmentation of building facades. In the first layer, starting from an oversegmentation of a facade, we employ the recently introduced machine learning technique Recursive Neural Networks (RNN) to obtain a probabilistic interpretation of each segment. In the second layer, the initial labeling is augmented with information coming from specialized facade component detectors. The information is merged using a Markov Random Field. In the third layer, we introduce weak architectural knowledge, which enforces the final reconstruction to be architecturally plausible and consistent. Rigorous tests performed on two existing datasets of building facades demonstrate that we significantly outperform the current state of the art, even when using outputs from earlier layers of the pipeline. We also show how the final output of the third layer can be used to create a procedural reconstruction.

Anđelo Martinović, Markus Mathias, Julien Weissenberg, Luc Van Gool

Semantic Segmentation with Second-Order Pooling

Feature extraction, coding and pooling are important components of many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplicative second-order analogues of average and max-pooling that, together with appropriate non-linearities, lead to state-of-the-art performance on free-form region recognition, without any type of feature coding. Instead of coding, we found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. We show that second-order pooling over free-form regions produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.

João Carreira, Rui Caseiro, Jorge Batista, Cristian Sminchisescu
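The "multiplicative second-order analogues of average and max-pooling" mentioned above can be sketched as pooling over outer products of the local descriptors in a region. The sketch below shows only the pooling step in plain Python. The non-linearities the abstract refers to are omitted, and the function name `second_order_pool` is hypothetical.

```python
def second_order_pool(descriptors):
    """Pool outer products x x^T of the local descriptors in one region.

    descriptors: non-empty list of equal-length lists of floats.
    Returns (avg, mx): the element-wise mean and the element-wise
    maximum of the outer products, each as a d x d nested list.
    """
    n = len(descriptors)
    d = len(descriptors[0])
    avg = [[0.0] * d for _ in range(d)]
    mx = [[float("-inf")] * d for _ in range(d)]
    for x in descriptors:
        for i in range(d):
            for j in range(d):
                p = x[i] * x[j]              # entry (i, j) of x x^T
                avg[i][j] += p / n
                mx[i][j] = max(mx[i][j], p)
    return avg, mx

# Two 2-D descriptors from a hypothetical region.
avg, mx = second_order_pool([[1.0, 2.0], [3.0, 4.0]])
```

Both outputs are symmetric, so in practice only the upper triangle would need to be kept as the region feature.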

Shape Sharing for Object Segmentation

We introduce a category-independent shape prior for object segmentation. Existing shape priors assume class-specific knowledge, and thus are restricted to cases where the object class is known in advance. The main insight of our approach is that shapes are often shared between objects of different categories. To exploit this “shape sharing” phenomenon, we develop a non-parametric prior that transfers object shapes from an exemplar database to a test image based on local shape matching. The transferred shape priors are then enforced in a graph-cut formulation to produce a pool of object segment hypotheses. Unlike previous multiple segmentation methods, our approach benefits from global shape cues; unlike previous top-down methods, it assumes no class-specific training and thus enhances segmentation even for unfamiliar categories. On the challenging PASCAL 2010 and Berkeley Segmentation datasets, we show it outperforms the state-of-the-art in bottom-up or category-independent segmentation.

Jaechul Kim, Kristen Grauman

Segmentation Propagation in ImageNet

ImageNet is a large-scale hierarchical database of object classes. We propose to automatically populate it with pixelwise segmentations, by leveraging existing manual annotations in the form of class labels and bounding-boxes. The key idea is to recursively exploit images segmented so far to guide the segmentation of new images. At each stage this propagation process expands into the images which are easiest to segment at that point in time, e.g. by moving to the semantically most related classes to those segmented so far. The propagation of segmentation occurs both (a) at the image level, by transferring existing segmentations to estimate the probability of a pixel to be foreground, and (b) at the class level, by jointly segmenting images of the same class and by importing the appearance models of classes that are already segmented. Through an experiment on 577 classes and 500k images we show that our technique (i) annotates a wide range of classes with accurate segmentations; (ii) effectively exploits the hierarchical structure of ImageNet; (iii) scales efficiently; (iv) outperforms a baseline GrabCut [1] initialized on the image center, as well as our recent segmentation transfer technique [2] on which this paper is based. Moreover, our method also delivers state-of-the-art results on the recent iCoseg dataset for co-segmentation.

Daniel Kuettel, Matthieu Guillaumin, Vittorio Ferrari

“Clustering by Composition” – Unsupervised Discovery of Image Categories

We define a “good image cluster” as one in which images can be easily composed (like a puzzle) using pieces from each other, while being difficult to compose from images outside the cluster. The larger and more statistically significant the pieces are, the stronger the affinity between the images. This gives rise to unsupervised discovery of very challenging image categories. We further show how multiple images can be composed from each other simultaneously and efficiently using a collaborative randomized search algorithm. This collaborative process exploits the “wisdom of crowds of images” to obtain a sparse yet meaningful set of image affinities, in time which is almost linear in the size of the image collection. “Clustering-by-Composition” can be applied to very few images (where a ‘cluster model’ cannot be ‘learned’), as well as to benchmark evaluation datasets, and yields state-of-the-art results.

Alon Faktor, Michal Irani

Backmatter

Title: Computer Vision – ECCV 2012
Edited by: Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, Cordelia Schmid
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-33786-4
Print ISBN: 978-3-642-33785-7
DOI: https://doi.org/10.1007/978-3-642-33786-4