
2012 | Book

Pattern Recognition

Joint 34th DAGM and 36th OAGM Symposium, Graz, Austria, August 28-31, 2012. Proceedings

Editors: Axel Pinz, Thomas Pock, Horst Bischof, Franz Leberl

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 34th Symposium of the German Association for Pattern Recognition, DAGM 2012, and the 36th Symposium of the Austrian Association for Pattern Recognition, OAGM 2012, held in Graz, Austria, in August 2012.

The 27 revised full papers and 23 revised poster papers were carefully reviewed and selected from 98 submissions. The papers are organized in topical sections on segmentation, low-level vision, 3D reconstruction, recognition, applications, learning, and features.

Table of Contents

Frontmatter

Segmentation

As Time Goes by—Anytime Semantic Segmentation with Iterative Context Forests

We present a new approach for contextual semantic segmentation and introduce a new tree-based framework, which combines local information and context knowledge in a single model. The method is also suitable for anytime classification scenarios, where the challenge is to estimate a label for each pixel in an image while allowing the estimation to be interrupted at any time. This makes the method applicable to time-critical tasks, such as automotive applications, where the available computational resources are limited and not known in advance. Label estimation is done in an iterative manner and includes spatial context right from the beginning. Our approach is evaluated in extensive experiments showing its state-of-the-art performance on challenging street scene datasets with anytime classification abilities.

Björn Fröhlich, Erik Rodner, Joachim Denzler
Interactive Labeling of Image Segmentation Hierarchies

We study the task of interactive semantic labeling of a segmentation hierarchy. To this end we propose a framework interleaving two components: an automatic labeling step, based on a Conditional Random Field whose dependencies are defined by the inclusion tree of the segmentation hierarchy, and an interaction step that integrates incremental input from a human user. Evaluated on two distinct datasets, the proposed interactive approach efficiently integrates human interventions and illustrates the advantages of structured prediction in an interactive framework.

Georg Zankl, Yll Haxhimusa, Adrian Ion
Hierarchy of Localized Random Forests for Video Annotation

We address the problem of annotating a video sequence with partial supervision. Given the pixel-wise annotations in the first frame, we aim to propagate these labels ideally throughout the whole video. While some labels can be propagated using optical flow, disocclusion and unreliable flow in some areas require additional cues. To this end, we propose to train localized classifiers on the annotated frame. In contrast to a global classifier, localized classifiers make it possible to distinguish colors that appear in both the foreground and the background but at very different locations. We design a multi-scale hierarchy of localized random forests, which collectively takes a decision. Cues from optical flow and the classifier are combined in a variational framework. The approach can deal with multiple objects in a video. We present qualitative and quantitative results on the Berkeley Motion Segmentation Dataset.

Naveen Shankar Nagaraja, Peter Ochs, Kun Liu, Thomas Brox

Low-Level Vision

A TV-L1 Optical Flow Method with Occlusion Detection

In this paper we propose a variational model for joint optical flow and occlusion estimation. Our work stems from the optical flow method based on a TV-L1 approach and incorporates information that allows occlusions to be detected. This information is based on the divergence of the flow, and the proposed energy favors the location of occlusions in regions where this divergence is negative. Assuming that occluded pixels are visible in the previous frame, the optical flow is estimated forwards on non-occluded pixels and backwards on the occluded ones. We present experiments showing that the proposed model is able to properly estimate both the optical flow and the occluded regions.

Coloma Ballester, Lluis Garrido, Vanel Lazcano, Vicent Caselles
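
The occlusion cue in the abstract above is the divergence of the flow field. As a rough illustration only (a minimal numpy sketch, not the paper's joint variational model), pixels with strongly negative divergence can be flagged as occlusion candidates; the threshold value below is an arbitrary choice for this example.

```python
import numpy as np

def occlusion_candidates(u, v, tau=-0.5):
    """Flag pixels whose flow divergence is strongly negative.

    u, v : 2D arrays holding the horizontal and vertical flow components.
    tau  : (negative) divergence threshold; purely illustrative.
    """
    du_dx = np.gradient(u, axis=1)   # d u / d x
    dv_dy = np.gradient(v, axis=0)   # d v / d y
    divergence = du_dx + dv_dy
    return divergence < tau          # True where an occlusion is likely
```
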
Curvature Prior for MRF-Based Segmentation and Shape Inpainting

Most image labeling problems such as segmentation and image reconstruction are fundamentally ill-posed and suffer from ambiguities and noise. Higher-order image priors encode high-level structural dependencies between pixels and are key to overcoming these problems. However, in general these priors lead to computationally intractable models. This paper addresses the problem of discovering compact representations of higher-order priors which allow efficient inference. We propose a framework for solving this problem that uses a recently proposed representation of higher-order functions which are encoded as lower envelopes of linear functions. Maximum a posteriori inference on our learned models reduces to minimizing a pairwise function of discrete variables. We show that our framework can learn a compact representation that approximates a low curvature shape prior and demonstrate its effectiveness in solving shape inpainting and image segmentation problems.

Alexander Shekhovtsov, Pushmeet Kohli, Carsten Rother
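
As background on the representation referred to above (the exact parameterization here is an assumption, not a restatement of the paper), a higher-order potential over a clique c can be written as a lower envelope of linear functions of the clique variables:

$$ E_c(\mathbf{x}_c) \;=\; \min_{z \in \mathcal{Z}} \Big( \theta_z + \sum_{i \in c} \theta_{z,i}\, x_i \Big). $$

Introducing the switching variable z as an auxiliary node turns minimization of E_c into minimization of a pairwise function of (x_c, z), which is what makes MAP inference on such learned models reduce to a pairwise problem.
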
Mean Field for Continuous High-Order MRFs

Probabilistic inference beyond MAP estimation is of interest in computer vision, both for learning appropriate models and in applications. Yet, common approximate inference techniques, such as belief propagation, have largely been limited to discrete-valued Markov random fields (MRFs) and models with small cliques. Oftentimes, neither is desirable from an application standpoint. This paper studies mean field inference for continuous-valued MRF models with high-order cliques. Mean field can be applied effectively to such models by exploiting that the factors of certain classes of MRFs can be formulated using Gaussian mixtures, which allows retaining the mixture indicator as a latent variable. We use an image restoration setting to show that resulting mean field updates have a computational complexity quadratic in the clique size, which makes them scale even to large cliques. We contribute an empirical study with four applications: Image denoising, non-blind deblurring, noise estimation, and layer separation from a single image. We find mean field to yield a favorable combination of performance and efficiency, e.g. outperforming MAP estimation in denoising while being competitive with expensive sampling approaches. Novel approaches to noise estimation and layer separation demonstrate the breadth of applicability.

Kevin Schelten, Stefan Roth
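
For reference, the generic mean-field scheme underlying the abstract approximates the posterior by a fully factorized distribution and updates each factor in turn; the paper's specific contribution (Gaussian-mixture factors with latent indicators and updates quadratic in the clique size) is not reproduced here.

$$ q(\mathbf{x}) \;=\; \prod_i q_i(x_i), \qquad \log q_i(x_i) \;=\; \mathbb{E}_{q_{\setminus i}}\big[\log p(\mathbf{x})\big] + \mathrm{const}. $$
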
How Well Do Filter-Based MRFs Model Natural Images?

Markov random fields (MRFs) have found widespread use as models of natural image and scene statistics. Despite progress in modeling image properties beyond gradient statistics with high-order cliques, and learning image models from example data, existing MRFs only exhibit a limited ability of actually capturing natural image statistics. In this paper we investigate this limitation of previous filter-based MRF models, which appears in contradiction to their maximum entropy interpretation. We argue that this is due to inadequacies in the learning procedure and suggest various modifications to address them. We demonstrate that the proposed learning scheme allows training more suitable potential functions, whose shape approaches that of a Dirac-delta function, as well as models with larger and more filters. Our experiments not only indicate a substantial improvement of the models’ ability to capture relevant statistical properties of natural images, but also demonstrate a significant performance increase in a denoising application to levels previously unattained by generative approaches.

Qi Gao, Stefan Roth

3D Reconstruction

Anisotropic Range Image Integration

Obtaining high-quality 3D models of real world objects is an important task in computer vision. A very promising approach to achieve this is given by variational range image integration methods: They are able to deal with a substantial amount of noise and outliers, while regularising and thus creating smooth surfaces at the same time. Our paper extends the state-of-the-art approach of Zach et al.(2007) in several ways: (i) We replace the isotropic space-variant smoothing behaviour by an anisotropic (direction-dependent) one. Due to the directional adaptation, a better control of the smoothing with respect to the local structure of the signed distance field can be achieved. (ii) In order to keep data and smoothness term in balance, a normalisation factor is introduced. As a result, oversmoothing of locations that are seen seldom is prevented. This allows high quality reconstructions in uncontrolled capture setups, where the camera positions are unevenly distributed around an object. (iii) Finally, we use the more accurate closest signed distances instead of directional signed distances when converting range images into 3D signed distance fields. Experiments demonstrate that each of our three contributions leads to clearly visible improvements in the reconstruction quality.

Christopher Schroers, Henning Zimmer, Levi Valgaerts, Andrés Bruhn, Oliver Demetz, Joachim Weickert
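
The baseline of Zach et al. (2007) that this paper extends fuses the per-view signed distance fields f_i with a TV-L1 energy roughly of the form below (notation chosen here for illustration):

$$ E(u) \;=\; \int_\Omega |\nabla u|\, d\mathbf{x} \;+\; \lambda \sum_i \int_\Omega |u(\mathbf{x}) - f_i(\mathbf{x})|\, d\mathbf{x}, $$

with the reconstructed surface taken as the zero level set of the minimizer u. Contributions (i)-(iii) above replace the isotropic regularizer by an anisotropic one, normalize the data term by the number of observations per location, and build the f_i from closest rather than directional signed distances.
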
Modeling of Sparsely Sampled Tubular Surfaces Using Coupled Curves

We present a variational approach to simultaneously trace the axis and determine the thickness of 3-D (or 2-D) tubular structures defined by sparsely and unevenly sampled noisy surface points. Many existing approaches try to solve the axis-tracing and the precise fitting in two subsequent steps. In contrast to this our model is initialized with a small cylinder segment and converges to the final tubular structure in a single energy minimization using a gradient descent scheme. The energy is based on the error of fit and simultaneously penalizes strong curvature and thickness variations. We demonstrate the performance of this closed formulation on volumetric microscopic data sets of the Arabidopsis root tip, where only the nuclei of the cells are visible.

Thorsten Schmidt, Margret Keuper, Taras Pasternak, Klaus Palme, Olaf Ronneberger
Shape (Self-)Similarity and Dissimilarity Rating for Segmentation and Matching

Similarities and dissimilarities can be found in many natural as well as man-made structures and are an important source of information, e.g., for isolating defects or pathological regions, and for finding unique points and regions of interest on surfaces. This paper introduces a new approach for computing similarity information that can be used, e.g., for surface segmentation or to guide a subsequent registration. The method is based on a probabilistic matching algorithm generating possible partial matches between shapes. For each point of a source surface we analyse the distribution of similar regions on a reference surface. In this way, we obtain a point-wise similarity rating between the source and reference shape. In our experimental evaluation we demonstrate the usability of the method and show results on several 3D objects, such as industrial CAD data sets, bone fractures, and pottery.

Simon Winkelbach, Jens Spehr, Dirk Buchholz, Markus Rilk, Friedrich M. Wahl
Dense 3D Reconstruction with a Hand-Held Camera

In this paper we present a method for dense 3D reconstruction from videos where object silhouettes are hard to retrieve. We introduce a close coupling between sparse bundle adjustment and dense multi-view reconstruction, which includes surface constraints by the sparse point cloud and an implicit loop closing via the dense surface. The surface is computed in a volumetric framework and guarantees a dense surface without holes. We demonstrate the flexibility of the approach on indoor and outdoor scenes recorded with a commodity hand-held camera.

Benjamin Ummenhofer, Thomas Brox

Recognition

OUR-CVFH – Oriented, Unique and Repeatable Clustered Viewpoint Feature Histogram for Object Recognition and 6DOF Pose Estimation

We propose a novel method to estimate a unique and repeatable reference frame in the context of 3D object recognition from a single viewpoint based on global descriptors. We show that the ability of defining a robust reference frame on both model and scene views allows creating descriptive global representations of the object view, with the beneficial effect of enhancing the spatial descriptiveness of the feature and its ability to recognize objects by means of a simple nearest neighbor classifier computed on the descriptor space. Moreover, the definition of repeatable directions can be deployed to efficiently retrieve the 6DOF pose of the objects in a scene. We experimentally demonstrate the effectiveness of the proposed method on a dataset including 23 scenes acquired with the Microsoft Kinect sensor and 25 full-3D models by comparing the proposed approach with state-of-the-art global descriptors. A substantial improvement is presented regarding accuracy in recognition and 6DOF pose estimation, as well as in terms of computational performance.

Aitor Aldoma, Federico Tombari, Radu Bogdan Rusu, Markus Vincze
3D Object Recognition and Pose Estimation for Multiple Objects Using Multi-Prioritized RANSAC and Model Updating

We present a feature-based framework that combines spatial feature clustering, guided sampling for pose generation, and model updating for 3D object recognition and pose estimation. Existing methods fail in the case of repeated patterns or multiple instances of the same object, as they rely only on feature discriminability for matching and on the estimator's capabilities for outlier rejection. We propose to spatially separate the features before matching to create smaller clusters containing the object. Then, hypothesis generation is guided by exploiting cues collected off- and on-line, such as feature repeatability, 3D geometric constraints, and feature occurrence frequency. Finally, while previous methods overload the model with synthetic features for wide baseline matching, we claim that continuously updating the model representation is a lighter yet reliable strategy. The evaluation of our algorithm on challenging video sequences shows the improvement provided by our contribution.

Michele Fenzi, Ralf Dragon, Laura Leal-Taixé, Bodo Rosenhahn, Jörn Ostermann
Classification with Global, Local and Shared Features

We present a framework that jointly learns and then uses multiple image windows for improved classification. Apart from using the entire image content as context, class-specific windows are added, as well as windows that target class pairs. The location and extent of the windows are set automatically by handling the window parameters as latent variables. This framework makes the following contributions: a) the addition of localized information through the class-specific windows improves classification, b) windows introduced for the classification of class pairs further improve the results, c) the windows and classification parameters can be effectively learnt using a discriminative max-margin approach with latent variables, and d) the same framework is suited for multiple visual tasks such as classifying objects, scenes and actions. Experiments demonstrate the aforementioned claims.

Hakan Bilen, Vinay P. Namboodiri, Luc J. Van Gool
Object Detection in Multi-view X-Ray Images

Motivated by aiding human operators in the detection of dangerous objects in passenger luggage, such as in airports, we develop an automatic object detection approach for multi-view X-ray image data. We make three main contributions: First, we systematically analyze the appearance variations of objects in X-ray images from inspection systems. We then address these variations by adapting standard appearance-based object detection approaches to the specifics of dual-energy X-ray data and the inspection scenario itself. To that end we reduce projection distortions, extend the feature representation, and address both in-plane and out-of-plane object rotations, which are a key challenge compared to many detection tasks in photographic images. Finally, we propose a novel multi-view (multi-camera) detection approach that combines single-view detections from multiple views and takes advantage of the mutual reinforcement of geometrically consistent hypotheses. While our multi-view approach can be used atop arbitrary single-view detectors, thus also for multi-camera detection in photographic images, we evaluate our method on detecting handguns in carry-on luggage. Our results show significant performance gains from all components.

Thorsten Franzel, Uwe Schmidt, Stefan Roth

Applications

Eye Localization Using the Discriminative Generalized Hough Transform

The Discriminative Generalized Hough Transform (DGHT) has been successfully introduced as a general method for the localization of arbitrary objects with well-defined shape in medical images. In this contribution, the framework is, for the first time, applied to the localization of eyes in a public face database. Based on a set of training images with annotated target points, the training procedure combines the Hough space votes of individual shape model points into a probability distribution of the maximum-entropy family and optimizes the free parameters of this distribution with respect to the training error rate. This assigns individual positive and negative weights to the shape model points, reflecting important structures of the target object and confusable shapes, respectively. Additionally, the estimated weights make it possible to identify irrelevant parts and eliminate them from the model, making space for the incorporation of new model point candidates. These candidates are in turn identified from training images with remaining high localization error. The whole procedure of weight estimation, point elimination, testing on training images and incorporation of new model point hypotheses is iterated several times until a stopping criterion is met. The method is further enhanced by applying a multi-level approach, in which the search region is reduced in 6 zooming steps, using individually trained shape models on each level. An evaluation on the PUT face database has shown that the system achieves a state-of-the-art success rate of 99% for iris detection in frontal-view images and 95% if the test set contains the full head pose variability.

Ferdinand Hahmann, Heike Ruppertshofen, Gordon Böer, Ralf Stannarius, Hauke Schramm
Simultaneous Estimation of Material Properties and Pose for Deformable Objects from Depth and Color Images

In this paper we consider the problem of estimating 6D pose, material properties and deformation of an object grasped by a robot gripper. To estimate the parameters we minimize an error function incorporating visual and physical correctness. Through simulated and real-world experiments we demonstrate that we are able to find realistic 6D poses and elasticity parameters like Young’s modulus. This makes it possible to perform subsequent manipulation tasks, where accurate modelling of the elastic behaviour is important.

Andreas Rune Fugl, Andreas Jordt, Henrik Gordon Petersen, Morten Willatzen, Reinhard Koch
Surface Quality Inspection of Deformable Parts with Variable B-Spline Surfaces

High precision range sensors can be used for measuring 3D point clouds of object surfaces for quality inspection in industrial production. It is often difficult to formally describe acceptable tolerance ranges of real surfaces, especially for deformable objects. Instead of a formal definition, the surface and its tolerance range can rather be given by a set of training samples.

In this paper we describe how to apply the Karhunen-Loève-Transform (KLT) on B-spline surfaces. With this transform, a group of similar surfaces can be described with very few characteristic coefficients in the transformed domain, thus allowing the detection of marginal surface deviations on deformable parts.

Sebastian von Enzberg, Bernd Michaelis
Automated Image Forgery Detection through Classification of JPEG Ghosts

We present a method for automating the detection of so-called JPEG ghosts. JPEG ghosts can be used for discriminating single and double JPEG compression, which is a common cue for image manipulation detection. The JPEG ghost scheme is particularly well-suited for non-technical experts, but the manual search for such ghosts can be both tedious and error-prone. In this paper, we propose a method that automatically and efficiently discriminates single- and double-compressed regions based on the JPEG ghost principle. Experiments show that the detection results are highly competitive with state-of-the-art methods, for both aligned and shifted JPEG grids in double-JPEG compression.

Fabian Zach, Christian Riess, Elli Angelopoulou
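
The JPEG ghost principle mentioned above rests on recompressing an image at a range of qualities and looking at blockwise difference maps: a region that was already compressed near quality q shows a local minimum of the difference around q. A minimal sketch of the difference-map computation (the automated classification that is the paper's contribution is not reproduced; block size and quality range are arbitrary choices here):

```python
import io
import numpy as np
from PIL import Image

def jpeg_ghost_maps(path, qualities=range(30, 96, 5), block=16):
    """Blockwise squared-difference maps between an image and recompressed
    versions of itself at several JPEG qualities (JPEG-ghost principle)."""
    img = Image.open(path).convert("RGB")
    orig = np.asarray(img, dtype=np.float64)
    maps = {}
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        rec = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)
        diff = ((orig - rec) ** 2).mean(axis=2)   # per-pixel squared error
        h, w = diff.shape
        h2, w2 = h - h % block, w - w % block
        blocks = diff[:h2, :w2].reshape(h2 // block, block, w2 // block, block)
        maps[q] = blocks.mean(axis=(1, 3))        # blockwise average difference
    return maps
```
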

Learning

Synergy-Based Learning of Facial Identity

In this paper we address the problem that most face recognition approaches neglect that faces share strong visual similarities, which can be exploited when learning discriminative models. Hence, we propose to model face recognition as a multi-task learning problem. This enables us to exploit both shared common information and individual characteristics of faces. In particular, we build on Mahalanobis metric learning, which has recently shown good performance for many computer vision problems. Our main contribution is twofold. First, we extend a recent efficient metric learning algorithm to multi-task learning. The resulting algorithm supports label-incompatible learning, which allows us to tap the rather large pool of anonymously labeled face pairs for face identification as well. Second, we show how to learn and combine person-specific metrics for face identification, improving the classification power. We demonstrate the method for different face recognition tasks where we are able to match or slightly outperform state-of-the-art multi-task learning approaches.

Martin Köstinger, Peter M. Roth, Horst Bischof
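
The Mahalanobis metric learning the abstract builds on parameterizes distances by a positive semidefinite matrix M,

$$ d_M^2(\mathbf{x}_i, \mathbf{x}_j) \;=\; (\mathbf{x}_i - \mathbf{x}_j)^\top M\, (\mathbf{x}_i - \mathbf{x}_j), \qquad M \succeq 0, $$

and learning chooses M so that same-identity pairs end up closer than different-identity pairs. The multi-task extension and the combination of person-specific metrics described above are the paper's contribution and are not spelled out here.
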
Information Theoretic Clustering Using Minimum Spanning Trees

In this work we propose a new information-theoretic clustering algorithm that infers cluster memberships by direct optimization of a non-parametric mutual information estimate between data distribution and cluster assignment. Although the optimization objective has a solid theoretical foundation it is hard to optimize. We propose an approximate optimization formulation that leads to an efficient algorithm with low runtime complexity. The algorithm has a single free parameter, the number of clusters to find. We demonstrate superior performance on several synthetic and real datasets.

Andreas C. Müller, Sebastian Nowozin, Christoph H. Lampert
Dynamical SVM for Time Series Classification

We present a method for classifying multidimensional time series using concepts from nonlinear dynamical systems theory. Our contribution is an extension of support vector machines (SVM) that controls a nonlinear dynamical system. We use a chain of coupled Rössler oscillators with diffusive coupling to model highly nonlinear and chaotic time series. The optimization procedure involves alternating between using the sequential minimal optimization algorithm to solve the standard SVM dual problem and computing the solution of the ordinary differential equations defining the dynamical system. Empirical comparisons with kernel-based methods for time series classification on real data sets demonstrate the effectiveness of our approach.

Ramón Huerta, Shankar Vembu, Mehmet K. Muezzinoglu, Alexander Vergara
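
For reference, a single Rössler oscillator is governed by the equations below; in a diffusively coupled chain, a coupling term of the form ε(x_{j-1} − 2x_j + x_{j+1}) is added to one of the equations (which equation carries the coupling is an assumption here, not taken from the paper):

$$ \dot{x} = -y - z, \qquad \dot{y} = x + a\,y, \qquad \dot{z} = b + z\,(x - c). $$
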
Trust-Region Algorithm for Nonnegative Matrix Factorization with Alpha- and Beta-divergences

Nonnegative Matrix Factorization (NMF) is a dimensionality reduction method for representing nonnegative data in a low-dimensional nonnegative space. NMF problems are usually solved with an alternating minimization of a given objective function, using nonnegativity constrained optimization algorithms. This paper is concerned with the projected trust-region algorithm that is adapted to minimize a family of divergences or statistical distances, such as α- or β-divergences, that are efficient for solving NMF problems. Using the Cauchy point estimate for the quadratic approximation model, a radius of the trust-region can be estimated efficiently for a symmetric and block-diagonal structure of the corresponding Hessian matrices. The experiments demonstrate a high efficiency of the proposed approach.

Rafał Zdunek
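
For reference, the β-divergence between nonnegative scalars, applied element-wise to X and WH in NMF, is

$$ d_\beta(x \,\|\, y) \;=\; \frac{x^\beta + (\beta - 1)\,y^\beta - \beta\, x\, y^{\beta - 1}}{\beta(\beta - 1)}, \qquad \beta \notin \{0, 1\}, $$

with the Kullback-Leibler divergence recovered in the limit β → 1, the Itakura-Saito divergence for β → 0, and the squared Euclidean distance at β = 2; the α-divergence family is defined analogously. The trust-region machinery of the paper (Cauchy point estimate, block-diagonal Hessian structure) is not reproduced here.
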

Features

Line Matching Using Appearance Similarities and Geometric Constraints

Line matching for image pairs under various transformations is a challenging task. In this paper, we present a line matching algorithm which considers both the local appearance of lines and their geometric attributes. A relational graph is built for candidate matches and a spectral technique is employed to solve this matching problem efficiently. Extensive experiments on a dataset which includes various image transformations validate the matching performance and the efficiency of the proposed line matching algorithm.

Lilian Zhang, Reinhard Koch
Salient Pattern Detection Using W₂ on Multivariate Normal Distributions

Saliency is an attribute that is not included in an object itself, but arises from complex relations to the scene. Common belief in neuroscience is that objects are eye-catching if they exhibit an anomaly in some basic feature of human perception. This enables detection of object-like structures without prior knowledge. In this paper, we introduce an approach that models these object-to-scene relations based on probability theory. We rely on the conventional structure of cognitive visual attention systems, measuring saliency by local center to surround differences on several basic feature cues and multiple scales, but innovate how to model appearance and to quantify differences. Therefore, we propose an efficient procedure to compute ML-estimates for (multivariate) normal distributions of local feature statistics. Reducing feature statistics to Gaussians facilitates a closed-form solution for the W₂-distance (Wasserstein metric based on the Euclidean norm) between a center and a surround distribution. On a widely used benchmark for salient object detection, our approach, named CoDi-Saliency (for Continuous Distributions), outperformed nine state-of-the-art saliency detectors in terms of precision and recall.

Dominik Alexander Klein, Simone Frintrop
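
The closed-form distance the abstract relies on is the standard 2-Wasserstein distance between two multivariate normal distributions:

$$ W_2^2\big(\mathcal{N}(\mu_1, \Sigma_1), \mathcal{N}(\mu_2, \Sigma_2)\big) \;=\; \|\mu_1 - \mu_2\|_2^2 \;+\; \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big). $$
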
A Simple Extension of Stability Feature Selection

Stability selection [9] is a general principle for performing feature selection. It functions as a meta-layer on top of a “baseline” feature selection method, and consists in repeatedly applying the baseline to random data subsamples of half-size, and finally outputting the features with selection frequency larger than a fixed threshold. In the present work, we suggest and study a simple extension of the original stability selection. It consists in applying the baseline method to random submatrices of the data matrix X of a given size and returning those features having the largest selection frequency. We analyze from a theoretical point of view the effect of this subsampling on the selected variables, in particular the influence of the data subsample size. We report experimental results on large-dimension artificial and real data and identify in which settings stability selection is to be recommended.

A. Beinrucker, Ü. Dogan, G. Blanchard
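
A minimal sketch of the selection-frequency computation, using an L1-penalized regression (Lasso) as an illustrative baseline selector; the baseline method, the regularization strength, and the submatrix sizes are assumptions for this example, not choices taken from the paper. With row_frac=0.5 and col_frac=1.0 this reduces to classical stability selection; col_frac < 1.0 mimics the submatrix extension discussed above.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_runs=100,
                        row_frac=0.5, col_frac=1.0, seed=0):
    """Per-feature selection frequencies from repeated Lasso fits on
    random submatrices of the data matrix X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)     # how often each feature was selected
    col_draws = np.zeros(p)  # how often each feature was in the submatrix
    for _ in range(n_runs):
        rows = rng.choice(n, size=int(row_frac * n), replace=False)
        cols = rng.choice(p, size=int(col_frac * p), replace=False)
        model = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows])
        col_draws[cols] += 1
        counts[cols[np.abs(model.coef_) > 1e-8]] += 1
    return counts / np.maximum(col_draws, 1)
```
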
Feature-Based Multi-video Synchronization with Subframe Accuracy

We present a novel algorithm for temporally synchronizing multiple videos capturing the same dynamic scene. Our algorithm relies on general image features and it does not require explicitly tracking any specific object, making it applicable to general scenes with complex motion. This is facilitated by our new trajectory filtering and matching schemes that correctly identify matching pairs of trajectories (inliers) from a large set of potential candidate matches, of which many are outliers. We find globally optimal synchronization parameters by using a stable RANSAC-based optimization approach. For multi-video synchronization, the algorithm identifies an informative subset of video pairs which prevents the RANSAC algorithm from being biased by outliers. Experiments on two-camera and multi-camera synchronization demonstrate the performance of our algorithm.

A. Elhayek, C. Stoll, K. I. Kim, H. -P. Seidel, C. Theobalt

Posters

Combination of Sinusoidal and Single Binary Pattern Projection for Fast 3D Surface Reconstruction

A new method for 3D surface reconstruction is introduced combining classical fringe projection technique and binary single pattern projection. The new technique allows keeping the high accuracy obtained by phase shifting but solves the additional necessary period identification by replacing the extensive Gray-code sequence by a single image of a certain binary pattern. The core of the new method is an algorithm which realizes the assignment of corresponding image regions using epipolar constraint and image correlation. An algorithm is introduced generating a single binary pattern which is optimized concerning image correlation. The results of first measurements show the high robustness of the new method and advantages of the optimized patterns compared to the use of conventional random patterns.

Christian Bräuer-Burchardt, Peter Kühmstedt, Gunther Notni
Consensus Multi-View Photometric Stereo

We propose a multi-view photometric stereo technique that uses photometric normal consistency to jointly estimate surface position and orientation. The underlying scene representation is based on oriented points, yielding more flexibility compared to smoothly varying surfaces. We demonstrate that the often employed least squares error of the Lambertian image formation model fails for wide-baseline settings without known visibility information. We then introduce a multi-view normal consistency approach and demonstrate its efficiency on synthetic and real data. In particular, our approach is able to handle occlusion, shadows, and other sources of outliers.

Mate Beljan, Jens Ackermann, Michael Goesele
Automatic Scale Selection of Superimposed Signals

This work introduces a novel method to estimate the characteristic scale of low-level image structures, which can be modeled as superpositions of intrinsically one-dimensional signals. Rather than being a single scalar quantity, the characteristic scale of the superimposed signal model is an affine equivariant regional feature. The estimation of the characteristic scale is based on an accurate estimation scheme for the orientations of the intrinsically one-dimensional signals. Using the orientation estimations, the characteristic scales of the single intrinsically one-dimensional signals are obtained. The single orientations and scales are combined into a single affine equivariant regional feature describing the characteristic scale of the superimposed signal model. Being based on convolutions with linear shift invariant filters and one-dimensional extremum searches it yields an efficient implementation.

Oliver Fleischmann, Gerald Sommer
Sensitivity/Robustness Flexible Ellipticity Measures

The ellipse is one of the basic shapes frequently used for modeling in different domains. Fitting an ellipse to a given data set is a well-studied problem, and the question of how to measure the ellipticity of a shape has also been studied. The existing methods to estimate how much a given shape differs from a perfect ellipse are area based. Because of this, these methods are robust (e.g. with respect to noise or to the image resolution applied). This is a desirable property when working with low-quality data, but there are also situations where methods sensitive to the presence of noise or to small object deformations are preferable (e.g. in high-precision inspection tasks).

In this paper we propose a new family of ellipticity measures. The measures depend on a single parameter, and by varying this parameter the sensitivity/robustness properties of the related ellipticity measures vary as well.

Independently of the parameter choice, all the new ellipticity measures are invariant with respect to translation, scaling, and rotation; they all range over (0, 1] and attain 1 if and only if the shape considered is an ellipse. The new measures are theoretically well founded. Because of this, their behavior in particular applications is well understood and can be predicted to some extent, which is always an advantage.

Several experiments are provided to illustrate the behavior and performance of the new measures.

Mehmet Ali Aktaş, Joviša Žunić
Sparse Point Estimation for Bayesian Regression via Simulated Annealing

In the context of variable selection in a regression model, the classical Lasso-based optimization approach provides a sparse estimate of the regression coefficients but is unable to provide further information regarding their distribution. Alternatively, using a Bayesian approach is more advantageous since it gives direct access to the distribution, which is usually summarized by estimating the expectation (not sparse) and variance. Additionally, to support frequent application requirements, heuristics like thresholding are generally used to produce sparse estimates for variable selection purposes. In this paper, we provide a more principled approach for generating a sparse point estimate in a Bayesian framework. We extend an existing Bayesian framework for sparse regression to generate a MAP estimate by using simulated annealing. We then justify this extension by showing that this MAP estimate is also sparse in the regression coefficients. Experiments on real-world applications such as splice site detection and diabetes progression demonstrate the usefulness of the extension.

Sudhir Raman, Volker Roth
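
For context, the Lasso point estimate mentioned above,

$$ \hat{\beta}_{\mathrm{lasso}} \;=\; \arg\min_{\beta} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1, $$

coincides, up to constants, with MAP estimation under a Gaussian likelihood and an i.i.d. Laplace prior on the coefficients. The paper's contribution is to obtain a sparse MAP-type point estimate within a richer Bayesian model via simulated annealing, which is not reproduced here.
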
Active Metric Learning for Object Recognition

Popular visual representations like SIFT have shown broad applicability across many tasks. This great generality comes naturally with a lack of specificity when focusing on a particular task or a set of classes. Metric learning approaches have been proposed to tailor general purpose representations to the needs of more specific tasks and have shown strong improvements on visual matching and recognition benchmarks. However, the performance of metric learning depends strongly on the labels that are used for learning. Therefore, we propose to combine metric learning with an active sample selection strategy in order to find labels that are representative for each class as well as improve the class separation of the learnt metric. We analyze several active sample selection strategies in terms of exploration and exploitation trade-offs. Our novel scheme achieves up to 10% improvement of the learned metric on three different datasets. We compare a batch version of our scheme to an interleaved execution of sample selection and metric learning which leads to an overall improvement of up to 23% on challenging datasets for object class recognition.

Sandra Ebert, Mario Fritz, Bernt Schiele
Accuracy-Efficiency Evaluation of Adaptive Support Weight Techniques for Local Stereo Matching

Adaptive support weight (ASW) strategies in local stereo matching have recently attracted many researchers due to their compelling results. In this paper, we present an evaluation study that focuses on weight computation methods that have been suggested in the most recent literature. We implemented 9 ASW stereo methods and tested them on all (35) ground truth test stereo image pairs of the Middlebury benchmark. Our evaluation considers both the accuracy of the matching process and the computational efficiency of its GPU implementation. According to our results, high-quality matching results at real-time processing speeds can be achieved by using the guided image filter weights.

Asmaa Hosni, Margrit Gelautz, Michael Bleyer
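
As background (and as an assumption about which weighting is meant, since the evaluated methods differ), the classical adaptive support weight of Yoon and Kweon combines color and spatial proximity:

$$ w(p, q) \;=\; \exp\!\Big( -\frac{\Delta c_{pq}}{\gamma_c} - \frac{\Delta s_{pq}}{\gamma_s} \Big), \qquad C(p, d) \;=\; \frac{\sum_{q \in N_p} w(p, q)\, w(\bar{p}_d, \bar{q}_d)\, e(q, \bar{q}_d)}{\sum_{q \in N_p} w(p, q)\, w(\bar{p}_d, \bar{q}_d)}, $$

where Δc and Δs are color and spatial distances, \bar{p}_d, \bar{q}_d are the corresponding pixels in the other view at disparity d, and e is the raw per-pixel matching cost. The guided-image-filter weights highlighted in the conclusion replace this bilateral-style weighting with a filter whose cost is independent of the support window size.
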
Groupwise Shape Registration Based on Entropy Minimization

In this paper, we propose a unified framework for global-to-local groupwise shape registration based on an unbiased diffeomorphic shape atlas. We introduce the information-theoretic concept of entropy as the energy functional for shape registration. To this end, for given example shapes, we estimate the underlying shape distribution on the space of signed distance functions in a nonparametric way. We then perform global-to-local shape registration by minimizing the shape entropy estimate and the entropy of the displacement vector field. In addition, the gradient flow for the shape entropy is derived explicitly using the $\mathcal{L}_2$-distance in Hilbert space for a template shape estimation. Diffeomorphisms which are estimated by rigid/nonrigid registrations establish dense correspondences between an example shape and the template shape. In addition, the composition rule gives a way to establish consistent correspondences by guaranteeing another diffeomorphism.

Youngwook Kee, Daniel Cremers, Junmo Kim
Adaptive Multi-cue 3D Tracking of Arbitrary Objects

We present a general method for RGB-D data that is able to track arbitrary objects in real-time in challenging real-world scenarios. The method is based on the Condensation algorithm. The observation model consists of a target/background classifier that is boosted from a pool of grayscale, color, and depth features. The training set of the observation model is updated with new examples from tracking and the classifier is re-trained to cope with the new appearances of the target. A mechanism maintains a small set of specialized candidate features in the pool, thus decreasing the computational time, while keeping the performance stable. Depth measurements are integrated into the prediction of the 3D state of the particles. We evaluate our approach with a new benchmark for RGB-D tracking algorithms; the results prove our method to be robust under real-world settings, being able to keep track of the targets over 96% of the time.

Germán Martín García, Dominik Alexander Klein, Jörg Stückler, Simone Frintrop, Armin B. Cremers
Training of Classifiers for Quality Control of On-Line Laser Brazing Processes with Highly Imbalanced Datasets

This paper investigates the training of classifiers with highly imbalanced datasets for industrial quality control. The application is on-line process monitoring of laser brazing processes, and only a limited amount of data of an imperfection class is available for training. Bayesian adaptation is used to derive a model of the imperfection class from a well sampled model of the class representing a high grade joint surface. For this application, we are able to show that with the sparse training data a performance comparable to a training with a balanced dataset is achievable, and even a moderate increase of training data quickly yields a performance gain.

Daniel Fecker, Volker Märgner, Tim Fingscheidt
PCA-Enhanced Stochastic Optimization Methods

In this paper, we propose to enhance particle-based stochastic optimization methods (SO) by using Principal Component Analysis (PCA) to build an approximation of the cost function in a neighborhood of particles during optimization. Then we use it to shift the samples in the direction of maximum cost change. We provide theoretical basis and experimental results showing that such enhancement improves the performance of existing SO methods significantly. In particular, we demonstrate the usefulness of our method when combined with standard Random Sampling, Simulated Annealing and Particle Filter.

Alina Kuznetsova, Gerard Pons-Moll, Bodo Rosenhahn
A Real-Time MRF Based Approach for Binary Segmentation

We present an MRF based approach for binary segmentation that is able to work in real time. As we are interested in processing of live video streams, fully unsupervised learning schemes are necessary. Therefore, we use generative models. Unlike many existing methods that use Energy Minimization techniques, we employ max-marginal decision. It leads to sampling algorithms that can be implemented for the proposed model in a very efficient manner.

Dmitrij Schlesinger
Pottics – The Potts Topic Model for Semantic Image Segmentation

We present a novel conditional random field (CRF) for semantic segmentation that extends the common Potts model of spatial coherency with latent topics, which capture higher-order spatial relations of segment labels. Specifically, we show how recent approaches for producing sets of figure-ground segmentations can be leveraged to construct a suitable graph representation for this task. The CRF model incorporates such proposal segmentations as topics, modelling the joint occurrence or absence of object classes. The resulting model is trained using a structured large margin approach with latent variables. Experimental results on the challenging VOC’10 dataset demonstrate significant performance improvements over simpler models with less spatial structure.

Christoph Dann, Peter Gehler, Stefan Roth, Sebastian Nowozin
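
The spatial coherency term the model extends is the standard Potts pairwise energy over neighboring elements,

$$ E(\mathbf{y}) \;=\; \sum_i \psi_i(y_i) \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} [\![\, y_i \neq y_j \,]\!], $$

to which the Pottics model adds latent topic variables that couple the labels of segments belonging to the same figure-ground proposal; the exact form of that coupling is the paper's contribution and is not reproduced here.
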
Decision Tree Ensembles in Biomedical Time-Series Classification

There are numerous classification methods developed in the field of machine learning. Some of these methods, such as artificial neural networks and support vector machines, are used extensively in biomedical time-series classification. Other methods have been used less often for no apparent reason. The aim of this work is to examine the applicability of decision tree ensembles as strong and practical classification algorithms in the biomedical domain. We consider four common decision tree ensembles: AdaBoost.M1+C4.5, MultiBoost+C4.5, random forest, and rotation forest. The decision tree ensembles are compared with SMO-based support vector machine classifiers (linear, squared polynomial, and radial kernel) on three distinct biomedical time-series datasets. For evaluation purposes, 10x10-fold cross-validation is used and the classifiers are measured in terms of sensitivity, specificity, and speed of model construction. The classifiers are compared in terms of statistically significant wins-losses-ties on the three datasets. We show that the overall results favor decision tree ensembles over SMO-based support vector machines. Preliminary results suggest that AdaBoost.M1 and MultiBoost are the best of the examined classifiers, with no statistically significant difference between them. These results should encourage the use of decision tree ensembles in biomedical time-series datasets where optimal model accuracy is sought.

Alan Jović, Karla Brkić, Nikola Bogunović
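
A minimal sketch of the evaluation protocol described above (10x10-fold cross-validation with sensitivity and specificity), using scikit-learn's random forest on synthetic data as a stand-in; AdaBoost.M1+C4.5, MultiBoost+C4.5, rotation forest, and the SMO-based SVMs compared in the paper are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import confusion_matrix

# synthetic stand-in for a biomedical time-series feature matrix
X, y = make_classification(n_samples=400, n_features=30, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
sens, spec = [], []
for train, test in cv.split(X, y):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train], y[train])
    tn, fp, fn, tp = confusion_matrix(y[test], clf.predict(X[test])).ravel()
    sens.append(tp / (tp + fn))   # sensitivity (recall of positive class)
    spec.append(tn / (tn + fp))   # specificity (recall of negative class)

print(f"sensitivity {np.mean(sens):.3f} +/- {np.std(sens):.3f}")
print(f"specificity {np.mean(spec):.3f} +/- {np.std(spec):.3f}")
```
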
Spatio-temporally Coherent Interactive Video Object Segmentation via Efficient Filtering

In this paper we propose a fast, interactive object segmentation and matting framework for videos that allows users to extract objects from a video using only a few foreground scribbles. Our approach is based on recent work [12] that obtains high-quality image segmentations by smoothing the likelihood of a color model with a fast edge-preserving filter. The previous approach was originally intended for single static images and does not achieve temporally coherent segmentations for videos. Our main contribution is to extend the approach of [12] to the temporal domain. Our results are spatially and temporally coherent segmentations, in which the borders of the foreground object are aligned with spatio-temporal color edges in the video. The obtained binary segmentation can be further refined in a temporally coherent and equally efficient alpha matting step. Quantitative and qualitative evaluations show that our extension significantly reduces flickering in the video segmentations.

Nicole Brosch, Asmaa Hosni, Christoph Rhemann, Margrit Gelautz
Discrepancy Norm as Fitness Function for Defect Detection on Regularly Textured Surfaces

This paper addresses the problem of quality inspection of regular textured surfaces as, e.g., encountered in industrial woven fabrics. The motivation for developing a novel approach is to utilize the template matching principle for defect detection in a way that does not need any particular statistical, structural or spectral features to be calculated during the checking phase. It is shown that in this context template matching becomes both feasible and effective by exploiting the so-called discrepancy measure as fitness function, leading to a defect detection method that shows advantages in terms of easy configuration and low maintenance efforts.

Gernot Stübl, Jean-Luc Bouchot, Peter Haslinger, Bernhard Moser
Video Compression with 3-D Pose Tracking, PDE-Based Image Coding, and Electrostatic Halftoning

Recent video compression algorithms such as the members of the MPEG or H.26x family use image transformations to store individual frames, and motion compensation between these frames. In contrast, the video codec presented here is a model-based approach that encodes fore- and background independently. It is well-suited for applications with static backgrounds, i.e. for applications such as traffic or security surveillance, or video conferencing. Our video compression algorithm tracks moving foreground objects and stores the obtained poses. Furthermore, a compressed version of the background image and some other information such as 3-D object models are encoded. In a second step, recent halftoning and PDE-based image compression algorithms are employed to compress the encoding error. Experiments show that the stored videos can have a significantly better quality than state-of-the-art algorithms such as MPEG-4.

Christian Schmaltz, Joachim Weickert
Image Completion Optimised for Realistic Simulations of Wound Development

Treatment costs for chronic wound healing disturbances have a strong impact on the health care system. In order to motivate patients and thus reduce treatment times, there was a need to visualize possible wound developments based on the current situation of the affected body part. Known disease patterns were used to build a model for simulating the healing as well as the worsening process. The key point for the construction of possible wound stages was the creation of a well-fitting texture including all representative tissue types. Since wounds are mostly circularly shaped, the first step of the healing simulation is an image completion based on radial texture synthesis of small patches from the healthy tissue surrounding the wound. The radial information of the wound border was used to optimize the overlap between individual patches. In a similar way, complete layers of all other appearing tissue types were constructed and superimposed using masks representing trained possible appearances. Results show that the developed texture synthesis, together with the trained knowledge, is well suited to constructing realistic wound images for different stages of the disease.

Michael Schneeberger, Martina Uray, Heinz Mayer
Automatic Model Selection in Archetype Analysis

Archetype analysis involves the identification of representative objects from amongst a set of multivariate data such that the data can be expressed as a convex combination of these representative objects. Existing methods for archetype analysis assume a fixed number of archetypes a priori. Multiple runs of these methods for different choices of archetypes are required for model selection. Not only is this computationally infeasible for larger datasets, in heavy-noise settings model selection becomes cumbersome. In this paper, we present a novel extension to these existing methods with the specific focus of relaxing the need to provide a fixed number of archetypes beforehand. Our fast iterative optimization algorithm is devised to automatically select the right model using BIC scores and can easily be scaled to noisy, large datasets. These benefits are achieved by introducing a Group-Lasso component popular for sparse linear regression. The usefulness of the approach is demonstrated through simulations and on a real world application of document analysis for identifying topics.

Sandhya Prabhakaran, Sudhir Raman, Julia E. Vogt, Volker Roth
Stereo Fusion from Multiple Viewpoints

Advanced driver assistance using cameras is a first important step towards autonomous driving tasks. However, the computational power in automobiles is highly limited and hardware platforms with enormous processing resources such as GPUs are not available in serial production vehicles. In our paper we address the need for a highly efficient fusion method that is well suited for standard CPUs.

We assume that a number of pairwise disparity maps are available, which we project to a reference view pair and fuse them efficiently to improve the accuracy of the reference disparity map. We estimate a probability density function of disparities in the reference image using projection uncertainties. In the end the most probable disparity map is selected from the probability distribution.

We carried out extensive quantitative evaluations on challenging stereo data sets and real world images. These results clearly show that our method is able to recover very accurate disparity maps in real-time.

Christian Unger, Eric Wahl, Peter Sturm, Slobodan Ilic
Confidence Measurements for Adaptive Bayes Decision Classifier Cascades and Their Application to US Speed Limit Detection

This article presents an adaptive Bayes model for the decision logic of cascade classifier structures. The proposed method is fast and robust with respect to multimodal and overlapping distributions and can be applied to arbitrary stage classifiers with continuous outputs. The method consists of an adaptive computation of thresholds and probability density functions, which outperforms the purely threshold-based decision. It furthermore guarantees high detection rates independent of the number of stage classifiers. Based on this Bayes model, different confidence measures are proposed, evaluated statistically, and used for merging detection windows. The algorithm is applied to the detection of US speed limit signs under typical driving conditions. Results show that on a single CPU running at 3.3 GHz the proposed method yields single-image detection rates of 97% with 0.2 false positives per image at 13 Hz, and, for a different setup, a detection rate of 93% with 0.2 false positives per image at 43 Hz when scanning the whole image (752x480 pixels).

Armin Staudenmaier, Ulrich Klauck, Ulrich Kreßel, Frank Lindner, Christian Wöhler
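
The decision rule at each cascade stage is, generically, the Bayes rule on the continuous stage-classifier output s,

$$ \hat{\omega} \;=\; \arg\max_k \; p(\omega_k \mid s) \;=\; \arg\max_k \; \frac{p(s \mid \omega_k)\, P(\omega_k)}{\sum_j p(s \mid \omega_j)\, P(\omega_j)}, $$

where the class-conditional densities p(s | ω_k) are estimated adaptively per stage in the proposed method and the confidence measures are derived from these posteriors. Only the generic rule is shown here, not the paper's adaptive estimation scheme.
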
A Bottom-Up Approach for Learning Visual Object Detection Models from Unreliable Sources

The ability to learn models of computational vision from sample data has significantly advanced the field. Obtaining suitable training image sets, however, remains a challenging problem. In this paper we propose a bottom-up approach for learning object detection models from weakly annotated samples, i.e., only category labels are given per image. By combining visual saliency and distinctiveness of local image features, regions of interest are extracted in a completely automatic way without requiring detailed annotations. Using a bag-of-features representation of these regions, object recognition models can be trained for the given object categories. As weakly labeled sample images can easily be obtained from image search engines, our approach does not require any manual annotation effort. Experiments on data from the Visual Object Classes Challenge 2011 show that promising object detection results can be achieved by our proposed method.

Fabian Nasse, Gernot A. Fink
Active Learning of Ensemble Classifiers for Gesture Recognition

In this study we consider the classification of emblematic gestures based on ensemble methods. In contrast to HMM-based approaches processing a gesture as a whole, we classify trajectory segments comprising a fixed number of sampling points. We propose a multi-view approach in order to increase the diversity of the classifiers across the ensemble by applying different methods for data normalisation and dimensionality reduction and by employing different classifier types. A genetic search algorithm is used to select the most successful ensemble configurations from the large variety of possible combinations. In addition to supervised learning, we make use of both labelled and unlabelled data in an active learning framework in order to reduce the effort required for manual labelling. In the supervised learning scenario, recognition rates per moment in time of more than 86% are obtained, which is comparable to the recognition rates obtained by a HMM approach for complete gestures. The active learning scenario yields recognition rates in excess of 80% even when only a fraction of 20% of all training samples are used.

J. Schumacher, D. Sakič, A. Grumpe, Gernot A. Fink, C. Wöhler
Backmatter
Metadata
Title
Pattern Recognition
Editors
Axel Pinz
Thomas Pock
Horst Bischof
Franz Leberl
Copyright Year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-32717-9
Print ISBN
978-3-642-32716-2
DOI
https://doi.org/10.1007/978-3-642-32717-9
