
2013 | Book

Pattern Recognition

35th German Conference, GCPR 2013, Saarbrücken, Germany, September 3-6, 2013. Proceedings

Edited by: Joachim Weickert, Matthias Hein, Bernt Schiele

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 35th German Conference on Pattern Recognition, GCPR 2013, held in Saarbrücken, Germany, in September 2013. The 22 revised full papers and 18 revised poster papers were carefully reviewed and selected from 79 submissions. The papers cover topics such as image processing and computer vision, machine learning and pattern recognition, mathematical foundations, statistical data analysis and models, computational photography and the confluence of vision and graphics, and applications in the natural sciences, engineering, biomedical data analysis, imaging, and industry.

Table of Contents

Frontmatter

Stereo and Structure from Motion

Reconstructing Reflective and Transparent Surfaces from Epipolar Plane Images

While multi-view stereo reconstruction of Lambertian surfaces is nowadays highly robust, reconstruction methods based on correspondence search usually fail in the presence of ambiguous information, like in the case of partially reflecting and transparent surfaces. On the epipolar plane images of a 4D light field, however, surfaces like these give rise to overlaid patterns of oriented lines. We show that these can be identified and analyzed quickly and accurately with higher order structure tensors. The resulting method can reconstruct with high precision both the geometry of the surface as well as the geometry of the reflected or transmitted object. Accuracy and feasibility are shown on both ray-traced synthetic scenes and real-world data recorded by our gantry.

Sven Wanner, Bastian Goldluecke
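The analysis of oriented line patterns above builds on the structure tensor; a minimal NumPy sketch (the classical 2D first-order tensor, not the authors' higher-order variant) shows how a dominant orientation is recovered from an EPI-like patch:

```python
import numpy as np

def structure_tensor_orientation(img):
    """Dominant gradient orientation (radians) from the averaged 2x2
    structure tensor J = [[<Ix^2>, <IxIy>], [<IxIy>, <Iy^2>]]."""
    Iy, Ix = np.gradient(img)            # derivatives along rows / columns
    Jxx, Jxy, Jyy = (Ix * Ix).mean(), (Ix * Iy).mean(), (Iy * Iy).mean()
    # Double-angle formula: averages orientations without the 180-degree
    # sign ambiguity of raw gradient vectors.
    return 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)

# Synthetic EPI-like patch: stripes whose gradient points along (1, 1),
# so the recovered gradient orientation should be pi / 4.
y, x = np.mgrid[0:64, 0:64]
patch = np.sin(0.4 * (x + y))
theta = structure_tensor_orientation(patch)
```

On a real epipolar plane image the same orientation estimate, evaluated per pixel with local smoothing, encodes disparity; overlaid patterns from reflections require the higher-order tensors the paper introduces.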
Structure from Motion Using Rigidly Coupled Cameras without Overlapping Views

Structure from Motion can be improved by using multi-camera systems without overlapping views to provide a large combined field of view. The extrinsic calibration of such camera systems can be computed from local reconstructions using hand-eye calibration techniques. Nevertheless, these approaches demand that motion constraints resulting from the rigid coupling of the cameras are satisfied, which is in general not the case for decoupled pose estimation. This paper presents an extension to Structure from Motion using multiple rigidly coupled cameras that integrates rigid motion constraints directly into the local pose estimation step, based on dual quaternions for pose representation. Experiments with synthetic and real data show that the overall quality of the reconstruction process is improved and pose error accumulation is counteracted, leading to more accurate extrinsic calibration.

Sandro Esquivel, Reinhard Koch
Highly Accurate Depth Estimation for Objects at Large Distances

Precise stereo-based depth estimation at large distances is challenging: objects become very small, often exhibit low contrast in the image, and can hardly be separated from the background based on disparity due to measurement noise. In this paper we present an approach that overcomes these problems by combining robust object segmentation and highly accurate depth and motion estimation. The segmentation criterion is formulated as a probabilistic combination of disparity, optical flow and image intensity that is optimized using graph cuts. Segmentation and segment parameter models for the different cues are iteratively refined in an Expectation-Maximization scheme. Experiments on real-world traffic scenes demonstrate the accuracy of segmentation and disparity results for vehicles at distances of up to 180 meters. The proposed approach outperforms state-of-the-art stereo methods, achieving an average object disparity RMS error below 0.1 pixel, at typical object sizes of less than 15x15 pixels.

Peter Pinggera, Uwe Franke, Rudolf Mester
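The claimed sub-0.1-pixel disparity accuracy is significant because depth error grows sharply at small disparities. A short sketch of the pinhole stereo relation Z = f·b/d makes this concrete (the focal length and baseline are illustrative assumptions, not the paper's camera parameters):

```python
def depth_from_disparity(d_px, f_px=1000.0, baseline_m=0.35):
    """Pinhole stereo: depth Z = f * b / d, with focal length in pixels,
    baseline in metres, disparity in pixels. Parameters are hypothetical."""
    return f_px * baseline_m / d_px

d_180 = 1000.0 * 0.35 / 180.0          # disparity of an object at 180 m: ~1.94 px
z_true = depth_from_disparity(d_180)   # exactly 180 m by construction
z_off = depth_from_disparity(d_180 + 0.1)  # a 0.1 px error costs ~9 m of depth
```

At such ranges a tenth of a pixel of disparity error already shifts the depth estimate by several metres, which is why averaging over a robustly segmented object region matters.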
A Low-Rank Constraint for Parallel Stereo Cameras

Stereo-camera systems enjoy wide popularity since they provide more restrictive constraints for 3d-reconstruction. Given an image sequence taken by parallel stereo cameras, a low-rank constraint is derived on the measurement data. Correspondences between left and right images are not necessary, yet they reduce the number of optimization parameters. Conversely, traditional algorithms for stereo factorization require all feature points in both images to be matched; otherwise left and right image streams need to be factorized independently. The performance of the proposed algorithm is evaluated on synthetic data as well as two real image applications.

Christian Cordes, Hanno Ackermann, Bodo Rosenhahn

Poster Session

Multi-Resolution Range Data Fusion for Multi-View Stereo Reconstruction

In this paper we present a probabilistic algorithm for multi-view reconstruction from calibrated images. The algorithm is based on multi-resolution volumetric range image integration and is highly separable, as it only employs local optimization. Dense depth maps are transformed into an octree data structure with variable voxel sizes. This allows for efficient modeling of point clouds with highly variable density. A probability function constructed in discrete space is built locally with a Bayesian approach. Compared to other algorithms, we can deal with extremely large scenes and complex camera configurations in a limited amount of time, as the solution can be split into arbitrarily many parts and computed in parallel. The algorithm has been applied to lab and outdoor benchmark data as well as to large image sets of urban regions taken by cameras on Unmanned Aerial Vehicles (UAVs) and from the ground, demonstrating high surface quality and good runtime performance.

Andreas Kuhn, Heiko Hirschmüller, Helmut Mayer
3D Object Class Geometry Modeling with Spatial Latent Dirichlet Markov Random Fields

This paper presents a novel part-based geometry model for 3D object classes based on latent Dirichlet allocation (LDA). With all object instances of the same category aligned to a canonical pose, the bounding box is discretized to form a 3D space dictionary for LDA. To enhance the spatial coherence of each part during model learning, we extend LDA by strategically constructing a Markov random field (MRF) on the part labels, and adding an extra spatial parameter for each part. We refer to the improved model as spatial latent Dirichlet Markov random fields (SLDMRF). The experimental results demonstrate that SLDMRF exhibits superior semantic interpretation and discriminative ability in model classification to LDA and other related models.

Hanchen Xiong, Sandor Szedmak, Justus Piater
Discriminative Joint Non-negative Matrix Factorization for Human Action Classification

This paper describes a supervised classification approach based on non-negative matrix factorization (NMF). Our classification framework builds on recent extensions of non-negative matrix factorization to multiview learning, where the primary dataset benefits from auxiliary information for obtaining shared and meaningful spaces. For discrimination, we utilize data categories in a supervised manner as an auxiliary source of information in order to learn co-occurrences through a common set of basis vectors. We demonstrate the efficiency of our algorithm in integrating various image modalities for enhancing the overall classification accuracy over different benchmark datasets. Our evaluation considers two challenging image datasets for human action recognition. We show that our algorithm achieves superior results over the state of the art in terms of efficiency and overall classification accuracy.

Abdalrahman Eweiwi, Muhammad Shahzad Cheema, Christian Bauckhage
Joint Shape Classification and Labeling of 3-D Objects Using the Energy Minimization Framework

We propose a combination of multiple Conditional Random Field (CRF) models with a linear classifier. The model is used for the semantic labeling of 3-D surface meshes with large variability in shape. The model employs multiple CRFs of low complexity for surface labeling each of which models the distribution of labelings for a group of surfaces with a similar shape. Given a test surface the classifier exploits the MAP energies of the inferred CRF labelings to determine the shape class. We discuss the associated recognition and learning tasks and demonstrate the capability of the joint shape classification and labeling model on the object category of human outer ears.

Alexander Zouhar, Dmitrij Schlesinger, Siegfried Fuchs
A Coded 3d Calibration Method for Line-Scan Cameras

This paper presents a novel 3d calibration method for line-scan cameras using coded straight line patterns. We describe an algorithm to calculate the 3d-points of intersection between the straight lines of the pattern and the viewing plane of the camera. By a simple encoding we can identify the intersections of multiple patterns unambiguously. For the actual 3d calibration we reduce the dimensionality of the camera model and solve the calibration problem within the viewing plane. After that 2d calibration we transfer the camera model back into 3d. Some real test results are shown to confirm the functionality of the proposed calibration method.

Erik Lilienblum, Ayoub Al-Hamadi, Bernd Michaelis
Confidence-Based Surface Prior for Energy-Minimization Stereo Matching

This paper presents a novel confidence-based surface prior for energy minimization formulations of dense stereo matching. Given a dense disparity estimation we fit planes, in disparity space, to regions of the image. For each pixel, the probability of its depth lying on an object plane is modeled as a Gaussian distribution, whose variance is determined using the confidence from a previous matching. We then recalculate a new disparity estimation with the addition of our novel confidence-based surface prior. The process is then repeated. Unlike many region-based methods, our method defines an energy formulation over pixels, instead of regions in a segmentation; this results in a decreased sensitivity to the quality of the initial segmentation. Our confidence-based surface prior differs from existing surface constraints in that it varies the per-pixel strength of the constraint to be proportional to the confidence in our given disparity estimation. The addition of our surface prior has three main benefits: sharp object-boundary edges in areas of depth discontinuity; accurate disparity in surface regions; and low sensitivity to segmentation. We evaluate our method using Middlebury stereo sets and more challenging remote sensing data. Our experimental results demonstrate that our approach has superior performance on these data sets.

Ke Zhu, Daniel Neilson, Pablo d’Angelo
A Monte Carlo Strategy to Integrate Detection and Model-Based Face Analysis

We present a novel probabilistic approach for fitting a statistical model to an image. A 3D Morphable Model (3DMM) of faces is interpreted as a generative (Top-Down) Bayesian model. Random Forests are used as noisy detectors (Bottom-Up) for the face and facial landmark positions. The Top-Down and Bottom-Up parts are then combined using a Data-Driven Markov Chain Monte Carlo method (DDMCMC). As the core of the integration, we use the Metropolis-Hastings algorithm, which has two main advantages. First, the algorithm can handle unreliable detections and therefore does not need the detectors to take an early and possibly wrong hard decision before fitting. Second, it is open to the integration of various cues to guide the fitting process. Based on the proposed approach, we implemented a completely automatic, pose- and illumination-invariant face recognition application. We are able to train and test the building blocks of our application on different databases. The system is evaluated on the Multi-PIE database and reaches state-of-the-art performance.

Sandro Schönborn, Andreas Forster, Bernhard Egger, Thomas Vetter
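The Metropolis-Hastings core of such a DDMCMC scheme is compact; a generic random-walk sketch on a toy 1D target (not the authors' 3DMM posterior, and with an assumed Gaussian proposal) illustrates the accept/reject mechanism that tolerates noisy proposals:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings: propose x' = x + N(0, step^2),
    accept with probability min(1, p(x') / p(x))."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, lp, samples = x0, log_p(x0), []
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal()
        lp_new = log_p(x_new)
        if np.log(rng.random()) < lp_new - lp:   # accept proposal
            x, lp = x_new, lp_new
        samples.append(x)                        # rejected steps repeat x
    return np.array(samples)

# Toy target: standard normal (an unnormalized log-density suffices).
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_steps=20000)
```

In the data-driven variant, detector responses shape the proposal distribution instead of a blind random walk, which is what lets unreliable Bottom-Up cues guide rather than dictate the fit.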
Scale-Aware Object Tracking with Convex Shape Constraints on RGB-D Images

Convex relaxation techniques have become a popular approach to a variety of image segmentation problems, as they allow computing solutions independent of the initialization. In this paper, we propose a novel technique for the segmentation of RGB-D images using convex function optimization. The function that we propose to minimize considers both the color image and the depth map for finding the optimal segmentation. We extend the objective function by moment constraints, which allow us to include prior knowledge on the 3D center, surface area or volume of the object in a principled way. As we show in this paper, the relaxed optimization problem is convex, and thus can be minimized in a globally optimal way, leading to high-quality solutions independent of the initialization. We validated our approach experimentally on four different datasets, and show that using both color and depth substantially improves segmentation compared to color or depth only. Further, 3D moment constraints significantly robustify the segmentation, which proves particularly useful for object tracking.

Maria Klodt, Jürgen Sturm, Daniel Cremers
Learning How to Combine Internal and External Denoising Methods

Different methods for image denoising have complementary strengths and can be combined to improve image denoising performance, as has been noted by several authors [11,7]. Mosseri et al. [11] distinguish between internal and external methods depending on whether they exploit internal or external statistics [13]. They also propose a rule-based scheme (PatchSNR) to combine these two classes of algorithms. In this paper, we test the underlying assumptions and show that many images might not be easily split into regions where internal methods or external methods are preferable. Instead we propose a learning-based approach using a neural network that automatically combines denoising results from an internal and from an external method. This approach outperforms both other combination methods and state-of-the-art stand-alone image denoising methods, thereby further closing the gap to the theoretically achievable performance limits of denoising [9]. Our denoising results can be replicated with a publicly available toolbox.

Harold Christopher Burger, Christian Schuler, Stefan Harmeling
A Comparison of Directional Distances for Hand Pose Estimation

Benchmarking methods for 3d hand tracking is still an open problem due to the difficulty of acquiring ground truth data. We introduce a new dataset and benchmarking protocol that is insensitive to the accumulative error of other protocols. To this end, we create testing frame pairs of increasing difficulty and measure the pose estimation error separately for each of them. This approach gives new insights and allows us to accurately study the performance of each feature or method without employing a full tracking pipeline. Following this protocol, we evaluate various directional distances in the context of silhouette-based 3d hand tracking, expressed as special cases of a generalized Chamfer distance form. An appropriate parameter setup is proposed for each of them, and a comparative study reveals the best performing method in this context.

Dimitrios Tzionas, Juergen Gall
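The symmetric point-set Chamfer distance is the base case that such a generalized form parameterizes; a brute-force sketch (for illustration only, real silhouette matching uses distance transforms for speed) reads:

```python
import numpy as np

def chamfer_distance(A, B, symmetric=True):
    """Directional Chamfer distance: mean nearest-neighbor distance from
    each point of A to the point set B, optionally symmetrized."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # all pairs
    d_ab = d.min(axis=1).mean()          # A -> B
    if not symmetric:
        return d_ab
    return 0.5 * (d_ab + d.min(axis=0).mean())  # average both directions

# Two parallel point pairs one unit apart: Chamfer distance is 1.0.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
cd = chamfer_distance(A, B)
```

Directional variants additionally weight each nearest-neighbor term by the mismatch between local edge orientations, which is the family of distances the paper compares.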
Approximate Sorting

Keeping items in order is at the essence of organizing information. This paper derives an information-theoretic method for approximate sorting. It is optimal in the sense that it extracts as much reliable order information as possible from possibly noisy comparison input data.

The information-theoretic method for approximate sorting is based on approximation sets for a sorting cost function. It optimizes the tradeoff between localizing a set of solutions in a solution space and “robustifying” solution sets against noise in the comparisons. The method is founded on the maximum approximation capacity principle [3,4]. The validity of the new method and its superior rank prediction capability are demonstrated by sorting experiments on real world data.

Ludwig Busse, Morteza Haghir Chehreghani, Joachim M. Buhmann
Sequential Gaussian Mixture Models for Two-Level Conditional Random Fields

Conditional Random Fields are among the most popular techniques for image labelling because of their flexibility in modelling dependencies between the labels and the image features. This paper addresses the problem of efficient classification of partially occluded objects. For this purpose we propose a novel Gaussian Mixture Model based on a sequential training procedure, in combination with a multi-level CRF framework. Our approach is evaluated on urban aerial images. It is shown to increase the classification accuracy in occluded areas by up to 14.4%.

Sergey Kosov, Franz Rottensteiner, Christian Heipke
Synthesizing Real World Stereo Challenges

Synthetic datasets for correspondence algorithm benchmarking have recently gained more and more interest. The primary aim in their creation has commonly been to achieve the highest possible realism for human observers, which is regularly assumed to be the most important design target. But datasets must look realistic to the algorithm, not to the human observer. Therefore, we challenge the realism hypothesis in favor of posing specific, isolated and non-photorealistic problems to algorithms. There are three benefits: (i) Images can be created in large numbers at low cost. This addresses the currently largest problem in ground truth generation. (ii) We can combinatorially iterate through the design space to explore situations of highest relevance to the application. With increasing robustness of future stereo algorithms, datasets can be modified to increase matching challenges gradually. (iii) By isolating the core problems of stereo methods we can focus on each of them in turn. Our aim is not to produce a new dataset. Instead, we contribute a new perspective on synthetic vision benchmark generation and show encouraging examples to validate our ideas. We believe that the potential of using synthetic data for evaluation in computer vision has not yet been fully utilized. Our first experiments demonstrate it is worthwhile to set up purpose-designed datasets, as typical stereo failures can readily be reproduced, and thereby be better understood. Datasets are made available online [1].

Ralf Haeusler, Daniel Kondermann
Pedestrian Path Prediction with Recursive Bayesian Filters: A Comparative Study

In the context of intelligent vehicles, we perform a comparative study on recursive Bayesian filters for pedestrian path prediction at short time horizons (< 2 s). We consider Extended Kalman Filters (EKF) based on single dynamical models and Interacting Multiple Models (IMM) combining several such basic models (constant velocity/acceleration/turn). These are applied to four typical pedestrian motion types (crossing, stopping, bending in, starting). Position measurements are provided by an external state-of-the-art stereo vision-based pedestrian detector. We investigate the accuracy of position estimation and path prediction, and the benefit of the IMMs vs. the simpler single dynamical models. Special care is given to the proper sensor modeling and parameter optimization. The dataset and evaluation framework are made public to facilitate benchmarking.

Nicolas Schneider, Dariu M. Gavrila
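The constant-velocity model is the simplest of the dynamical models compared above; a minimal linear Kalman filter sketch with position-only measurements shows the predict/update cycle (noise parameters and the 10 Hz rate are illustrative assumptions, not the paper's tuned values):

```python
import numpy as np

def cv_kalman_step(x, P, z, dt=0.1, q=0.5, r=0.2):
    """One predict/update cycle of a constant-velocity Kalman filter.
    State x = [px, py, vx, vy]; z is a 2D position measurement."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt                 # constant-velocity motion model
    Q = q * np.eye(4)                      # simplified process noise
    H = np.eye(2, 4)                       # we observe position only
    R = r * np.eye(2)
    x, P = F @ x, F @ P @ F.T + Q          # predict
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # update with measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Pedestrian walking at 1.5 m/s in x, observed noise-free at 10 Hz.
x, P = np.zeros(4), np.eye(4)
for k in range(50):
    z = np.array([1.5 * 0.1 * k, 0.0])
    x, P = cv_kalman_step(x, P, z)
pred = x[:2] + 2.0 * x[2:]                 # naive 2 s linear extrapolation
```

An IMM runs several such filters (constant velocity, acceleration, turn) in parallel and mixes their estimates by model likelihood, which is the comparison the paper carries out on real pedestrian tracks.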
An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve

Hautière et al. [8] presented a model to describe the road surface luminance curve. Fitting this model to reality allowed them to measure atmospheric parameters and, in turn, the meteorological visibility. We introduce a more complex and appropriate model based on the theory of radiative transfer. By modeling the inscattered light with Koschmieder's Law and assuming a simple scene geometry, we can find a correspondence between the extinction coefficient of the atmosphere and the inflection point of the luminance curve. Contrary to the work of Hautière et al., this relation cannot be formulated explicitly. Nevertheless, an approach based on look-up tables allows us to utilize the improved model in a low-cost algorithm.

Stephan Lenor, Bernd Jähne, Stefan Weber, Ulrich Stopper
Performance Evaluation of Narrow Band Methods for Variational Stereo Reconstruction

Convex relaxation techniques allow computing optimal or near-optimal solutions for a variety of multilabel problems in computer vision. Unfortunately, they are quite demanding in terms of memory and computation time, making them impractical for large-scale problems. In this paper, we systematically evaluate to what extent narrow band methods can be employed in order to improve the performance of variational multilabel optimization methods. We review variational methods, present a narrow band formulation, and demonstrate with a number of quantitative experiments that the narrow band formulation leads to a reduction in memory and computation time by orders of magnitude while preserving almost the same quality of results. In particular, we show that this formulation allows computing stereo depth maps for 6-Mpixel aerial image pairs on a single GPU in around one minute.

Franz Stangl, Mohamed Souiai, Daniel Cremers
Discriminative Detection and Alignment in Volumetric Data

In this paper, we aim for detection and segmentation of Arabidopsis thaliana cells in volumetric image data. To this end, we cluster the training samples by their size and aspect ratio and learn a detector and a shape model for each cluster. While the detector yields good cell hypotheses, additionally aligning the shape model to the image allows us to better localize the detections and to reconstruct the cells in case of low-quality input data. We show that due to the more accurate localization, the alignment also improves the detection performance.

Dominic Mai, Philipp Fischer, Thomas Blein, Jasmin Dürr, Klaus Palme, Thomas Brox, Olaf Ronneberger
Distances Based on Non-rigid Alignment for Comparison of Different Object Instances

Comparison of different object instances is hard due to the large intra-class variability. Part of this variability is due to viewpoint and pose, another part due to subcategories and texture. The variability due to mild viewpoint changes can be normalized out by aligning the samples. In contrast to the classical Procrustes distance, we propose distances based on non-rigid alignment and show that this increases performance in nearest neighbor tasks. We also investigate which matching costs and which optimization techniques are most appropriate in this context.

Benjamin Drayer, Thomas Brox
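For reference, the classical rigid Procrustes distance the paper compares against can be sketched as an SVD-based alignment (uniform scaling and the reflection correction are omitted for brevity; this is a simplified stand-in, not the paper's non-rigid variant):

```python
import numpy as np

def procrustes_distance(A, B):
    """Residual after optimally translating and rotating point set A onto B
    (rows of A and B are corresponding 2D points)."""
    A = A - A.mean(axis=0)                # remove translation
    B = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(A.T @ B)     # SVD of the cross-covariance
    R = U @ Vt                            # optimal orthogonal alignment
    return np.linalg.norm(A @ R - B)

# A triangle and a rotated, translated copy align with zero residual.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
c, s = np.cos(0.7), np.sin(0.7)
B = A @ np.array([[c, -s], [s, c]]) + np.array([3.0, -1.0])
dist = procrustes_distance(A, B)
```

A non-rigid alignment replaces the single rotation R with a smooth deformation field, so intra-class shape differences (rather than pose) dominate the residual distance.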

Young Researchers’ Forum

Stixel-Based Target Existence Estimation under Adverse Conditions

Vision-based environment perception is particularly challenging in bad weather. Under such conditions, even the most powerful stereo algorithms suffer from highly correlated, "blob"-like noise that is hard to model. In this paper we focus on extending an existing stereo-based scene representation – the Stixel World – to allow its application even under problematic conditions. To this end, we estimate the probability of existence for each detected obstacle. Results show that the amount of false detections can be reduced significantly by demanding temporal consistency of the representation and by analyzing cues that represent the geometry of typical obstacles.

Timo Scharwächter
Three-Dimensional Data Compression with Anisotropic Diffusion

In 2-D image compression, recent approaches based on image inpainting with edge-enhancing anisotropic diffusion (EED) rival the transform-based quasi-standards JPEG and JPEG 2000 and are even able to surpass them. In this paper, we extend successful concepts from these 2-D methods to the 3-D setting, thereby establishing the first PDE-based 3-D image compression algorithm. This codec uses a cuboidal subdivision strategy to select and efficiently store a small set of sparse image data and reconstructs missing image parts with EED-based inpainting. An evaluation on real-world medical data substantiates the superior performance of this new algorithm in comparison to 2-D inpainting methods and the compression standard DICOM for medical data.

Pascal Peter
Automatic Level Set Based Cerebral Vessel Segmentation and Bone Removal in CT Angiography Data Sets

Computed tomography angiography (CTA) data sets without hardware-based bone subtraction have the disadvantage of containing bone structures whose intensities overlap with those of vessels; vessel segmentation is therefore hampered. Segmentation methods developed for CTA without bones cannot handle these data sets, and manual cerebral vessel segmentation is not feasible in clinical routine. Therefore, an automatic intensity-based cerebral bone removal with subsequent edge-based level set vessel segmentation method is presented in this work.

Stephanie Behrens
Action Recognition with HOG-OF Features

In this paper a simple and efficient framework for single human action recognition is proposed. In two parallel processing streams, motion information and static object appearances are gathered by introducing a frame-by-frame learning approach. For each processing stream a Random Forest classifier is learned separately. The final decision is determined by combining both probability functions. The proposed recognition system is evaluated on the KTH data set for single human action recognition with the original training/testing splits and a 5-fold cross validation. The results demonstrate state-of-the-art accuracies with an overall training time of 30 seconds on a standard workstation.

Florian Baumann
Image Based 6-DOF Camera Pose Estimation with Weighted RANSAC 3D

In this work an approach for image-based 6-DOF pose estimation with respect to a given 3D point cloud model is presented. We use 3D-annotated training views of the model from which we extract natural 2D features, which can be matched to the query image's 2D features. In the next step, the Perspective-n-Point problem is typically solved in combination with the popular RANSAC algorithm on the given 2D-3D point correspondences to estimate the 6-DOF pose of the camera with respect to the model. We propose a novel extension of the RANSAC algorithm, named w-RANSAC 3D, which uses known 3D information to weight each match individually. The evaluation shows that w-RANSAC 3D leads to more robust pose estimation while needing significantly fewer iterations.

Johannes Wetzel
Symmetry-Based Detection and Diagnosis of DCIS in Breast MRI

Detecting early-stage breast cancers like Ductal Carcinoma In Situ (DCIS) is important, as it supports effective and minimally invasive treatments. Although Computer Aided Detection/Diagnosis (CADe/CADx) systems have been successfully employed for highly malignant carcinomas, their performance on DCIS is inadequate. In this context, we propose a novel approach combining symmetry, kinetics and morphology that achieves superior performance. We base our work on contrast-enhanced data of 18 pure DCIS cases with hand-annotated lesions and 9 purely normal cases. The overall sensitivity and specificity of the system stood at 89% each.

Abhilash Srikantha

Statistical Methods and Learning

Ordinal Random Forests for Object Detection

In this paper, we present a novel formulation of Random Forests, which introduces order statistics into the splitting functions of nodes. Order statistics, in general, neglect the absolute values of single feature dimensions and just consider the ordering of different feature dimensions. Recent works showed that such statistics have more discriminative power than observing single feature dimensions alone. However, they were previously used only as a preprocessing step, transforming data into a higher-dimensional feature space, or were limited to considering two feature dimensions. In contrast, we integrate order statistics into the Random Forest framework and thus avoid explicit mapping onto higher-dimensional spaces. In this way, we can also exploit more than two feature dimensions, resulting in increased discriminative power. Moreover, we show that this idea can easily be extended to the popular Hough Forest framework. The experimental results demonstrate that splitting functions built on order statistics can improve both classification performance (using Random Forests) and object detection (using Hough Forests).

Samuel Schulter, Peter M. Roth, Horst Bischof
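A split test that looks only at the ordering of feature dimensions, not their magnitudes, can be illustrated with a tiny sketch (a simplified stand-in for intuition, not the paper's learned splitting functions):

```python
import numpy as np

def ordinal_split(X, dims):
    """Order-statistic split: route a sample left iff its values at the
    chosen feature dimensions appear in strictly increasing order.
    Absolute magnitudes are ignored entirely."""
    vals = X[:, dims]
    return np.all(np.diff(vals, axis=1) > 0, axis=1)

X = np.array([[1.0, 2.0, 3.0],     # increasing  -> left
              [3.0, 2.0, 1.0],     # decreasing  -> right
              [0.1, 0.5, 0.4]])    # not ordered -> right
left = ordinal_split(X, dims=[0, 1, 2])
```

Such a test is invariant to monotone rescaling of the features, which is one source of the extra robustness order statistics bring to tree splits.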
Revisiting Loss-Specific Training of Filter-Based MRFs for Image Restoration

It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision. Recent years have seen the emergence of two main approaches for learning the parameters in MRFs: (1) probabilistic learning using sampling-based algorithms and (2) loss-specific training based on MAP estimation. After investigating existing training approaches, it turns out that the performance of loss-specific training has been significantly underestimated in existing work. In this paper, we revisit this approach and use techniques from bi-level optimization to solve it. We show that we can obtain a substantial gain in final performance by solving the lower-level problem in the bi-level framework with high accuracy using our newly proposed algorithm. As a result, our trained model is on par with highly specialized image denoising algorithms and clearly outperforms probabilistically trained MRF models. Our findings suggest that for the loss-specific training scheme, solving the lower-level problem with higher accuracy is beneficial. Our trained model comes with the additional advantage that inference is extremely efficient: our GPU-based implementation takes less than 1 s to produce state-of-the-art performance.

Yunjin Chen, Thomas Pock, René Ranftl, Horst Bischof
Labeling Examples That Matter: Relevance-Based Active Learning with Gaussian Processes

Active learning is an essential tool to reduce manual annotation costs in the presence of large amounts of unsupervised data. In this paper, we introduce new active learning methods based on measuring the impact of a new example on the current model. This is done by deriving model changes of Gaussian process models in closed form. Furthermore, we study typical pitfalls in active learning and show that our methods automatically balance the exploration-exploitation trade-off. Experiments are performed with established benchmark datasets for visual object recognition and show that our new active learning techniques are able to outperform state-of-the-art methods.

Alexander Freytag, Erik Rodner, Paul Bodesheim, Joachim Denzler
Efficient Retrieval for Large Scale Metric Learning

In this paper, we address the problem of efficient k-NN classification, in particular in the context of Mahalanobis metric learning. Mahalanobis metric learning recently demonstrated competitive results for a variety of tasks. However, such approaches have two main drawbacks. First, learning metrics often requires solving complex and thus computationally very expensive optimization problems. Second, as the evaluation time scales linearly with the size of the data, k-NN becomes cumbersome for large-scale problems or real-time applications with a limited time budget. To overcome these problems, we propose a metric-based hashing strategy, allowing for both efficient learning and evaluation. In particular, we adopt an efficient metric learning method for locality-sensitive hashing that recently demonstrated reasonable results for several large-scale benchmarks. In fact, if the intrinsic structure of the data is exploited by the metric in a meaningful way, using hashing we can compact the feature representation while still obtaining competitive results. This leads to a drastically reduced evaluation effort. Results on a variety of challenging benchmarks with rather diverse nature demonstrate the power of our method. These include standard machine learning datasets as well as the challenging Public Figures Face Database. On the competitive machine learning benchmarks we obtain results comparable to the state-of-the-art Mahalanobis metric learning and hashing approaches. On the face benchmark we clearly outperform the state-of-the-art in Mahalanobis metric learning. In both cases, however, with drastically reduced evaluation effort.

Martin Köstinger, Peter M. Roth, Horst Bischof

Applications

A Hierarchical Voxel Hash for Fast 3D Nearest Neighbor Lookup

We propose a data structure for finding the exact nearest neighbors in 3D in approximately O(log(log(N))) time. In contrast to standard approaches such as k-d trees, the query time is independent of the location of the query point and the distribution of the data set. The method uses a hierarchical voxel approximation of the data points' Voronoi cells. This avoids backtracking during the query phase, which is a typical action for tree-based methods such as k-d trees. In addition, voxels are stored in a hash table, and a bisection on the voxel level is used to find the leaf voxel containing the query point. This is asymptotically faster than letting the query point fall down the tree. The experiments show the method's high performance compared to state-of-the-art approaches even for large point sets, independent of data and query set distributions, and illustrate its advantage in real-world applications.
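The hash-and-bisect lookup can be sketched as follows. This toy version stores raw points per voxel rather than the Voronoi-cell approximation the paper uses to guarantee exact results, assumes points in the unit cube, and uses hypothetical names throughout. Because every point is inserted at all levels, voxel occupancy is monotone in the level, which is what makes the bisection valid.

```python
import math

def voxel_key(p, level):
    """Integer voxel coordinates of point p at a dyadic level (cell size 2^-level)."""
    size = 1.0 / (1 << level)
    return (level,) + tuple(int(math.floor(c / size)) for c in p)

def build_voxel_hash(points, max_level=6):
    """Insert every point into one hash table, keyed by (level, ix, iy, iz)."""
    table = {}
    for p in points:
        for lvl in range(max_level + 1):
            table.setdefault(voxel_key(p, lvl), []).append(p)
    return table

def query(table, q, max_level=6):
    """Bisect on the level to find the finest occupied voxel containing q,
    then return the nearest candidate stored there."""
    lo, hi = 0, max_level
    best_lvl = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if voxel_key(q, mid) in table:
            best_lvl = mid          # occupied: try a finer level
            lo = mid + 1
        else:
            hi = mid - 1            # empty: all finer levels are empty too
    if best_lvl is None:
        return None
    cands = table[voxel_key(q, best_lvl)]
    return min(cands, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))
```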

Bertram Drost, Slobodan Ilic
Bone Age Assessment Using the Classifying Generalized Hough Transform

A theoretical description and experimental validation of the Classifying Generalized Hough Transform (CGHT) is presented. This general image classification technique is based on a discriminative training procedure that jointly estimates concurrent class-dependent shape models for usage in a GHT voting procedure. The basic approach is extended by a coarse-to-fine classification strategy and a simple classifier combination technique for a combined decision on several regions of interest in a given image. The framework is successfully applied to the task of automatic bone age assessment and produces comparable results to other state-of-the-art techniques on a public database. For the most difficult age range of 9 to 16 years the automatic system achieves a mean error of 0.8 years compared to the average rating of two physicians. Unlike most other image classification techniques, the trained CGHT models can be visually interpreted, unveiling the most relevant anatomical structures for class discrimination.

Ferdinand Hahmann, Inga Berger, Heike Ruppertshofen, Thomas Deserno, Hauke Schramm
Framework for Generation of Synthetic Ground Truth Data for Driver Assistance Applications

High-precision ground truth data is a very important factor in the development and evaluation of computer vision algorithms, especially for advanced driver assistance systems. Unfortunately, some types of data, such as accurate optical flow and depth maps as well as pixel-wise semantic annotations, are very difficult to obtain.

In order to address this problem, in this paper we present a new framework for the generation of high quality synthetic camera images, depth and optical flow maps and pixel-wise semantic annotations. The framework is based on a realistic driving simulator called VDrift [1], which allows us to create traffic scenarios very similar to those in real life.

We show how we can use the proposed framework to generate an extensive dataset for the task of multi-class image segmentation. We use the dataset to train a pairwise CRF model and to analyze the effects of using various combinations of features in different image modalities.

Vladimir Haltakov, Christian Unger, Slobodan Ilic
Refractive Plane Sweep for Underwater Images

In underwater imaging, refraction changes the geometry of image formation, causing the perspective camera model to be invalid. Hence, a systematic model error occurs when computing 3D models using the perspective camera model. This paper deals with the problem of computing dense depth maps of underwater scenes with explicit incorporation of the refraction of light at the underwater housing. It is assumed that extrinsic, intrinsic, and housing parameters have been calibrated for all cameras. Due to the refractive camera's characteristics, it is not possible to directly apply epipolar geometry or rectification to the images, because the single-viewpoint model and, consequently, homographies are invalid. Additionally, the projection of 3D points into the camera cannot be computed efficiently, but requires solving a 12th-degree polynomial. Therefore, the proposed method is an adapted plane sweep algorithm based on the idea of back-projecting rays for each pixel and view onto the 3D hypothesis planes using the GPU. This allows all images to be warped efficiently onto the plane, where they can be compared. Consequently, projections of 3D points and homographies are not utilized.
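Back-projecting a ray through the housing means refracting it at each interface according to Snell's law. A minimal vector-form helper for a single interface, with our own naming conventions (not the paper's implementation), might look like:

```python
import numpy as np

def refract(d, n, n1, n2):
    """Refract direction d at a surface with normal n (pointing toward the
    incoming medium), where n1 and n2 are the refractive indices of the
    incoming and outgoing media. Returns None on total internal reflection."""
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    eta = n1 / n2
    cos_i = -np.dot(n, d)                    # cosine of the incidence angle
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)   # squared sine of transmission angle
    if sin2_t > 1.0:
        return None                          # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(1.0 - sin2_t)) * n
```

For a flat glass port, a camera ray would be refracted twice (air-to-glass, then glass-to-water) before being intersected with a hypothesis plane.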

Anne Jordt-Sedlazeck, Daniel Jung, Reinhard Koch

Optical Flow

An Evaluation of Data Costs for Optical Flow

Motion estimation in realistic outdoor settings is significantly challenged by cast shadows, reflections, glare, saturation, automatic gain control, etc. To allow robust optical flow estimation in these cases, it is important to choose appropriate data cost functions for matching. Recent years have seen a growing trend toward patch-based data costs, as they are already common in stereo. Systematic evaluations of different costs in the context of optical flow have been limited to certain cost types, and carried out on data without challenging appearance. In this paper, we contribute a systematic evaluation of various pixel- and patch-based data costs, using a state-of-the-art algorithmic testbed and the realistic KITTI dataset as a basis. Akin to previous findings in stereo, we find the Census transform to be particularly suitable for challenging real-world scenes.
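The Census transform mentioned above replaces each pixel by a bit string recording local intensity comparisons, so that matching costs (Hamming distances between bit strings) are invariant to any monotonic illumination change. A small sketch, with our own function name and window convention:

```python
import numpy as np

def census_transform(img, radius=1):
    """Census transform: for each pixel, a bit string encoding whether each
    neighbor in a (2r+1)x(2r+1) window is darker than the center pixel.
    The codes are invariant to monotonic intensity changes (gain, bias)."""
    H, W = img.shape
    pad = np.pad(img, radius, mode='edge')
    code = np.zeros((H, W), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = pad[radius + dy:radius + dy + H,
                          radius + dx:radius + dx + W]
            bit = (shifted < img).astype(np.uint64)
            code = (code << np.uint64(1)) | bit   # append one comparison bit
    return code
```

A data cost between two frames is then the per-pixel Hamming distance between the codes of corresponding (warped) positions.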

Christoph Vogel, Stefan Roth, Konrad Schindler
Illumination Robust Optical Flow Model Based on Histogram of Oriented Gradients

The brightness constancy assumption has been widely used as the basic foundation of variational optical flow approaches. Unfortunately, this assumption does not hold when the illumination changes, or for objects that move into a part of the scene with different brightness conditions. This paper proposes a variation of the dual total variation L1-norm (TV-L1) optical flow model with a new illumination-robust data term defined from the histograms of oriented gradients computed from two consecutive frames. In addition, a weighted non-local term is utilized for denoising the resulting flow field. Experiments with complex textured images from different scenarios show results comparable to state-of-the-art optical flow models, while being significantly more robust to illumination changes.

Hatem A. Rashwan, Mahmoud A. Mohamed, Miguel Angel García, Bärbel Mertsching, Domenec Puig

Pattern Recognition

Spatial Pattern Templates for Recognition of Objects with Regular Structure

We propose a method for semantic parsing of images with regular structure. The structured objects are modeled in a densely connected CRF. The paper describes how to embody specific spatial relations in a representation called Spatial Pattern Templates (SPT), which allows us to capture regularity constraints of alignment and equal spacing in pairwise and ternary potentials.

Assuming the input image is pre-segmented into salient regions, the SPT describe which segments can interact in the structured graphical model. The model parameters are learnt to describe the formal language of semantic labelings. Given an input image, a consistent labeling over its segments linked in the CRF is recognized as a word from this language.

The CRF framework allows us to apply efficient algorithms for both recognition and learning. We demonstrate the approach on the problem of facade image parsing and show that results comparable with state-of-the-art methods are achieved without introducing additional manually designed detectors for specific terminal objects.

Radim Tyleček, Radim Šára
K-Smallest Spanning Tree Segmentations

Real-world images often admit many different segmentations that have nearly the same quality according to the underlying energy function. The diversity of these solutions may be a powerful uncertainty indicator. We provide the crucial prerequisite in the context of seeded segmentation with minimum spanning trees (i.e., edge-weighted watersheds). Specifically, we show how to efficiently enumerate the k smallest spanning trees that result in different segmentations, and we prove that solutions are indeed found in the correct order. Experiments show that about half of the trees considered by our algorithm represent unique segmentations. This redundancy is orders of magnitude lower than can be achieved by just enumerating the k-smallest MSTs, making the algorithm viable in practice.
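The flavor of such an enumeration can be conveyed with a naive sketch: compute the MST with Kruskal's algorithm, then branch on excluding one tree edge at a time and report trees in order of total weight. This is exponentially more expensive than the paper's algorithm and does not enforce distinct segmentations; all names are ours.

```python
def kruskal(n, edges, banned=frozenset()):
    """Minimum spanning tree by Kruskal's algorithm with union-find.
    Edges are (weight, u, v) tuples; edges in `banned` are skipped.
    Returns (total weight, tree edges), or None if the graph is disconnected."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree, total = [], 0.0
    for w, u, v in sorted(edges):
        if (w, u, v) in banned:
            continue
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
            total += w
    return (total, tree) if len(tree) == n - 1 else None

def k_smallest_spanning_trees(n, edges, k):
    """Best-first enumeration: each frontier entry is the minimal tree
    avoiding its banned-edge set; popping the global minimum yields trees
    in nondecreasing weight order (duplicates filtered)."""
    results, seen = [], set()
    frontier = [(kruskal(n, edges), frozenset())]
    while frontier and len(results) < k:
        frontier.sort(key=lambda t: t[0][0])
        (total, tree), banned = frontier.pop(0)
        key = frozenset(tree)
        if key not in seen:
            seen.add(key)
            results.append((total, tree))
        for e in tree:                       # branch: exclude one tree edge
            cand = kruskal(n, edges, banned | {e})
            if cand is not None:
                frontier.append((cand, banned | {e}))
    return results
```

On a weighted triangle graph this reports the three spanning trees in increasing order of weight.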

Christoph Straehle, Sven Peter, Ullrich Köthe, Fred A. Hamprecht
Discriminable Points That Stick Out of Their Environment

In this paper, we introduce BoSP (Bonn Salient Points), a keypoint detector and descriptor for image data that are closely geared to one another. Our detector identifies points of interest as local maxima of appearance contrast to their surroundings in a statistical sense. This criterion yields a selection of particularly repeatable but diverse-looking keypoints. Moreover, the textural statistics collected around a keypoint location directly serve as its descriptor. An important component in this framework is how to gather and represent local statistics. For this, we further improve the efficient ML estimation procedure for multivariate normal distributions previously introduced by Klein and Frintrop [6]. This Gaussian representation of feature statistics enables a quickly computable, closed-form solution of the $\mathcal{W}_2$-distance, which we utilize as a measure of appearance contrast. Evaluations were conducted comparing several recent detector/descriptor pairs on a well-recognized, publicly available dataset.
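For two Gaussians N(μ₁, Σ₁) and N(μ₂, Σ₂), the 2-Wasserstein distance indeed has a closed form: W₂² = ‖μ₁ − μ₂‖² + tr(Σ₁ + Σ₂ − 2(Σ₂^{1/2} Σ₁ Σ₂^{1/2})^{1/2}). A short sketch of this general formula (with our own naming, independent of the paper's implementation):

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_gaussians(mu1, S1, mu2, S2):
    """Closed-form 2-Wasserstein distance between two multivariate normals."""
    S2h = sqrtm_psd(S2)
    cross = sqrtm_psd(S2h @ S1 @ S2h)
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))   # clip tiny negative rounding errors
```

In 1D this reduces to sqrt((μ₁ − μ₂)² + (σ₁ − σ₂)²), e.g. distance 3 between N(0, 1) and N(3, 1).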

Dominik Alexander Klein, Armin Bernd Cremers
Representation Learning for Cloud Classification

Proper cloud segmentation can serve as an important precursor to predicting the output of solar power plants. However, due to the high variability of cloud appearance, and the high dynamic range between different sky regions, cloud segmentation is a surprisingly difficult task.

In this paper, we present an approach to cloud segmentation and classification based on representation learning. Texture primitives of cloud regions are represented within a restricted Boltzmann machine. The quantitative results are encouraging: on a three-class problem, the unweighted average (pixelwise) precision improves by 11% relative to prior work, to 94%.

David Bernecker, Christian Riess, Vincent Christlein, Elli Angelopoulou, Joachim Hornegger

Shape Recognition and Scene Understanding

CopyMe3D: Scanning and Printing Persons in 3D

In this paper, we describe a novel approach to create 3D miniatures of persons using a Kinect sensor and a 3D color printer. To achieve this, we acquire color and depth images while the person is rotating on a swivel chair. We represent the model with a signed distance function which is updated and visualized as the images are captured for immediate feedback. Our approach automatically fills small holes that stem from self-occlusions. To optimize the model for 3D printing, we extract a watertight but hollow shell to minimize the production costs. In extensive experiments, we evaluate the quality of the obtained models as a function of the rotation speed, the non-rigid deformations of a person during recording, the camera pose, and the resulting self-occlusions. Finally, we present a large number of reconstructions and fabricated figures to demonstrate the validity of our approach.
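Signed-distance-function fusion of this kind is typically a running weighted average of truncated per-ray distances. A simplified sketch of one such update step, with our own names and conventions (not the authors' exact formulation):

```python
import numpy as np

def integrate_depth(tsdf, weights, depth_along_ray, voxel_depths, trunc=0.05):
    """Fuse one depth observation into a truncated signed distance function
    (TSDF) by a running weighted average.

    `voxel_depths` holds each voxel's distance from the camera along its
    viewing ray; `depth_along_ray` is the measured surface depth on that ray."""
    sdf = depth_along_ray - voxel_depths          # positive in front of the surface
    d = np.clip(sdf, -trunc, trunc) / trunc       # truncate and normalize to [-1, 1]
    mask = sdf > -trunc                           # ignore voxels far behind the surface
    w_new = 1.0
    tsdf[mask] = (weights[mask] * tsdf[mask] + w_new * d[mask]) / (weights[mask] + w_new)
    weights[mask] += w_new
    return tsdf, weights
```

The surface is the zero level set of the fused TSDF, which a marching-cubes step can turn into the watertight shell for printing.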

Jürgen Sturm, Erik Bylow, Fredrik Kahl, Daniel Cremers
Monocular Pose Capture with a Depth Camera Using a Sums-of-Gaussians Body Model

We present a new markerless generative approach to human motion tracking using a single depth camera. It is based on a Sums of Spatial Gaussians (SoG) representation for modeling the scene. In contrast to existing systems, our approach requires neither a multi-view camera setup, an exemplar database, nor training data. The proposed system is accurate, fast, and capable of tracking complex motions, including 360° turns and self-occlusions of limited duration. The motivation behind our approach is that by representing both the depth data and a given a priori human model as SoGs, we can construct an efficient, continuously differentiable similarity measure and estimate an optimal pose for each input frame using a local optimization algorithm (Modified Gradient Ascent Linear Search, MGALS).

Daniyar Kurmankhojayev, Nils Hasler, Christian Theobalt
Robust Realtime Motion-Split-And-Merge for Motion Segmentation

In this paper, we analyze and modify the Motion-Split-and-Merge (MSAM) algorithm [3] for the motion segmentation of correspondences between two frames. Our goal is to make the algorithm suitable for practical use, which means realtime processing speed at very low error rates. We compare our (robust realtime) RMSAM with J-Linkage [16] and Graph-Based Segmentation [5] and show that it is superior to both. Applying RMSAM in a multi-frame motion segmentation context to the Hopkins 155 benchmark, we show that compared to the original formulation, the error decreases from 2.05% to only 0.65% at a runtime reduced by 72%. The error is still higher than the best results reported so far, but RMSAM is dramatically faster and can handle outliers and missing data.

Ralf Dragon, Jörn Ostermann, Luc Van Gool
Efficient Multi-cue Scene Segmentation

This paper presents a novel multi-cue framework for scene segmentation, involving a combination of appearance (grayscale images) and depth cues (dense stereo vision). An efficient 3D environment model is utilized to create a small set of meaningful free-form region hypotheses for object location and extent. Those regions are subsequently categorized into several object classes using an extended multi-cue bag-of-features pipeline. For that, we augment grayscale bag-of-features by bag-of-depth-features operating on dense disparity maps, as well as height pooling to incorporate a 3D geometric ordering into our region descriptor.

In experiments on a large real-world stereo vision data set, we obtain state-of-the-art segmentation results at significantly reduced computational costs. Our dataset is made public for benchmarking purposes.

Timo Scharwächter, Markus Enzweiler, Uwe Franke, Stefan Roth
Backmatter
Metadata
Title
Pattern Recognition
Edited by
Joachim Weickert
Matthias Hein
Bernt Schiele
Copyright year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-40602-7
Print ISBN
978-3-642-40601-0
DOI
https://doi.org/10.1007/978-3-642-40602-7