
About this book

On behalf of the organizing committee, we would like to welcome you to Darmstadt and DAGM 2010, the 32nd Annual Symposium of the German Association for Pattern Recognition. The technical program covered all aspects of pattern recognition and, to name only a few areas, ranged from 3D reconstruction to object recognition and medical applications. The result is reflected in these proceedings, which contain the papers presented at DAGM 2010. Our call for papers resulted in 134 submissions from institutions in 21 countries. Each paper underwent a rigorous reviewing process and was assigned to at least three program committee members for review. The reviewing phase was followed by a discussion phase among the respective program committee members in order to suggest papers for acceptance. The final decision was taken during a program committee meeting held in Darmstadt based on all reviews, the discussion results and, if necessary, additional reviewing. Based on this rigorous process we selected a total of 57 papers, corresponding to an acceptance rate of below 45%. Out of all accepted papers, 24 were chosen for oral and 33 for poster presentation. All accepted papers have been published in these proceedings and given the same number of pages. We would like to thank all members of the program committee as well as the external reviewers for their valuable and highly appreciated contribution to the community.



Geometry and Calibration

3D Reconstruction Using an n-Layer Heightmap

We present a novel method for 3D reconstruction of urban scenes extending a recently introduced heightmap model. Our model has several advantages for 3D modeling of urban scenes: it naturally enforces vertical surfaces, has no holes, leads to an efficient algorithm, and is compact in size. We remove the major limitation of the heightmap by enabling modeling of overhanging structures. Our method is based on an n-layer heightmap with each layer representing a surface between full and empty space. The configuration of layers can be computed optimally using a dynamic programming method. Our cost function is derived from probabilistic occupancy, and incorporates the Bayesian Information Criterion (BIC) for selecting the number of layers to use at each pixel. 3D surface models are extracted from the heightmap. We show results from a variety of datasets including Internet photo collections. Our method runs on the GPU and the complete system processes video at 13 Hz.

David Gallup, Marc Pollefeys, Jan-Michael Frahm
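
The per-pixel choice of the number of layers can be illustrated with a small dynamic program. The sketch below is our own simplification and not the authors' algorithm. It assumes a single vertical column of occupancy probabilities and labels every cell full or empty. For every full/empty transition it charges a BIC-style penalty of 0.5·ln N, so an extra layer appears only when the evidence outweighs the penalty. The function name `best_column_labels` is hypothetical.

```python
import math


def best_column_labels(occupancy, eps=1e-6):
    """Label each cell of one vertical column as full (1) or empty (0).

    Minimises the negative log-likelihood of the labels under the given
    occupancy probabilities plus a BIC-style penalty of 0.5 * ln(N) for
    every full/empty transition, i.e. for every additional layer boundary.
    """
    n = len(occupancy)
    penalty = 0.5 * math.log(n)

    def nll(p, state):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        return -math.log(p if state == 1 else 1.0 - p)

    # cost[s]: best total cost of labelling cells 0..i with cell i in state s
    cost = [nll(occupancy[0], 0), nll(occupancy[0], 1)]
    back = []  # back[i - 1][s]: best state of cell i - 1 given cell i is in s
    for p in occupancy[1:]:
        new_cost, ptr = [], []
        for s in (0, 1):
            stay, switch = cost[s], cost[1 - s] + penalty
            ptr.append(s if stay <= switch else 1 - s)
            new_cost.append(min(stay, switch) + nll(p, s))
        cost = new_cost
        back.append(ptr)

    # Backtrack from the cheaper final state.
    s = 0 if cost[0] <= cost[1] else 1
    labels = [s]
    for ptr in reversed(back):
        s = ptr[s]
        labels.append(s)
    return labels[::-1]
```

For example, a column reading "full, empty, full, empty" from the ground upward keeps its overhang as a second layer. A single uncertain cell inside a solid region is instead absorbed, because splitting it off would cost two layer boundaries.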

Real-Time Dense Geometry from a Handheld Camera

We present a novel variational approach to estimate dense depth maps from multiple images in real-time. By using robust penalizers for both data term and regularizer, our method preserves discontinuities in the depth map. We demonstrate that the integration of multiple images substantially increases the robustness of estimated depth maps to noise in the input images. The integration of our method into recently published algorithms for camera tracking allows dense geometry reconstruction in real-time using a single handheld camera. We demonstrate the performance of our algorithm with real-world data.

Jan Stühmer, Stefan Gumhold, Daniel Cremers

From Single Cameras to the Camera Network: An Auto-Calibration Framework for Surveillance

This paper presents a stratified auto-calibration framework for typical large surveillance set-ups including non-overlapping cameras. The framework avoids the need for any calibration target and relies purely on visual information from walking people. Since there are no point correspondences across the cameras in non-overlapping scenarios, the standard techniques cannot be employed. We show how to obtain a fully calibrated camera network starting from single camera calibration and bringing the problem to a reduced form suitable for multi-view calibration. We extend the standard bundle adjustment by a smoothness constraint to avoid the ill-posed problem arising from missing point correspondences. The proposed framework optimizes the objective function in a stratified manner, thus suppressing the problem of local minima. Experiments with synthetic and real data validate the approach.

Cristina Picus, Branislav Micusik, Roman Pflugfelder

Active Self-calibration of Multi-camera Systems

We present a method for actively calibrating a multi-camera system consisting of pan-tilt-zoom cameras. After a coarse initial calibration, we determine the probability of each relative pose using a probability distribution based on the camera images. The relative poses are optimized by rotating and zooming each camera pair in a way that significantly simplifies the problem of extracting correct point correspondences. In a final step we use active camera control, the optimized relative poses, and their probabilities to calibrate the complete multi-camera system with a minimal number of relative poses. During this process we estimate the translation scales in a camera triangle using only two of the three relative poses and no point correspondences. Quantitative experiments on real data demonstrate the robustness and accuracy of our approach.

Marcel Brückner, Joachim Denzler

Poster Session I

Optimization on Shape Curves with Application to Specular Stereo

We state that a one-dimensional manifold of shapes in 3-space can be modeled by a level set function. Finding a minimizer of an independent functional among all points on such a shape curve has interesting applications in computer vision. It is shown how to replace the commonly encountered practice of gradient projection by a projection onto the curve itself. The outcome is an algorithm for constrained optimization, which, as we demonstrate theoretically and numerically, provides some important benefits in stereo reconstruction of specular surfaces.

Jonathan Balzer, Sebastian Höfer, Stefan Werling, Jürgen Beyerer

Unsupervised Facade Segmentation Using Repetitive Patterns

We introduce a novel approach for separating and segmenting individual facades from streetside images. Our algorithm incorporates prior knowledge about arbitrarily shaped repetitive regions, which are detected using intensity profile descriptors and a voting-based matcher. In the experiments we compare our approach to extended state-of-the-art matching approaches using more than 600 challenging streetside images, including different building styles and various occlusions. Our algorithm outperforms these approaches and correctly separates 94% of the facades. Pixel-wise comparison to our ground truth yields a segmentation accuracy of 85%. According to these results, our work is an important contribution to fully automatic building reconstruction.

Andreas Wendel, Michael Donoser, Horst Bischof

Image Segmentation with a Statistical Appearance Model and a Generic Mumford-Shah Inspired Outside Model

We present a novel statistical-model-based segmentation algorithm that addresses a recurrent problem in appearance model fitting and model-based segmentation: the “shrinking problem”. When statistical appearance models are fitted to an image in order to segment an object, they have the tendency not to cover the full object, leaving a gap between the real and the detected boundary. This is due to the fact that the cost function for fitting the model is evaluated only on the inside of the object and the gap at the boundary is not detected. The state-of-the-art approach to overcome this shrinking problem is to detect the object edges in the image and force the model to adhere to these edges. Here, we introduce a region-based approach motivated by the Mumford-Shah functional that does not require the detection of edges. In addition to the appearance model, we define a generic model estimated from the input image for the outside of the appearance model. Shrinking is prevented because a misaligned boundary would create a large discrepancy between the image and the inside/outside model. The method is independent of the dimensionality of the image. We apply it to 3-dimensional CT images.

Thomas Albrecht, Thomas Vetter

Estimating Force Fields of Living Cells – Comparison of Several Regularization Schemes Combined with Automatic Parameter Choice

In this paper we evaluate several regularization schemes applied to the problem of force estimation, namely Traction Force Microscopy (TFM). This method is widely used to investigate cell adhesion and migration processes as well as cellular responses to mechanical and chemical stimuli. To estimate force densities, TFM requires the solution of an inverse problem, a deconvolution. Two main approaches have been established for this. The method introduced by Dembo [1] uses a finite element approach and inverts the resulting linear equation system (LES) by means of regularization. Hence this method is very robust, but requires high computational effort. The other ansatz, by Butler [2], works in Fourier space and solves the problem by direct inversion. It is therefore based on the assumption of smooth data with little noise. The combination of both, a regularization in Fourier space, has been proposed [3] but not analyzed in detail. We cover this analysis and present several methods for an objective and automatic choice of the required regularization parameters.

Sebastian Houben, Norbert Kirchgeßner, Rudolf Merkel
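
A one-dimensional scalar analogue illustrates regularization in Fourier space. This is our own simplification and not the authors' code. The measured displacement `u` is modeled as the circular convolution of an unknown force density with a known kernel `g`. In each frequency, the Tikhonov-regularized estimate is conj(G)·U / (|G|² + λ). With λ = 0 this reduces to direct Fourier inversion in the spirit of Butler's approach. Real TFM uses a 2D elastic Green's tensor, and the paper's automatic rules for choosing λ are not reproduced. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(g, f):
    """Forward model: u[t] = sum_s g[s] * f[t - s] (indices modulo n)."""
    n = len(f)
    return [sum(g[s] * f[(t - s) % n] for s in range(n)) for t in range(n)]


def tikhonov_fourier_deconvolve(u, g, lam):
    """Estimate f from u = g (*) f with a Tikhonov weight lam in every frequency."""
    U, G = dft(u), dft(g)
    F = [Gk.conjugate() * Uk / (abs(Gk) ** 2 + lam) for Gk, Uk in zip(G, U)]
    return [v.real for v in idft(F)]
```

With noise-free data and λ = 0, the estimate reproduces the true density. A positive λ damps every frequency by |G|²/(|G|² + λ), which trades fidelity for stability.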

Classification of Microorganisms via Raman Spectroscopy Using Gaussian Processes

Automatic categorization of microorganisms is a complex task which requires advanced techniques to achieve accurate performance. In this paper, we aim at identifying microorganisms based on Raman spectroscopy. Empirical studies over the last years show that powerful machine learning methods such as Support Vector Machines (SVMs) are suitable for this task. Our work focuses on the Gaussian process (GP) classifier which is new to this field, provides fully probabilistic outputs and allows for efficient hyperparameter optimization. We also investigate the incorporation of prior knowledge regarding possible signal variations where known concepts from invariant kernel theory are transferred to the GP framework. In order to validate the suitability of the GP classifier, a comparison with state-of-the-art learners is conducted on a large-scale Raman spectra dataset, showing that the GP classifier significantly outperforms all other tested classifiers including SVM. Our results further show that incorporating prior knowledge leads to a significant performance gain when small amounts of training data are used.

Michael Kemmler, Joachim Denzler, Petra Rösch, Jürgen Popp

Robust Identification of Locally Planar Objects Represented by 2D Point Clouds under Affine Distortions

The matching of point sets that are characterized only by their geometric configuration is a challenging problem. In this paper, we present a novel point registration algorithm for robustly identifying objects represented by two-dimensional point clouds under affine distortions. We make no assumptions about the initial orientation of the point clouds and only incorporate the geometric configuration of the points to recover the affine transformation that aligns the parts that originate from the same locally planar surface of the three-dimensional object. Our algorithm can deal well with noise and outliers and is inherently robust against partial occlusions. It is in essence a GOODSAC approach based on geometric hashing to guess a good initial affine transformation that is iteratively refined in order to retrieve a characteristic common point set with minimal squared error. We successfully apply it to the biometric identification of the bluespotted ribbontail ray Taeniura lymma.

Dominic Mai, Thorsten Schmidt, Hans Burkhardt

Model-Based Recognition of Domino Tiles Using TGraphs

This paper presents a case study showing that domino tile recognition using a model-based approach delivers results comparable to heuristic or statistical approaches. The knowledge about our models is represented in TGraphs, which are typed, attributed, and ordered directed graphs. Four task-independent rules are defined to create a domain-independent control strategy which manages the object recognition. When matching elements found in the image to elements given by the model, a large number of hypotheses may arise. We designed several belief functions in terms of Dempster-Shafer theory in order to rate hypotheses emerging from the assignment of image elements to model elements. The developed system achieves a recall of 89.4% and a precision of 94.4%. As a result, we are able to show that model-based object recognition is competitive, with the added benefit of knowing the belief in each model. This makes it possible to apply our techniques to more complex domains again, as was attempted and abandoned 10 years ago.

Stefan Wirtz, Marcel Häselich, Dietrich Paulus
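
Rating hypotheses with belief functions in the sense of Dempster-Shafer rests on Dempster's rule of combination. The rule fuses two mass functions over sets of hypotheses and renormalizes away the conflicting mass. The sketch below is a generic illustration of that rule, not the belief functions designed in the paper. The hypothesis labels are made up.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: m(A) = sum over B & C == A of m1(B) * m2(C), divided by 1 - K."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc  # K: mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def belief(m, hypothesis):
    """Bel(H): total mass of all focal sets contained in H."""
    return sum(v for a, v in m.items() if a <= hypothesis)


# Two sources rating which (hypothetical) tile model an image region matches.
A, B = frozenset({"tile_3_4"}), frozenset({"tile_3_5"})
fused = combine({A: 0.6, A | B: 0.4}, {B: 0.5, A | B: 0.5})
```

Here 0.3 of the joint mass is conflicting, so the remaining masses are scaled by 1/0.7.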

Slicing the View: Occlusion-Aware View-Based Robot Navigation

Optical Rails [1] is a purely view-based method for autonomous track following with a mobile robot, based upon compact omnidirectional view descriptors using basis functions on the sphere. We address the most prominent points of criticism of holistic methods for robot navigation: dealing with occlusions and varying illumination. This is accomplished by slicing the omnidirectional view into segments, enabling dynamic visual fields capable of masking out occlusions while preserving proven, efficient paradigms for holistic view comparison and steering.

David Dederscheck, Martin Zahn, Holger Friedrich, Rudolf Mester

A Contour Matching Algorithm to Reconstruct Ruptured Documents

A procedure for reassembling ruptured documents from a large number of fragments is proposed. Such problems often arise in forensics and archiving. Usually, fragments are mixed and take arbitrary shapes. The proposed procedure concentrates on contour information of the fragments and represents it as feature strings to perform a matching based on dynamic programming. Experiments with 500 images of randomly shredded fragments show that the proposed reconstruction procedure is able to compose nearly 98% of the ruptured pages.

Anke Stieber, Jan Schneider, Bertram Nickolay, Jörg Krüger
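
Dynamic-programming matching of contour feature strings can be sketched with a local alignment score in the Smith-Waterman style. Two fitting fragments share only a stretch of their contours, so a local rather than global alignment seems natural. Two choices here are our assumptions rather than the paper's specification: encoding contours as symbol strings (for example quantized curvature) and the scoring values. Adjacent fragments traverse their common edge in opposite directions, so one string is typically reversed before alignment.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two symbol sequences (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous symbol of a
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local matching).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# The neighbour's contour is stored in the opposite direction, so reverse it.
score = local_alignment_score("ABCDEF", "YYEDCXX"[::-1])
```

Only the shared stretch "CDE" contributes, which gives a score of 3 × 2 = 6.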

Local Structure Analysis by Isotropic Hilbert Transforms

This work presents the isotropic and orthogonal decomposition of 2D signals into local geometrical and structural components. We will present the solution for 2D image signals in four steps: signal modeling in scale space, signal extension by higher order generalized Hilbert transforms, signal representation in classical matrix form, followed by the most important step, in which the matrix-valued signal will be mapped to a so-called multivector. We will show that this novel multivector-valued signal representation is an interesting space for complete geometrical and structural signal analysis. In practical computer vision applications, lines, edges, corners, and junctions as well as local texture patterns can be analyzed in one unified algebraic framework. Our novel approach will be applied to parameter-free multilayer decomposition.

Lennart Wietzke, Oliver Fleischmann, Anne Sedlazeck, Gerald Sommer

Complex Motion Models for Simple Optical Flow Estimation

The selection of an optical flow method is mostly a choice from among accuracy, efficiency and ease of implementation. While variational approaches tend to be more accurate than local parametric methods, much algorithmic effort and expertise is often required to obtain comparable efficiency with the latter. Through the exploitation of natural motion statistics, the estimation of optical flow from local parametric models yields a good alternative. We show that learned, linear, parametric models capture specific higher order relations between neighboring flow vectors and, thus, allow for complex, spatio-temporal motion patterns despite a simple and efficient implementation. The method comes with an inherent confidence measure, and the motion models can easily be adapted to specific applications with typical motion patterns by choice of training data. The proposed approach can be understood as a generalization of the original structure tensor approach to the incorporation of arbitrary linear motion models. In this way accuracy, specificity, efficiency and ease of implementation can be achieved at the same time.

Claudia Nieuwenhuis, Daniel Kondermann, Christoph S. Garbe
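
The abstract generalizes the constant-motion case, which can be written as a small least-squares problem. Within a neighborhood, the brightness constancy constraint Ix·u + Iy·v + It = 0 is solved for a single flow vector via the 2×2 spatial structure tensor. The smaller eigenvalue of that tensor serves as a simple confidence measure. This is the classical Lucas-Kanade-style formulation, not the learned motion models of the paper. The gradients are assumed to be precomputed, and the function name is hypothetical.

```python
import math


def constant_flow(ix, iy, it):
    """Least-squares flow (u, v) for one neighbourhood plus a confidence value.

    ix, iy, it: spatial and temporal derivatives at the neighbourhood's pixels.
    Returns (u, v, lambda_min), where lambda_min is the smaller eigenvalue of
    the structure tensor J = sum [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]].
    """
    jxx = sum(x * x for x in ix)
    jxy = sum(x * y for x, y in zip(ix, iy))
    jyy = sum(y * y for y in iy)
    bx = -sum(x * t for x, t in zip(ix, it))
    by = -sum(y * t for y, t in zip(iy, it))
    det = jxx * jyy - jxy * jxy
    # Smaller eigenvalue of the symmetric 2x2 tensor.
    half_trace = 0.5 * (jxx + jyy)
    lam_min = half_trace - math.sqrt(max(half_trace ** 2 - det, 0.0))
    if abs(det) < 1e-12:
        return None, None, lam_min  # aperture problem: flow not determined
    # Solve J [u, v]^T = [bx, by]^T with the explicit 2x2 inverse.
    u = (jyy * bx - jxy * by) / det
    v = (jxx * by - jxy * bx) / det
    return u, v, lam_min
```

If all gradients in the neighborhood point in one direction, the determinant vanishes. The function then reports no flow and a zero confidence.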

Tracking People in Broadcast Sports

We present a method for tracking people in monocular broadcast sports videos by coupling a particle filter with a vote-based confidence map of athletes, appearance features and optical flow for motion estimation. The confidence map provides a continuous estimate of possible target locations in each frame and outperforms tracking with discrete target detections. We demonstrate the tracker on sports videos, tracking fast and articulated movements of athletes such as divers and gymnasts and on non-sports videos, tracking pedestrians in a PETS2009 sequence.

Angela Yao, Dominique Uebersax, Juergen Gall, Luc Van Gool

A Template-Based Approach for Real-Time Speed-Limit-Sign Recognition on an Embedded System Using GPU Computing

We present a template-based pipeline that performs real-time speed-limit-sign recognition using an embedded system with a low-end GPU as the main processing element. Our pipeline operates in the frequency domain, and uses nonlinear composite filters and a contrast-enhancing preprocessing step to improve its accuracy. Running at interactive rates, our system achieves 90% accuracy over 120 EU speed-limit signs on 45 minutes of video footage, superior to the 75% accuracy of a non-real-time GPU-based SIFT pipeline.

Pınar Muyan-Özçelik, Vladimir Glavtchev, Jeffrey M. Ota, John D. Owens

Inpainting in Multi-image Stereo

In spite of numerous works on inpainting, little work has addressed both image and structure inpainting. In this work, we propose a new method for inpainting both the image and the depth of a scene using multiple stereo images. The observations contain unwanted artifacts, which may be caused by sensor/lens damage or occluders. In such a case, all the observations contain missing regions which are stationary with respect to the image coordinate system. We exploit the fact that the information missing in some images may be present in other images due to the motion cue. This includes the correspondence information for depth estimation/inpainting as well as the color information for image inpainting. We establish our approaches in the belief propagation (BP) framework, which also uses the segmentation cue for estimation/inpainting of depth maps.

Arnav V. Bhavsar, Ambasamudram N. Rajagopalan

Analysis of Length and Orientation of Microtubules in Wide-Field Fluorescence Microscopy

In this paper we present a novel approach for the analysis of microtubules in wide-field fluorescence microscopy. Microtubules are flexible elongated structures and part of the cytoskeleton, a cytoplasmic scaffolding responsible for cell stability and motility. The method allows for precise measurements of microtubule length and orientation under different conditions despite a high variability of image data and in the presence of artefacts. Application of the proposed method to demonstrate the effect of the protein GAR22 on the rate of polymerisation of microtubules illustrates the potential of our approach.

Gerlind Herberich, Anca Ivanescu, Ivonne Gamper, Antonio Sechi, Til Aach

Learning Non-stationary System Dynamics Online Using Gaussian Processes

Gaussian processes are a powerful non-parametric framework for solving various regression problems. In this paper, we address the task of learning a Gaussian process model of non-stationary system dynamics in an online fashion. We propose an extension to previous models that can appropriately handle outdated training samples by decreasing their influence on the predictive distribution. The resulting model estimates an individual noise level for each sample of the training set and thereby produces a mean shift towards more reliable observations. As a result, our model improves the prediction accuracy in the context of non-stationary function approximation and can furthermore detect outliers based on the resulting noise level. Our approach is easy to implement and is based upon standard Gaussian process techniques. In a real-world application where the task is to learn the system dynamics of a miniature blimp, we demonstrate that our algorithm benefits from individual noise levels and outperforms standard methods.

Axel Rottmann, Wolfram Burgard
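
Individual noise levels per training sample act in ordinary GP regression through a diagonal noise matrix diag(σ₁², …, σₙ²) in place of a single σ². Samples with large noise then pull the predictive mean less. The sketch below shows only this prediction step with a squared-exponential kernel. It does not reproduce how the paper estimates the per-sample noise levels online. The hyperparameters and function names are arbitrary choices for illustration.

```python
import math


def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_mean(x_train, y_train, noise_vars, x_star):
    """Predictive mean k_*^T (K + diag(noise_vars))^{-1} y."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise_vars[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    z = []  # forward substitution: L z = y
    for i in range(n):
        z.append((y_train[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    alpha = [0.0] * n  # back substitution: L^T alpha = z
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return sum(rbf(x_star, xi) * ai for xi, ai in zip(x_train, alpha))
```

As a usage example, take two contradicting observations at the same input, 0.0 and 1.0. The prediction ends up close to whichever observation is assigned the small noise variance.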

Computational TMA Analysis and Cell Nucleus Classification of Renal Cell Carcinoma

We consider an automated processing pipeline for tissue micro array analysis (TMA) of renal cell carcinoma. It consists of several consecutive tasks, which can be mapped to machine learning challenges. We investigate three of these tasks, namely nuclei segmentation, nuclei classification and staining estimation. We argue for a holistic view of the processing pipeline, as it is not obvious whether performance improvements at individual steps improve overall accuracy. The experimental results show that classification accuracy, which is comparable to trained human experts, can be achieved by using support vector machines (SVM) with appropriate kernels. Furthermore, we provide evidence that the shape of cell nuclei increases the classification performance. Most importantly, these improvements in classification accuracy result in corresponding improvements for the medically relevant estimation of immunohistochemical staining.

Peter J. Schüffler, Thomas J. Fuchs, Cheng Soon Ong, Volker Roth, Joachim M. Buhmann


Efficient Object Detection Using Orthogonal NMF Descriptor Hierarchies

Recently, descriptors based on Histograms of Oriented Gradients (HOG) and Local Binary Patterns (LBP) have shown excellent results in object detection in terms of both precision and recall. However, since these descriptors are based on high-dimensional representations, such approaches suffer from enormous memory and runtime requirements. The goal of this paper is to overcome these problems by introducing hierarchies of orthogonal Non-negative Matrix Factorizations (NMF). In this way, a lower-dimensional feature representation can be obtained without losing the discriminative power of the original features. Moreover, the hierarchical structure allows parts of patches to be represented on different scales, enabling a more robust classification. We show the effectiveness of our approach on two publicly available datasets and compare it to existing state-of-the-art methods. In addition, we demonstrate it in the context of aerial imagery, where high-dimensional images have to be processed, requiring efficient methods.

Thomas Mauthner, Stefan Kluckner, Peter M. Roth, Horst Bischof

VF-SIFT: Very Fast SIFT Feature Matching

Feature-based image matching is one of the most fundamental issues in computer vision tasks. As the number of features increases, the matching process rapidly becomes a bottleneck. This paper presents a novel method to speed up SIFT feature matching. The main idea is to extend the SIFT feature by a few pairwise independent angles, which are invariant to rotation, scale and illumination changes. During feature extraction, SIFT features are classified into different clusters based on their introduced angles and stored in a multidimensional table. Thus, in feature matching, only SIFT features that belong to clusters where correct matches may be expected are compared. The performance of the proposed method was tested on two groups of images, real-world stereo images and standard dataset images, through comparison with the performance of two state-of-the-art algorithms for approximate nearest neighbor (ANN) searching, hierarchical k-means and randomized kd-trees. The experimental results show that the proposed method considerably outperforms the two other algorithms and that feature matching can be accelerated by a factor of about 1250 with respect to exhaustive search without losing a noticeable number of correct matches.

Faraj Alhwarin, Danijela Ristić–Durrant, Axel Gräser
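
The indexing idea can be sketched as a hash table keyed by quantized extra angles. Each descriptor is stored in the bucket given by its angles, and a query is compared exhaustively only against descriptors in its own bucket. The abstract does not specify how the angles are computed from the SIFT descriptor, how many angles and bins are used, or whether neighboring bins are also searched. The bin count, data layout and function names below are therefore hypothetical.

```python
import math
from collections import defaultdict


def bucket_key(angles, bins=8):
    """Quantize each angle into one of `bins` equal sectors of [0, 2*pi)."""
    width = 2.0 * math.pi / bins
    return tuple(min(int((a % (2.0 * math.pi)) // width), bins - 1) for a in angles)


def build_table(features, bins=8):
    """features: list of (angles, descriptor). Returns bucket key -> feature indices."""
    table = defaultdict(list)
    for idx, (angles, _) in enumerate(features):
        table[bucket_key(angles, bins)].append(idx)
    return table


def match(query, features, table, bins=8):
    """Nearest neighbour of `query` among features sharing its bucket, or None."""
    angles, desc = query
    best, best_d2 = None, float("inf")
    for idx in table.get(bucket_key(angles, bins), []):
        d2 = sum((p - q) ** 2 for p, q in zip(desc, features[idx][1]))
        if d2 < best_d2:
            best, best_d2 = idx, d2
    return best
```

The speed-up comes from the skipped comparisons. A feature in another bucket is never examined, even if its descriptor happens to be closer, so the angles must be stable for correct matches to land in the same bucket.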

One-Shot Learning of Object Categories Using Dependent Gaussian Processes

Knowledge transfer from related object categories is a key concept to allow learning with few training examples. We present how to use dependent Gaussian processes for transferring knowledge from a related category in a non-parametric Bayesian way. Our method is able to select this category automatically using efficient model selection techniques. We show how to optionally incorporate semantic similarities obtained from the hierarchical lexical database WordNet [1] into the selection process. The framework is applied to image categorization tasks using state-of-the-art image-based kernel functions. A large-scale evaluation shows the benefits of our approach compared to independent learning and an SVM-based approach.

Erik Rodner, Joachim Denzler

Learning and Optimization

Uncertainty Driven Multi-scale Optimization

This paper proposes a new multi-scale energy minimization algorithm which can be used to efficiently solve large scale labelling problems in computer vision. The basic modus operandi of any multi-scale method involves the construction of a smaller problem which can be solved efficiently. The solution of this problem is used to obtain a partial labelling of the original energy function, which in turn allows us to minimize it by solving its (much smaller) projection. We propose the use of new techniques for both the construction of the smaller problem, and the extraction of a partial solution. Experiments on image segmentation show that our techniques give solutions with low pixel labelling error and in the same or less amount of computation time, compared to traditional multi-scale techniques.

Pushmeet Kohli, Victor Lempitsky, Carsten Rother

The Group-Lasso: ℓ1,∞ Regularization versus ℓ1,2 Regularization

The ℓ1,∞ norm and the ℓ1,2 norm are well-known tools for joint regularization in Group-Lasso methods. While the ℓ1,2 version has been studied in detail, there are still open questions regarding the uniqueness of solutions and the efficiency of algorithms for the ℓ1,∞ variant. For the latter, we characterize the conditions for uniqueness of solutions, we present a simple test for uniqueness, and we derive a highly efficient active set algorithm that can deal with input dimensions in the millions. We compare both variants of the Group-Lasso for its two most common application scenarios: one is to obtain sparsity on the level of groups in “standard” prediction problems, the second one is multi-task learning, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We show that both versions perform quite similarly in “standard” applications. However, a very clear distinction between the variants occurs in multi-task settings, where the ℓ1,2 version consistently outperforms the ℓ1,∞ counterpart in terms of prediction accuracy.

Julia E. Vogt, Volker Roth
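
For a weight vector partitioned into groups, the two norms differ only in how each group is summarized before summing. The ℓ1,2 norm adds the Euclidean norms of the groups, while the ℓ1,∞ norm adds their largest absolute entries. The sketch below computes both. It also implements the proximal operator of the ℓ1,2 norm (group soft-thresholding), a standard building block of Group-Lasso solvers. It is not the active set algorithm derived in the paper, and the function names are our own.

```python
import math


def norm_l12(w, groups):
    """Sum over groups of the Euclidean norm of each group."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)


def norm_l1inf(w, groups):
    """Sum over groups of the largest absolute entry of each group."""
    return sum(max(abs(w[i]) for i in g) for g in groups)


def prox_l12(w, groups, lam):
    """argmin_x 0.5*||x - w||^2 + lam*||x||_{1,2}: shrink each group towards 0."""
    out = list(w)
    for g in groups:
        nrm = math.sqrt(sum(w[i] ** 2 for i in g))
        # Groups with norm <= lam are set exactly to zero (group sparsity).
        scale = max(0.0, 1.0 - lam / nrm) if nrm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


w = [3.0, 4.0, 0.0, 0.0, 1.0, -2.0]
groups = [[0, 1], [2, 3], [4, 5]]
```

For this `w`, the ℓ1,2 norm is 5 + √5 and the ℓ1,∞ norm is 4 + 0 + 2 = 6. With λ = 1 the first group is scaled by 0.8.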

Random Fourier Approximations for Skewed Multiplicative Histogram Kernels

Approximations based on random Fourier features have recently emerged as an efficient and elegant methodology for designing large-scale kernel machines [4]. By expressing the kernel as a Fourier expansion, features are generated based on a finite set of random basis projections with inner products that are Monte Carlo approximations to the original kernel. However, the original Fourier features are only applicable to translation-invariant kernels and are not suitable for histograms, which are always non-negative. This paper extends the concept of translation invariance and the random Fourier feature methodology to arbitrary, locally compact Abelian groups. Based on empirical observations drawn from the exponentiated χ² kernel, the state of the art for histogram descriptors, we propose a new group called the skewed-multiplicative group and design translation-invariant kernels on it. Experiments show that the proposed kernels outperform other kernels that can be similarly approximated. In a semantic segmentation experiment on the PASCAL VOC 2009 dataset, the approximation allows us to train large-scale learning machines more than two orders of magnitude faster than previous nonlinear SVMs.

Fuxin Li, Catalin Ionescu, Cristian Sminchisescu
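
To make the group idea concrete, the sketch below maps each histogram bin to log(x + c) and then applies the random Fourier feature recipe. The log map turns the positive reals under multiplication into an ordinary additive group. We assume a kernel of the form k(x, y) = ∏ᵢ 2√((xᵢ+c)(yᵢ+c)) / (xᵢ+yᵢ+2c). In the log domain each factor equals sech(dᵢ/2) with dᵢ = log(xᵢ+c) − log(yᵢ+c), and its spectral density is sech(πω). This kernel form, the offset c = 0.1 and all names are our assumptions for illustration and may not match the paper's definitions.

```python
import math
import random


def skewed_kernel(x, y, c=0.1):
    """Assumed kernel: prod_i 2*sqrt((x_i + c)(y_i + c)) / (x_i + y_i + 2c)."""
    k = 1.0
    for xi, yi in zip(x, y):
        k *= 2.0 * math.sqrt((xi + c) * (yi + c)) / (xi + yi + 2.0 * c)
    return k


def make_feature_map(dim, n_features, c=0.1, seed=0):
    """Random Fourier features whose inner products approximate skewed_kernel."""
    rng = random.Random(seed)

    def sample_omega():
        # Inverse-CDF sample from the density sech(pi * w), whose
        # characteristic function is sech(t / 2).
        u = 1.0 - rng.random()  # u in (0, 1], keeps tan() positive
        return math.log(math.tan(math.pi * u / 2.0)) / math.pi

    omegas = [[sample_omega() for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        logs = [math.log(xi + c) for xi in x]  # multiplicative -> additive group
        return [scale * math.cos(sum(w * l for w, l in zip(om, logs)) + b)
                for om, b in zip(omegas, phases)]

    return feature_map
```

A linear model trained on these features then approximates a kernel machine with the assumed kernel, at a cost linear in the number of training samples. The approximation error shrinks roughly like 1/√D for D random features.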

Gaussian Mixture Modeling with Gaussian Process Latent Variable Models

Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.

Hannes Nickisch, Carl Edward Rasmussen


Classification of Swimming Microorganisms Motion Patterns in 4D Digital In-Line Holography Data

Digital in-line holography is a 3D microscopy technique which has received increasing attention over the last few years in the fields of microbiology, medicine and physics. In this paper we present an approach for automatically classifying complex microorganism motions observed with this microscopy technique. Our main contribution is the use of Hidden Markov Models (HMMs) to classify four different motion patterns of a microorganism and to separate multiple patterns occurring within a trajectory. We perform leave-one-out experiments with the training data to demonstrate the accuracy of our method and to analyze the importance of each trajectory feature for classification. We further present results obtained on four full sequences, a total of 2500 frames. The obtained classification rates range between 83.5% and 100%.

Laura Leal-Taixé, Matthias Heydt, Sebastian Weiße, Axel Rosenhahn, Bodo Rosenhahn
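
Classifying a trajectory with HMMs typically means training one model per motion pattern. A new observation sequence is then assigned to the model with the highest likelihood. The sketch below shows the scaled forward algorithm for discrete observations together with this maximum-likelihood decision. The two toy models, the discretized speed symbols (0 = slow, 1 = fast) and all names are invented for illustration; the paper uses its own trajectory features.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the forward algorithm with per-step scaling."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        c = sum(alpha)  # c = P(obs[t] | obs[:t]); summing logs avoids underflow
        total += math.log(c)
        alpha = [a / c for a in alpha]
    return total


def classify(obs, models):
    """Return the name of the model that explains `obs` best."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Each model: (start probabilities, transition matrix, emission matrix).
models = {
    "drifting": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "darting": ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]),
}
```

Separating several patterns within one trajectory, as described in the abstract, would additionally require decoding state sequences, for example with the Viterbi algorithm. This sketch does not cover that step.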

Catheter Tracking: Filter-Based vs. Learning-Based

Atrial fibrillation is the most common sustained arrhythmia. One important treatment option is radio-frequency catheter ablation (RFCA) of the pulmonary veins attached to the left atrium. RFCA is usually performed under fluoroscopic (X-ray) image guidance. Overlay images computed from pre-operative 3-D volumetric data can be used to add anatomical detail otherwise not visible under X-ray. Unfortunately, current fluoro overlay images are static, i.e., they do not move synchronously with respiratory and cardiac motion. A filter-based catheter tracking approach using simultaneous biplane fluoroscopy was previously presented. It requires localization of a circumferential tracking catheter, though. Unfortunately, the initially proposed method may fail to accommodate catheters of different size. It may also detect wrong structures in the presence of high background clutter. We developed a new learning-based approach to overcome both problems. First, a 3-D model of the catheter is reconstructed. A cascade of boosted classifiers is then used to segment the circumferential mapping catheter. Finally, the 3-D motion at the site of ablation is estimated by tracking the reconstructed model in 3-D from biplane fluoroscopy. We compared our method to the previous approach using 13 clinical data sets and found that the 2-D tracking error improved from 1.0 mm to 0.8 mm. The 3-D tracking error was reduced from 0.8 mm to 0.7 mm.

Alexander Brost, Andreas Wimmer, Rui Liao, Joachim Hornegger, Norbert Strobel

Exploiting Redundancy for Aerial Image Fusion Using Convex Optimization

Image fusion in high-resolution aerial imagery poses a challenging problem due to fine details and complex textures. In particular, color image fusion by using virtual orthographic cameras offers a common representation of overlapping yet perspective aerial images. This paper proposes a variational formulation for a tight integration of redundant image data showing urban environments. We introduce an efficient wavelet regularization which enables a natural-appearing recovery of fine details in the images by performing joint inpainting and denoising from a given set of input observations. Our framework is first evaluated on a setting with synthetic noise. Then, we apply our proposed approach to orthographic image generation in aerial imagery. In addition, we discuss an exemplar-based inpainting technique for an integrated removal of non-stationary objects like cars.

Stefan Kluckner, Thomas Pock, Horst Bischof

Poster Session II

A Convex Approach for Variational Super-Resolution

We propose a convex variational framework to compute high-resolution images from a low-resolution video. The image formation process is analyzed to provide a well-designed model for warping, blurring, downsampling and regularization. We provide a comprehensive investigation of the individual model components. The super-resolution problem is modeled as a minimization problem in a unified convex framework, which is solved by a fast primal-dual algorithm. A comprehensive evaluation of the influence of different kinds of noise is carried out. The proposed algorithm shows excellent recovery of information for various real and synthetic datasets.
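The image formation model above chains warping, blurring and downsampling. A minimal sketch of such a linear forward operator follows, assuming an integer circular shift as the warp, a 3x3 box blur and decimation by a factor of 2. These choices are illustrative only; the regularizer and the primal-dual solver are not shown.

```python
import numpy as np

def degrade(hr, shift, factor=2):
    """Hypothetical forward model: warp -> blur -> downsample.

    hr:     (H, W) high-resolution image
    shift:  (dy, dx) integer translation used as a stand-in warp (circular)
    factor: decimation factor of the low-resolution grid
    """
    h, w = hr.shape
    warped = np.roll(hr, shift, axis=(0, 1))          # integer, circular warp
    padded = np.pad(warped, 1, mode="edge")
    blurred = sum(padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0   # 3x3 box blur
    return blurred[::factor, ::factor]                # keep every factor-th pixel
```

Because every step is linear, a super-resolution energy built on this operator stays convex when combined with a convex regularizer.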

Markus Unger, Thomas Pock, Manuel Werlberger, Horst Bischof

Incremental Learning in the Energy Minimisation Framework for Interactive Segmentation

In this article we propose a method for parameter learning within the energy minimisation framework for segmentation. We do this incrementally, requesting user input to resolve segmentation ambiguities. Whereas most other interactive learning approaches focus only on learning appearance characteristics, our approach can also learn prior terms, in particular the Potts terms in binary image segmentation. Artificial as well as real examples illustrate the applicability of the approach.
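The prior term being learned is the Potts weight of a binary segmentation energy. The sketch below only evaluates such an energy (unary costs plus a Potts penalty on 4-neighbour pairs). The learning step itself is not shown, and the arrays and the weight `beta` are hypothetical.

```python
import numpy as np

def potts_energy(labels, unary, beta):
    """Energy of a binary labelling: unary costs plus a Potts term.

    labels: (H, W) integer array of 0/1 labels
    unary:  (H, W, 2) cost of assigning label 0 or 1 at each pixel
    beta:   Potts weight charged for each 4-neighbour pair with different labels
    """
    h, w = labels.shape
    data = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    disagreements = ((labels[1:, :] != labels[:-1, :]).sum()
                     + (labels[:, 1:] != labels[:, :-1]).sum())
    return data + beta * disagreements
```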

Denis Kirmizigül, Dmitrij Schlesinger

A Model-Based Approach to the Segmentation of Nasal Cavity and Paranasal Sinus Boundaries

We present a model-driven approach to the segmentation of nasal cavity and paranasal sinus boundaries. Based on computed tomography data of a patient's head, our approach aims to extract the border that separates the structures of interest from the rest of the head. This three-dimensional region information is useful in many clinical applications, e.g. diagnosis, surgical simulation, surgical planning and robot-assisted surgery. The desired boundary can be made up of bone, mucosa or air, which makes the segmentation process very difficult and pushes traditional segmentation approaches, such as region growing, to their limits. Motivated by the work of Tsai et al. [1] and Leventon et al. [2], we therefore show how a parametric level-set model can be generated from hand-segmented nasal cavity and paranasal sinus data that gives us the ability to transform the complex segmentation problem into a finite-dimensional one. On this basis, we propose a processing chain for the automated segmentation of the endonasal structures that incorporates the model information and operates without any user interaction. Promising results are obtained by evaluating our approach on two-dimensional data slices of 50 patients with very diverse paranasal sinuses.
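In the style of Tsai et al. and Leventon et al., a parametric level-set model can be built by applying PCA to signed distance functions (SDFs) of training segmentations, so that a shape is described by a few mode weights. A minimal sketch follows. Synthetic discs stand in for the hand-segmented data; the helper names and sizes are hypothetical.

```python
import numpy as np

def signed_distance_disc(shape, center, radius):
    """SDF of a disc (negative inside), a stand-in for a hand segmentation."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return np.hypot(yy - center[0], xx - center[1]) - radius

def build_shape_model(sdfs, n_modes):
    """PCA of training SDFs: mean SDF plus the leading modes of variation."""
    X = np.stack([s.ravel() for s in sdfs])     # one training shape per row
    mean = X.mean(axis=0)
    _, sing, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes], sing[:n_modes]

def synthesize(mean, modes, weights, shape):
    """Level-set function for given mode weights; its zero level is the boundary."""
    return (mean + weights @ modes).reshape(shape)
```

Segmentation then searches over the mode weights (and a pose) instead of over a free-form contour, which is what makes the problem finite-dimensional.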

Carsten Last, Simon Winkelbach, Friedrich M. Wahl, Klaus W. G. Eichhorn, Friedrich Bootz

Wavelet-Based Inpainting for Object Removal from Image Series

We adapt several algorithmic extensions that other authors proposed for inpainting in the spatial domain and apply them to an inpainting technique in the wavelet domain. We also introduce a new merging stage. We show how these techniques can be used to automatically remove large objects in complex outdoor scenes. We evaluate our approach quantitatively against the aforementioned inpainting methods and show that our extensions measurably increase the inpainting quality.

Sebastian Vetter, Marcin Grzegorzek, Dietrich Paulus

An Empirical Comparison of Inference Algorithms for Graphical Models with Higher Order Factors Using OpenGM

Graphical models with higher order factors are an important tool for pattern recognition that has recently attracted considerable attention. Inference based on such models is challenging from the viewpoint of both software design and optimization theory. In this article, we use the new C++ template library OpenGM to empirically compare inference algorithms on a set of synthetic and real-world graphical models with higher order factors that are used in computer vision. While inference algorithms have been studied intensively for graphical models with second order factors, an empirical comparison for higher order models has so far been missing. This article presents a first set of experiments that intends to fill this gap.
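For very small models, exact MAP inference by enumeration gives a ground truth against which approximate algorithms can be compared. The sketch below represents a model as a list of (variable indices, cost table) factors, including a third-order factor. It does not use OpenGM and does not reflect its API; the factor values are hypothetical.

```python
import itertools
import numpy as np

def brute_force_map(num_vars, num_labels, factors):
    """Exact MAP labelling by enumeration.

    factors: list of (variable index tuple, cost table) pairs; a table for a
    factor over k variables is a k-dimensional array indexed by their labels.
    """
    best_cost, best_labels = np.inf, None
    for labels in itertools.product(range(num_labels), repeat=num_vars):
        cost = sum(table[tuple(labels[v] for v in vars_)] for vars_, table in factors)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost

# Hypothetical model: three binary variables, unary terms and one third-order
# factor that is free only when all three labels agree.
triple = np.full((2, 2, 2), 2.0)
triple[0, 0, 0] = triple[1, 1, 1] = 0.0
factors = [((0,), np.array([1.5, 0.0])), ((1,), np.array([0.0, 0.5])),
           ((2,), np.array([0.0, 0.5])), ((0, 1, 2), triple)]
labels, cost = brute_force_map(3, 2, factors)
```

Enumeration grows exponentially with the number of variables, which is why dedicated inference algorithms are needed for realistic models.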

Björn Andres, Jörg H. Kappes, Ullrich Köthe, Christoph Schnörr, Fred A. Hamprecht

N-View Human Silhouette Segmentation in Cluttered, Partially Changing Environments

The segmentation of foreground silhouettes of humans in camera images is a fundamental step in many computer vision and pattern recognition tasks. We present an approach which, based on color distributions, estimates the foreground by automatically integrating data-driven 3D scene knowledge from multiple static views. These estimates are integrated into a level set approach to provide the final segmentation results. The advantage of the presented approach is that ambiguities in the color distributions of foreground and background can be resolved in many cases by integrating implicitly extracted 3D scene knowledge and 2D boundary constraints. The presented approach is thereby able to automatically handle cluttered scenes as well as scenes with partially changing backgrounds and changing light conditions.

Tobias Feldmann, Björn Scheuermann, Bodo Rosenhahn, Annika Wörner

Nugget-Cut: A Segmentation Scheme for Spherically- and Elliptically-Shaped 3D Objects

In this paper, a segmentation method for spherically- and elliptically-shaped objects is presented. It utilizes a user-defined seed point to set up a directed 3D graph. The nodes of the 3D graph are obtained by sampling along rays that are sent through the surface points of a polyhedron. Additionally, several arcs and a parameter constrain the set of possible segmentations and enforce smoothness. After the graph has been constructed, the minimal cost closed set on the graph is computed via a polynomial time s-t cut, creating an optimal segmentation of the object. The presented method has been evaluated on 50 Magnetic Resonance Imaging (MRI) data sets with World Health Organization (WHO) grade IV gliomas (glioblastoma multiforme). The ground truth of the tumor boundaries was manually extracted by three clinical experts (neurological surgeons) with several years (> 6) of experience in resection of gliomas and afterwards compared with the automatic segmentation results of the proposed scheme, yielding an average Dice Similarity Coefficient (DSC) of 80.37±8.93%. However, as no segmentation method provides a perfect result, additional editing was required on some slices; these edits could be made quickly because the automatic segmentation provides a border that mostly fits the desired contour. Furthermore, the manual segmentation by neurological surgeons took 2-32 minutes (mean: 8 minutes), whereas the automatic segmentation with our implementation took less than 5 seconds.
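The evaluation above reports the Dice Similarity Coefficient, DSC = 2|A ∩ B| / (|A| + |B|), between manual and automatic segmentations. A minimal sketch for binary masks follows. Returning 1.0 when both masks are empty is a convention we assume here, not one stated in the abstract.

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient of two binary masks of equal shape."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0                  # assumed convention: two empty masks agree fully
    return 2.0 * np.logical_and(a, b).sum() / denom
```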

Jan Egger, Miriam H. A. Bauer, Daniela Kuhnt, Barbara Carl, Christoph Kappus, Bernd Freisleben, Christopher Nimsky

Benchmarking Stereo Data (Not the Matching Algorithms)

Current research in stereo image analysis focuses on improving matching algorithms in terms of accuracy, computational costs, and robustness towards real-time applicability for complex image data and 3D scenes. Interestingly, performance testing takes place for a huge number of algorithms, but typically on very small sets of image data only. Even worse, there is little reasoning about whether the commonly used data is actually suitable to prove robustness or even correctness of a particular algorithm. We argue for the need to test stereo algorithms on a much broader variety of image data than has been done so far, and propose a simple measure for relating stereo image data of different quality to each other. Potential applications include purpose-directed decisions for the selection of stereo image data for testing the applicability of matching techniques under particular situations, or real-time estimation of stereo performance (without any need for providing ground truth) in cases where techniques should be selected depending on the given situation.

Ralf Haeusler, Reinhard Klette

Robust Open-Set Face Recognition for Small-Scale Convenience Applications

In this paper, a robust real-world video-based open-set face recognition system is presented. This system is designed for general small-scale convenience applications, which can be used for providing customized services. In the developed prototype, the system identifies a person in question and conveys customized information according to the identity. Since the system does not require any cooperation from the users, its robustness can easily be affected by confounding factors. To overcome the pose problem, we generated frontal view faces with a tracked 2D face model. We also employed a distance metric to assess the quality of face model tracking. A local appearance-based face representation was used to make the system robust against local appearance variations. We evaluated the system's performance on a face database which was collected in front of an office. The experimental results on this database show that the developed system is able to operate robustly under real-world conditions.

Hua Gao, Hazım Kemal Ekenel, Rainer Stiefelhagen

Belief Propagation for Improved Color Assessment in Structured Light

Single-Shot Structured Light is a well-known method for acquiring 3D surface data of moving scenes with simple and compact hardware setups. Some of the biggest challenges in these systems are their sensitivity to textured scenes, subsurface scattering and low-contrast illumination. Recently, a graph-based method has been proposed that largely eliminates these shortcomings. A key step in the graph-based pattern decoding algorithm is the estimation of the color of local image regions, which correspond to the vertex colors of the graph. In this work we propose a new method for estimating the color of a vertex based on belief propagation (BP). The BP framework allows the explicit inclusion of cues from neighboring vertices in the color estimation. This is especially beneficial for low-contrast input images. The augmented method is evaluated using typical low-quality real-world test sequences of the interior of a pig stomach. We demonstrate a significant improvement in robustness. The number of 3D data points generated increases by 30 to 50 percent over the plain decoding.
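The idea of letting neighbouring vertices inform each vertex's color estimate can be illustrated with sum-product belief propagation over a discrete set of color classes. In the sketch below, the local color likelihoods (`unary`), the graph and the Potts-style compatibility (`coupling` > 1 favours equal colours on neighbours) are all hypothetical assumptions; the authors' actual potentials are not given in the abstract.

```python
import numpy as np

def loopy_bp(unary, edges, coupling, iters=20):
    """Sum-product belief propagation with a Potts-style pairwise term.

    unary:    (N, C) float array, local likelihood of each colour class per vertex
    edges:    list of undirected (i, j) vertex pairs
    coupling: factor by which equal colours on neighbouring vertices are favoured
    """
    n, c = unary.shape
    psi = np.ones((c, c)) + (coupling - 1.0) * np.eye(c)   # symmetric compatibility
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    msgs = {(i, j): np.ones(c) / c for i in nbrs for j in nbrs[i]}   # message i -> j
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            incoming = unary[i].copy()
            for k in nbrs[i]:
                if k != j:                 # exclude the message coming back from j
                    incoming *= msgs[(k, i)]
            m = psi.T @ incoming           # sum over colours of vertex i
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = unary.copy()
    for i in range(n):
        for k in nbrs[i]:
            beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a tree the beliefs equal the exact marginals; on graphs with loops they are approximations.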

Christoph Schmalz, Elli Angelopoulou

3D Object Detection Using a Fast Voxel-Wise Local Spherical Fourier Tensor Transformation

In this paper we present a novel approach for expanding spherical 3D-tensor fields of arbitrary order in terms of a tensor valued local Fourier basis. For an efficient implementation, a two step approach is suggested combined with the use of spherical derivatives. Based on this new transformation we conduct two experiments utilizing the spherical tensor algebra for computing and using rotation invariant features for object detection and classification. The first experiment covers the successful detection of non-spherical root cap cells of Arabidopsis root tips presented in volumetric microscopical recordings. The second experiment shows how to use these features for successfully detecting α-helices in cryo-EM density maps of secondary protein structures, leading to very promising results.

Henrik Skibbe, Marco Reisert, Thorsten Schmidt, Klaus Palme, Olaf Ronneberger, Hans Burkhardt

Matte Super-Resolution for Compositing

Super-resolution of the alpha matte and the foreground object from a video are jointly attempted in this paper. Instead of super-resolving them independently, we treat super-resolution of the matte and foreground in a combined framework, incorporating the matting equation in the image degradation model. We take multiple adjacent frames from a low-resolution video with non-global motion for increasing the spatial resolution. This ill-posed problem is regularized by employing a Bayesian restoration approach, wherein the high-resolution image is modeled as a Markov Random Field. In matte super-resolution, it is particularly important to preserve fine details at the boundary pixels between the foreground and background. For this purpose, we use a discontinuity-adaptive smoothness prior to include observed data in the solution. This framework is useful in video editing applications for compositing low-resolution objects into high-resolution videos.
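The matting equation that the degradation model incorporates is I = αF + (1 − α)B. The sketch below composites a matte and foreground into a high-resolution background. Nearest-neighbour upsampling is only a placeholder for the Bayesian super-resolution described above, and all array names are hypothetical.

```python
import numpy as np

def upsample_nearest(img, s):
    """Placeholder upsampling (pixel replication), not the paper's super-resolution."""
    return img.repeat(s, axis=0).repeat(s, axis=1)

def composite(alpha, fg, bg):
    """Matting equation I = alpha * F + (1 - alpha) * B for colour images."""
    a = alpha[..., None]            # broadcast the matte over colour channels
    return a * fg + (1.0 - a) * bg

# Hypothetical usage: insert a 2x2 object into a 4x4 background.
alpha_lr = np.array([[1.0, 0.0], [0.5, 0.5]])
fg_lr = np.ones((2, 2, 3))
bg_hr = np.zeros((4, 4, 3))
out = composite(upsample_nearest(alpha_lr, 2), upsample_nearest(fg_lr, 2), bg_hr)
```

Blocky upsampling of α visibly degrades fine boundary detail, which is why the matte benefits from proper super-resolution before compositing.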

Sahana M. Prabhu, Ambasamudram N. Rajagopalan

An Improved Histogram of Edge Local Orientations for Sketch-Based Image Retrieval

Content-based image retrieval requires a natural image (e.g., a photo) as query, but the absence of such a query image is usually the reason for a search. An easy way to express the user query is a line-based hand drawing, a sketch, which leads to sketch-based image retrieval. Few authors have addressed image retrieval with a sketch as query, and current approaches still perform poorly under scale, translation, and rotation transformations. In this paper, we describe a method based on efficiently computing a histogram of edge local orientations that we call HELO. Our method is based on a strategy applied in the context of fingerprint processing. This descriptor is invariant to scale and translation transformations. To tackle the rotation problem, we apply two normalization processes, one using principal component analysis and the other using polar coordinates. Finally, we linearly combine two distance measures. Our results show that HELO significantly increases the retrieval effectiveness in comparison with the state of the art.
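A common fingerprint-processing strategy estimates a block's dominant orientation by averaging doubled-angle gradient terms, so that opposite gradient directions reinforce instead of cancel. The sketch below builds a normalized histogram of such block orientations. The block grid, bin count and function name are hypothetical, and the PCA and polar-coordinate normalizations are omitted; this is not the authors' exact HELO implementation.

```python
import numpy as np

def helo_like(img, blocks=8, bins=36):
    """Histogram of block-wise edge orientations in [0, pi), normalized to sum 1."""
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    bh, bw = h // blocks, w // blocks
    hist = np.zeros(bins)
    for by in range(blocks):
        for bx in range(blocks):
            sl = (slice(by * bh, (by + 1) * bh), slice(bx * bw, (bx + 1) * bw))
            gxb, gyb = gx[sl], gy[sl]
            vx = np.sum(2.0 * gxb * gyb)            # doubled-angle components
            vy = np.sum(gxb ** 2 - gyb ** 2)
            if vx == 0 and vy == 0:
                continue                            # flat block: no orientation
            grad_dir = 0.5 * np.arctan2(vx, vy)     # dominant gradient direction
            edge_dir = (grad_dir + np.pi / 2) % np.pi   # edges are perpendicular
            hist[int(edge_dir / np.pi * bins) % bins] += 1
    total = hist.sum()
    return hist / total if total else hist
```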

Jose M. Saavedra, Benjamin Bustos

A Novel Curvature Estimator for Digital Curves and Images

We propose a novel curvature estimation algorithm which is capable of estimating the curvature of digital curves and two-dimensional curved image structures. The algorithm is based on the conformal projection of the curve or image signal to the two-sphere. Due to the geometric structure of the embedded signal the curvature may be estimated in terms of first order partial derivatives in ℝ³. This structure allows us to obtain the curvature by just convolving the projected signal with the appropriate kernels. We show that the method performs an implicit plane fitting by convolving the projected signals with the derivative kernels. Since the algorithm is based on convolutions its implementation is straightforward for digital curves as well as images. We compare the proposed method with differential geometric curvature estimators. It turns out that the novel estimator is as accurate as the standard differential geometric methods in synthetic as well as real and noise perturbed environments.
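For comparison, the classical differential geometric estimator computes the isophote curvature κ = (I_xx I_y² − 2 I_xy I_x I_y + I_yy I_x²) / (I_x² + I_y²)^(3/2) from first and second derivatives. The sketch below implements this baseline with finite differences. It is not the proposed conformal-projection method, and the small `eps` guard is an assumption to avoid division by zero in flat regions.

```python
import numpy as np

def isophote_curvature(img, eps=1e-12):
    """Curvature of the isophotes of a 2-D image via finite differences."""
    iy, ix = np.gradient(img.astype(float))     # derivatives along rows (y) and columns (x)
    iyy, _ = np.gradient(iy)
    ixy, ixx = np.gradient(ix)
    num = ixx * iy ** 2 - 2.0 * ixy * ix * iy + iyy * ix ** 2
    den = (ix ** 2 + iy ** 2) ** 1.5
    return num / (den + eps)

# Isophotes of f = x^2 + y^2 are circles, so the curvature at radius r is 1/r.
y, x = np.mgrid[-20:21, -20:21]
k = isophote_curvature(x ** 2 + y ** 2)
```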

Oliver Fleischmann, Lennart Wietzke, Gerald Sommer

Local Regression Based Statistical Model Fitting

Fitting statistical models is a widely employed technique for the segmentation of medical images. While this approach gives impressive results for simple structures, shape models are often not flexible enough to accurately represent complex shapes. We present a fitting approach which increases the model fitting accuracy without requiring a larger training data set. Inspired by a local regression approach known from statistics, our method fits the full model to a neighborhood around each point of the domain. This increases the model's flexibility considerably without the need to introduce an artificial segmentation of the structure. By adapting the size of the neighborhood from small to large, we can smoothly interpolate between localized fits, which accurately map the data but are more prone to noise, and global fits, which are less flexible but constrained to valid shapes only. We applied our method to the segmentation of teeth from 3D cone-beam CT scans. Our experiments confirm that our method consistently increases the precision of the segmentation result compared to a standard global fitting approach.
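The local regression idea can be sketched for a linear shape model (mean plus basis): at every point, the full model is fitted with weights concentrated on that point's neighborhood, and only the fitted value at that point is kept. The sketch assumes a 1-D profile instead of a 3D surface, Gaussian neighborhood weights of width `sigma`, and a small ridge term `reg` for numerical stability; these are our assumptions, not details from the abstract.

```python
import numpy as np

def local_regression_fit(mean, basis, target, sigma, reg=1e-6):
    """Fit the full linear model around every point with Gaussian weights.

    mean: (n,) model mean; basis: (n, k) model modes; target: (n,) observed data.
    Small sigma gives localized fits, very large sigma approaches a global fit.
    """
    n = mean.shape[0]
    idx = np.arange(n)
    resid = target - mean
    fitted = np.empty(n)
    for i in range(n):
        w = np.exp(-0.5 * ((idx - i) / sigma) ** 2)     # neighborhood weights around i
        BtW = basis.T * w                                # (k, n) weighted basis
        coeff = np.linalg.solve(BtW @ basis + reg * np.eye(basis.shape[1]), BtW @ resid)
        fitted[i] = mean[i] + basis[i] @ coeff           # keep only the value at i
    return fitted
```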

Matthias Amberg, Marcel Lüthi, Thomas Vetter

Semi-supervised Learning of Edge Filters for Volumetric Image Segmentation

For every segmentation task, prior knowledge about the object to be segmented has to be incorporated. This is typically done either automatically, by using labeled data to train the algorithm, or by manually adapting the algorithm to the specific application. For the segmentation of 3D data, the generation of training sets is very tedious and time consuming, since in most cases an expert has to mark the object boundaries in all slices of the 3D volume. To avoid this, we developed a new framework that combines unsupervised and supervised learning. First, the possible edge appearances are grouped, such that, in the second step, the expert only has to choose between relevant and non-relevant clusters. This way, even objects with very different edge appearances in different regions of the boundary can be segmented, while the user interaction is limited to a very simple operation. In the presented work, the chosen edge clusters are used to generate a filter for all relevant edges. The filter response is used to generate an edge map based on which an active surface segmentation is performed. The evaluation on the segmentation of plant cells recorded with 3D confocal microscopy yields convincing results.

Margret Keuper, Robert Bensch, Karsten Voigt, Alexander Dovzhenko, Klaus Palme, Hans Burkhardt, Olaf Ronneberger


Geometrically Constrained Level Set Tracking for Automotive Applications

We propose a new approach for integrating geometric scene knowledge into a level-set tracking framework. Our approach is based on a novel constrained-homography transformation model that restricts the deformation space to physically plausible rigid motion on the ground plane. This model is especially suitable for tracking vehicles in automotive scenarios. Apart from reducing the number of parameters in the estimation, the 3D transformation model allows us to obtain additional information about the tracked objects and to recover their detailed 3D motion and orientation at every time step. We demonstrate how this information can be used to improve a Kalman filter estimate of the tracked vehicle dynamics in a higher-level tracker, leading to more accurate object trajectories. We show the feasibility of this approach for an application of tracking cars in an inner-city scenario.
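A rigid motion restricted to the ground plane has three parameters (a yaw angle and two in-plane translations), and the image motion of ground-plane points is then a plane-induced homography H = K (R + t nᵀ / d) K⁻¹. The sketch below assumes a camera frame with y pointing down and the ground plane at y = `cam_height`; the parametrization is a standard construction offered as an illustration, not necessarily the authors' exact constrained-homography model.

```python
import numpy as np

def ground_plane_homography(K, cam_height, theta, tx, tz):
    """Homography induced by rigid motion on the ground plane.

    Camera frame: x right, y down, z forward; ground plane is y = cam_height.
    Motion: rotation theta about the vertical (y) axis, translation (tx, 0, tz).
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    t = np.array([tx, 0.0, tz])
    n = np.array([0.0, 1.0, 0.0])          # plane normal: n . X = cam_height on the plane
    return K @ (R + np.outer(t, n) / cam_height) @ np.linalg.inv(K)

# Hypothetical check: a ground point maps exactly as its rigidly moved 3-D position.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
h, theta, tx, tz = 1.5, 0.1, 0.5, 2.0
H = ground_plane_homography(K, h, theta, tx, tz)
X = np.array([1.0, h, 10.0])
c, s = np.cos(theta), np.sin(theta)
X_moved = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]) @ X + np.array([tx, 0.0, tz])
x_h = H @ (K @ X)
x_ref = K @ X_moved
```

Since the rotation axis is parallel to the plane normal and the translation lies in the plane, moved points stay on the ground plane, which keeps the mapping exact.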

Esther Horbert, Dennis Mitzel, Bastian Leibe

Interactive Motion Segmentation

Interactive motion segmentation is an important task for scene understanding and analysis. Despite recent progress, state-of-the-art approaches still have difficulties in adapting to the diversity of spatially varying motion fields. Due to strong spatial variations of the motion field, objects are often divided into several parts. At the same time, different objects exhibiting similar motion fields often cannot be distinguished correctly. In this paper, we propose to use spatially varying affine motion model parameter distributions combined with minimal guidance via user-drawn scribbles. This makes it feasible to adapt to motion pattern variations and to capture subtle differences between similar regions. The idea is embedded in a variational minimization problem, which is solved by means of recently proposed convex relaxation techniques. For two regions (i.e. object and background) we obtain globally optimal results for this formulation. For more than two regions, the results deviate within very small bounds of about 2 to 4 % from the optimal solution in our experiments. To demonstrate the benefit of using both model parameters and spatially variant distributions, we show results for challenging synthetic and real-world motion fields.
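An affine motion model describes the flow of a region as v(x) = A x + b with six parameters. The sketch below estimates these parameters from flow vectors by ordinary least squares, for example inside a user-scribbled region. It shows only the parameter model; the spatially varying distributions and the convex relaxation are not reproduced, and the test values are hypothetical.

```python
import numpy as np

def fit_affine_motion(points, flow):
    """Least-squares affine motion v(x) = A x + b.

    points: (m, 2) pixel positions (x, y); flow: (m, 2) flow vectors at those points.
    """
    design = np.hstack([points, np.ones((len(points), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, flow, rcond=None)           # (3, 2)
    A, b = params[:2].T, params[2]
    return A, b
```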

Claudia Nieuwenhuis, Benjamin Berkels, Martin Rumpf, Daniel Cremers

On-Line Multi-view Forests for Tracking

A successful approach to tracking is to learn discriminative classifiers for the target objects on-line. Although these tracking-by-detection approaches are usually fast and accurate, they easily drift when wrong updates occur and are then reinforced by the tracker itself. Recent work has shown that classifier-based trackers can be significantly stabilized by applying semi-supervised learning methods instead of supervised ones. In this paper, we propose a novel on-line multi-view learning algorithm based on random forests. The main idea of our approach is to incorporate multi-view learning inside random forests and update each tree with individual label estimates for the unlabeled data. Our method is fast, easy to implement, benefits from parallel computing architectures and inherently exploits multiple views for learning from unlabeled data. In the tracking experiments, we outperform the state-of-the-art methods based on boosting and random forests.

Christian Leistner, Martin Godec, Amir Saffari, Horst Bischof

Low-Level Vision and Features

Probabilistic Multi-class Scene Flow Segmentation for Traffic Scenes

A multi-class traffic scene segmentation approach based on scene flow data is presented. As opposed to many other approaches using color or texture features, our approach is purely based on dense depth and 3D motion information. Using prior knowledge on tracked objects in the scene and the pixel-wise uncertainties of the scene flow data, each pixel is assigned to either a particular moving object class (tracked/unknown object), the ground surface, or static background. The global topological order of classes, such as "objects are above ground", is locally integrated into a conditional random field by an ordering constraint. The proposed method yields very accurate segmentation results on challenging real-world scenes, which we made publicly available for comparison.

Alexander Barth, Jan Siegemund, Annemarie Meißner, Uwe Franke, Wolfgang Förstner

A Stochastic Evaluation of the Contour Strength

If one considers only local neighborhoods when segmenting an image, one obtains contours whose strength is often poorly estimated. A method for reevaluating the contour strength by taking into account non-local features is presented: a fixed number of random germs is generated, which serve as markers for the watershed segmentation. For each new population of markers, another set of contours is generated. "Important" contours are selected more often. The present paper shows that the probability that a contour is selected can be estimated without actually performing the simulations.

Fernand Meyer, Jean Stawiaski

Incremental Computation of Feature Hierarchies

Feature hierarchies are essential to many visual object recognition systems and are well motivated by observations in biological systems. The present paper proposes an algorithm to incrementally compute feature hierarchies. The features are represented as estimated densities, using a variant of local soft histograms. The kernel functions used for this estimation, in conjunction with their unitary extension, establish a tight frame, so results from framelet theory apply. Traversing the feature hierarchy requires resampling of the spatial and the feature bins. For the resampling, we derive a multi-resolution scheme for quadratic spline kernels and an optimization algorithm for the upsampling. We complement the theoretical results with some illustrative experiments and a consideration of convergence rate and computational efficiency.
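A soft histogram with quadratic spline kernels spreads each feature value over neighbouring bins, using the quadratic B-spline B(x) = 3/4 − x² for |x| ≤ 1/2 and (|x| − 3/2)²/2 for 1/2 < |x| ≤ 3/2. Because its integer shifts sum to one, each value away from the borders contributes unit mass. The sketch below shows this plain, non-local encoding only, assuming unit-spaced bin centres; the spatial localization and the multi-resolution resampling are not shown.

```python
import numpy as np

def quadratic_bspline(x):
    """Quadratic B-spline kernel with support [-1.5, 1.5]."""
    ax = np.abs(x)
    return np.where(ax <= 0.5, 0.75 - ax ** 2,
                    np.where(ax <= 1.5, 0.5 * (ax - 1.5) ** 2, 0.0))

def soft_histogram(values, num_bins):
    """Soft histogram: each value spreads weight over nearby integer bin centres."""
    centres = np.arange(num_bins)
    weights = quadratic_bspline(values[:, None] - centres[None, :])
    return weights.sum(axis=0)
```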

Michael Felsberg

From Box Filtering to Fast Explicit Diffusion

There are two popular ways to implement anisotropic diffusion filters with a diffusion tensor: explicit finite difference schemes are simple but become inefficient due to severe time step size restrictions, while semi-implicit schemes are more efficient but require solving large linear systems of equations. In our paper we present a novel class of algorithms that combine the advantages of both worlds: they are based on simple explicit schemes, while being more efficient than semi-implicit approaches. These so-called fast explicit diffusion (FED) schemes perform cycles of explicit schemes with varying time step sizes that may violate the stability restriction in up to 50 percent of all cases. FED schemes can be motivated from a decomposition of box filters in terms of explicit schemes for linear diffusion problems. Experiments demonstrate the advantages of the FED approach for time-dependent (parabolic) image enhancement problems as well as for steady state (elliptic) image compression tasks. In the latter case FED schemes are sped up substantially by embedding them in a cascadic coarse-to-fine approach.
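A minimal sketch of one FED cycle for 1-D linear diffusion follows. We assume the step sizes τ_i = τ_max / (2 cos²(π(2i+1)/(4n+2))), i = 0, …, n−1, which is our understanding of the FED construction and should be checked against the paper. Their sum is τ_max · n(n+1)/3, and some individual steps exceed the explicit stability limit τ_max = 0.5 (grid size 1). The reflecting boundary handling and the helper names are assumptions.

```python
import numpy as np

def fed_step_sizes(n, tau_max):
    """Varying time steps of one FED cycle of length n (assumed formula)."""
    i = np.arange(n)
    return tau_max / (2.0 * np.cos(np.pi * (2 * i + 1) / (4 * n + 2)) ** 2)

def fed_cycle_1d(u, n, tau_max=0.5):
    """One FED cycle of explicit linear diffusion (grid size 1, reflecting boundaries)."""
    u = u.astype(float).copy()
    for tau in fed_step_sizes(n, tau_max):
        padded = np.pad(u, 1, mode="edge")            # mirrored neighbours at the ends
        u = u + tau * (padded[2:] - 2.0 * u + padded[:-2])
    return u

u0 = np.array([0.0, 0.0, 1.0, 5.0, 2.0, 0.0, 0.0, 3.0])
u1 = fed_cycle_1d(u0, n=5)
```

Individual steps may be unstable on their own; only the complete cycle is stable, which is why the steps are applied as a unit.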

Sven Grewenig, Joachim Weickert, Andrés Bruhn

Surfaces and Materials

High-Resolution Object Deformation Reconstruction with Active Range Camera

This contribution discusses the 3D reconstruction of deformable freeform surfaces with high spatial and temporal resolution. These are conflicting requirements, since high-resolution surface scanners typically cannot achieve high temporal resolution, while high-speed range cameras like Time-of-Flight (ToF) cameras capture depth at 25 fps but have a limited spatial resolution. We propose to combine a high-resolution surface scan with a ToF camera and a color camera to meet both requirements. The 3D surface deformation is modeled by a NURBS surface that approximates the object surface; the 3D object motion and local 3D deformation are estimated from the ToF and color camera data. A small set of NURBS control points can faithfully model the motion and deformation, and these control points are estimated from the ToF and color data with high accuracy. The contribution focuses on the estimation of the 3D deformation NURBS from the ToF and color data.
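A NURBS surface maps parameters (u, v) to 3D through S(u, v) = Σ N_i(u) N_j(v) w_ij P_ij / Σ N_i(u) N_j(v) w_ij, so moving a few control points P_ij deforms the whole surface smoothly. The sketch below evaluates such a surface with the Cox-de Boor recursion. Degrees, knot vectors and control points are hypothetical, and the estimation of control points from ToF and color data is not shown.

```python
import numpy as np

def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # close the last non-empty span so that u == knots[-1] is covered
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, u, knots))
    return left + right

def nurbs_surface_point(u, v, ctrl, weights, p, q, knots_u, knots_v):
    """Evaluate S(u, v); ctrl is (nu, nv, 3), weights is (nu, nv)."""
    nu, nv = weights.shape
    Nu = np.array([bspline_basis(i, p, u, knots_u) for i in range(nu)])
    Nv = np.array([bspline_basis(j, q, v, knots_v) for j in range(nv)])
    wN = np.outer(Nu, Nv) * weights                       # weighted tensor-product basis
    return np.tensordot(wN, ctrl, axes=([0, 1], [0, 1])) / wN.sum()
```

Raising a control point's weight pulls the surface towards that point, which gives extra local control without adding control points.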

Andreas Jordt, Ingo Schiller, Johannes Bruenger, Reinhard Koch

Selection of an Optimal Polyhedral Surface Model Using the Minimum Description Length Principle

In this paper, a new approach to finding an optimal surface representation is described. It is shown that the minimum description length (MDL) principle can be used to select a trade-off between goodness-of-fit and complexity of decimated mesh representations. A given mesh is iteratively simplified by using different decimation algorithms. At each step the two-part minimum description length is evaluated. The first part encodes all model parameters, while the second part encodes the error residuals given the model. A Bayesian approach is used to deduce the MDL term. The shortest code length identifies the optimal trade-off. The method has been successfully tested on various examples.
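The two-part code length can be illustrated with a widely used approximation: about (k/2) log₂ n bits for k parameters plus (n/2) log₂(RSS/n) bits (up to a constant) for Gaussian residuals. The sketch below uses this approximation, which is not the authors' Bayesian derivation. As a hypothetical stand-in for a sequence of decimated meshes, it selects a polynomial degree on synthetic data.

```python
import numpy as np

def two_part_description_length(num_params, residuals):
    """Approximate two-part code length in bits (BIC-style approximation)."""
    n = residuals.size
    rss = max(float(np.sum(residuals ** 2)), 1e-300)    # guard against log(0)
    model_bits = 0.5 * num_params * np.log2(n)          # cost of encoding the parameters
    data_bits = 0.5 * n * np.log2(rss / n)              # Gaussian residual code, up to a constant
    return model_bits + data_bits

# Hypothetical model sequence: polynomials of increasing degree instead of meshes.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.size)
lengths = []
for deg in range(7):
    coeffs = np.polyfit(x, y, deg)
    lengths.append(two_part_description_length(deg + 1, y - np.polyval(coeffs, x)))
best = int(np.argmin(lengths))     # shortest code length = chosen trade-off
```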

Tilman Wekel, Olaf Hellwich

Learning of Optimal Illumination for Material Classification

We present a method to classify materials in illumination series data. An illumination series is acquired using a device that is capable of generating arbitrary lighting environments covering nearly the whole upper hemisphere. The individual images of the illumination series span a high-dimensional feature space. Using a random forest classifier, different materials, which vary in appearance (which itself depends on the patterns of incoming illumination), can be distinguished reliably. The associated Gini feature importance allows for determining the features which are most relevant for the classification result. By linking the features to illumination patterns, a proposition about optimal lighting for defect detection can be made, which yields valuable information for the selection and placement of light sources.
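Gini feature importance accumulates, over all nodes and trees of a random forest, the decrease in Gini impurity 1 − Σ p_c² achieved by splits on each feature. The single-split sketch below shows only the underlying quantity for one feature. It is a simplification for illustration, not the authors' forest pipeline, and the data in the test is hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_gini_decrease(feature, labels):
    """Largest Gini impurity decrease achievable by one threshold split on a feature."""
    order = np.argsort(feature)
    f, y = feature[order], labels[order]
    parent, n = gini(y), len(y)
    best = 0.0
    for k in range(1, n):
        if f[k] == f[k - 1]:
            continue                        # no threshold separates equal values
        left, right = y[:k], y[k:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best
```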

Markus Jehle, Christoph Sommer, Bernd Jähne

