
2005 | Book

Pattern Recognition

27th DAGM Symposium, Vienna, Austria, August 31 - September 2, 2005. Proceedings

Editors: Walter G. Kropatsch, Robert Sablatnig, Allan Hanbury

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

It is both an honor and a pleasure to hold the 27th Annual Meeting of the German Association for Pattern Recognition, DAGM 2005, at the Vienna University of Technology, Austria, organized by the Pattern Recognition and Image Processing (PRIP) Group. We received 122 contributions of which we were able to accept 29 as oral presentations and 31 as posters. Each paper received three reviews, upon which decisions were made based on correctness, presentation, technical depth, scientific significance and originality. The selection as oral or poster presentation does not signify a quality grading but reflects attractiveness to the audience, which is also reflected in the order of appearance of papers in these proceedings. The papers are printed in the same order as presented at the symposium and posters are integrated in the corresponding thematic session. In putting these proceedings together, many people played significant roles which we would like to acknowledge. First of all our thanks go to the authors who contributed their work to the symposium. Second, we are grateful for the dedicated work of the 38 members of the Program Committee for their effort in evaluating the submitted papers and in providing the necessary decision support information and the valuable feedback for the authors. Furthermore, the Program Committee awarded prizes for the best papers, and we want to sincerely thank the donors. We were honored to have the following three invited speakers at the conference: – Jan P.

Table of Contents


Color Analysis

On Determining the Color of the Illuminant Using the Dichromatic Reflection Model

The human visual system is able to accurately determine the color of objects irrespective of the spectral power distribution used to illuminate the scene. This ability to compute color constant descriptors is called color constancy. Many different algorithms have been proposed to solve the problem of color constancy. Usually, some assumptions have to be made in order to solve this problem. Algorithms based on the dichromatic reflection model assume that the light reflected from an object results from a combined matte and specular reflection. This assumption is used to estimate the color of the illuminant. Once the color of the illuminant is known, one can compute a color corrected image as it would appear under a canonical, i.e. white illuminant. A number of different methods can be used to estimate the illuminant from the dichromatic reflection model. We evaluate several different methods on a standard set of images. Our results indicate that the median operator is particularly useful in estimating the color of the illuminant. We also found that it is not advantageous to assume that the illuminant can be approximated by the curve of the black-body radiator.

Marc Ebner, Christian Herrmann
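
One way to turn the dichromatic assumption into an estimator can be sketched as follows. Under the dichromatic model every pixel of a surface is a positive mix of the body colour and the illuminant colour, so in rg-chromaticity space its pixels lie on a line through the illuminant chromaticity; intersecting the lines of several surfaces and taking the median of the intersections gives a robust estimate. This is a minimal sketch, assuming the pixels are already segmented per surface and converted to chromaticities; the function names are hypothetical and it is not claimed to be exactly one of the methods evaluated in the paper.

```python
import itertools
import numpy as np

def fit_line(points):
    """Fit a 2D line to the chromaticities of one surface (TLS fit via SVD).
    Hypothetical helper: returns (point on line, unit direction)."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    return center, vt[0]

def intersect(line_a, line_b):
    """Intersection of two 2D lines, or None if they are (nearly) parallel."""
    (pa, da), (pb, db) = line_a, line_b
    m = np.column_stack([da, -db])
    if abs(np.linalg.det(m)) < 1e-9:
        return None
    s, _ = np.linalg.solve(m, pb - pa)
    return pa + s * da

def estimate_illuminant(surfaces):
    """surfaces: list of (N_i, 2) arrays of rg-chromaticities, one per surface.
    Returns the coordinate-wise median of all pairwise line intersections."""
    lines = [fit_line(p) for p in surfaces]
    hits = [intersect(a, b) for a, b in itertools.combinations(lines, 2)]
    hits = np.array([h for h in hits if h is not None])
    return np.median(hits, axis=0)
```

The median is used instead of the mean so that a few badly conditioned intersections (nearly parallel lines, mis-segmented surfaces) do not pull the estimate away.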
Probabilistic Color Optical Flow

Usually, optical flow computation is based on grayscale images and the brightness conservation assumption. Recently, some authors have investigated transferring gradient-based grayscale optical flow methods to color images. These color optical flow methods are restricted to brightness and color conservation over time. In this paper, a correlation-based color optical flow method is presented that allows for brightness and color changes within an image sequence. Furthermore, the correlation results are used for a probabilistic evaluation that combines the velocity information gained from single color frames into a joint velocity estimate including all color frames. The resulting color optical flow is compared to other representative multi-frame color methods and standard single-frame grayscale methods.

Volker Willert, Julian Eggert, Sebastian Clever, Edgar Körner
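
The combination of per-channel correlation results into one velocity estimate can be illustrated with a small block-matching sketch. Normalised cross-correlation is insensitive to gain and offset changes, which is one way to tolerate brightness and color changes; each channel's correlation is mapped to a likelihood and the channels are multiplied under an independence assumption. The exponential mapping, the width `sigma` and all function names are assumptions for illustration, not the paper's exact probabilistic model.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation of two patches (invariant to gain and offset)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def joint_velocity(frame0, frame1, y, x, half=3, search=3, sigma=0.2):
    """Hypothetical helper: displacement (dy, dx) of pixel (y, x) that maximises the
    joint likelihood over all color channels. Frames have shape (H, W, C)."""
    patch0 = frame0[y - half:y + half + 1, x - half:x + half + 1]
    best, best_logp = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch1 = frame1[y + dy - half:y + dy + half + 1,
                            x + dx - half:x + dx + half + 1]
            # Assumed per-channel likelihood exp(-(1 - ncc) / sigma); summing the
            # exponents multiplies the channel likelihoods (independence assumption).
            logp = sum(-(1.0 - ncc(patch0[..., c], patch1[..., c])) / sigma
                       for c in range(frame0.shape[2]))
            if logp > best_logp:
                best, best_logp = (dy, dx), logp
    return best
```

For example, a frame shifted by (1, 2) pixels and multiplied by a different gain in each channel is still matched at (1, 2), because the correlation of every channel is unaffected by the gain change.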
Illumination Invariant Color Texture Analysis Based on Sum- and Difference-Histograms

Color texture algorithms have been under investigation for quite a few years now. However, the results of these algorithms are still considerably influenced by the illumination conditions under which the images were captured. It is strongly desirable to reduce the influence of illumination as much as possible to obtain stable and satisfying classification results even under difficult imaging conditions, such as those occurring, e.g., in medical applications like endoscopy. In this paper we present the analysis of a well-known texture analysis algorithm, namely the sum- and difference-histogram features, with respect to illumination changes. Based on this analysis, we propose a novel set of features factoring out the illumination influence from the majority of the original features. We conclude our paper with a quantitative, experimental evaluation on artificial and real image samples.

Christian Münzenmayer, Sylvia Wilharm, Joachim Hornegger, Thomas Wittenberg
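
Sum- and difference-histograms count the sums and differences of grey levels of pixel pairs separated by a fixed displacement. The sketch below computes them and then shows why ratios of features can cancel a global multiplicative illumination factor k: the mean (from the sum histogram) scales with k, the contrast (from the difference histogram) with k squared. The ratio `contrast / mean**2` and the function names are illustrative assumptions, not the feature set proposed in the paper.

```python
import numpy as np

def sum_diff_histograms(img, d=(0, 1), levels=256):
    """Normalised sum and difference histograms of an integer grey-level image
    for a displacement d = (dy, dx) with dy, dx >= 0."""
    dy, dx = d
    h, w = img.shape
    a = img[:h - dy, :w - dx].astype(np.int64)
    b = img[dy:, dx:].astype(np.int64)
    s = (a + b).ravel()                         # sums lie in [0, 2*levels - 2]
    t = (a - b).ravel() + levels - 1             # differences shifted to [0, 2*levels - 2]
    hs = np.bincount(s, minlength=2 * levels - 1) / s.size
    hd = np.bincount(t, minlength=2 * levels - 1) / t.size
    return hs, hd

def contrast_over_mean(img, d=(0, 1), levels=256):
    """Hypothetical illumination-normalised feature: contrast / mean**2."""
    hs, hd = sum_diff_histograms(img, d, levels)
    i = np.arange(2 * levels - 1)
    mean = 0.5 * (i * hs).sum()                           # mean grey level
    contrast = (((i - (levels - 1)) ** 2) * hd).sum()      # second moment of differences
    return contrast / mean ** 2
```

Doubling all grey levels of an image (within the 8-bit range) leaves `contrast_over_mean` unchanged, while the mean and contrast individually change by factors of 2 and 4.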
Color Image Compression: Early Vision and the Multiresolution Representations

The efficiency of an image compression technique relies on the capability of finding sparse M-terms for the best approximation with reduced visually significant quality loss. By "visually significant" we mean the information that a human observer can perceive. The Human Visual System (HVS) is generally sensitive to contrast, color, spatial frequency, etc. This paper is concerned with the compression of color images, where the psycho-visual representation is an important strategy to define the best M-term approximation technique. Digital color images are usually stored using the RGB space, television broadcast uses the YUV (YIQ) space, while the psycho-visual representation relies on three components: one for the luminance and two for the chrominance. In this paper, an analysis of the wavelet and contourlet representations of color images in both RGB and YUV spaces is performed. An approximation technique is then applied in order to investigate the performance of image compression using either of these transforms.

Ahmed Nabil Belbachir, Peter Michael Goebel
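
In an orthonormal basis the best M-term approximation simply keeps the M coefficients of largest magnitude, and by Parseval's relation the squared error equals the energy of the discarded coefficients. The sketch below illustrates this on one channel with a single-level orthonormal Haar wavelet, used here only as a stand-in for the wavelet and contourlet transforms analysed in the paper; the YUV scale factors are the common BT.601-style values and, like the function names, are an assumption.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """BT.601-style conversion: one luminance and two chrominance channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.stack([y, 0.492 * (b - y), 0.877 * (r - y)], axis=-1)

def haar2(x):
    """One level of the orthonormal 2D Haar transform (even-sized input)."""
    s = np.sqrt(2.0)
    x = np.vstack([(x[0::2] + x[1::2]) / s, (x[0::2] - x[1::2]) / s])        # rows
    return np.hstack([(x[:, 0::2] + x[:, 1::2]) / s, (x[:, 0::2] - x[:, 1::2]) / s])

def ihaar2(c):
    """Inverse of haar2: undo the column step, then the row step."""
    s = np.sqrt(2.0)
    h, w = c.shape
    a, d = c[:, :w // 2], c[:, w // 2:]
    x = np.empty_like(c)
    x[:, 0::2], x[:, 1::2] = (a + d) / s, (a - d) / s
    a, d = x[:h // 2], x[h // 2:]
    y = np.empty_like(c)
    y[0::2], y[1::2] = (a + d) / s, (a - d) / s
    return y

def m_term(channel, m):
    """Hypothetical helper: keep the m (m >= 1) largest-magnitude coefficients."""
    c = haar2(channel.astype(float))
    keep = np.argsort(np.abs(c), axis=None)[-m:]     # flat indices of the m largest
    kept = np.zeros_like(c)
    kept.flat[keep] = c.flat[keep]
    return ihaar2(kept), c, kept
```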

Stereo Vision

Optic Flow Goes Stereo: A Variational Method for Estimating Discontinuity-Preserving Dense Disparity Maps

We present a novel variational method for estimating dense disparity maps from stereo images. It integrates the epipolar constraint into the currently most accurate optic flow method (Brox et al. 2004). In this way, a new approach is obtained that offers several advantages compared to existing variational methods: (i) It preserves discontinuities very well due to the use of the total variation as solution-driven regulariser. (ii) It performs favourably under noise since it uses a robust function to penalise deviations from the data constraints. (iii) Its minimisation via a coarse-to-fine strategy can be theoretically justified. Experiments with both synthetic and real-world data show the excellent performance and the noise robustness of our approach.

Natalia Slesareva, Andrés Bruhn, Joachim Weickert
Lens Model Selection for Visual Tracking

A standard approach to generate a camera pose from images of a single moving camera is Structure From Motion (SfM). When aiming at a practical implementation of SfM, a camera that is lightweight and small is often needed. This work analyses which camera and lens, small in size and available on the market, is best suited for SfM. To this end, we compare cameras with fisheye and perspective lenses. It is shown that pose estimation is improved by a fisheye lens. We also discuss some methods by which the large field of view can be further exploited to improve the pose estimation.

Birger Streckel, Reinhard Koch
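
The field-of-view advantage of a fisheye lens follows from its projection function. A perspective lens maps a ray at angle theta off the optical axis to image radius r = f tan(theta), which diverges at 90 degrees, whereas the equidistant fisheye model r = f theta stays finite. The equidistant model is one common fisheye model and an assumption here; the specific lenses compared in the paper are not modelled, and the function names are hypothetical.

```python
import math

def perspective_radius(theta, f):
    """Image radius of a ray at angle theta (radians): r = f * tan(theta)."""
    if theta >= math.pi / 2:
        raise ValueError("a perspective lens cannot image rays at or beyond 90 degrees")
    return f * math.tan(theta)

def fisheye_radius(theta, f):
    """Equidistant fisheye model (assumed): r = f * theta."""
    return f * theta

def max_field_of_view(r_max, f, model):
    """Full field of view in degrees that fits into an image circle of radius r_max."""
    theta = math.atan(r_max / f) if model == "perspective" else r_max / f
    return 2 * math.degrees(theta)
```

With the same focal length and an image circle of radius pi/2 times f, the fisheye covers a full hemisphere (180 degrees), while a perspective lens with image radius f covers only 90 degrees.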
Omnidirectional Vision with Frontal Stereo

This study describes a novel imaging system that can be used as a front end for a vision system for guidance of unmanned aerial vehicles. A single camera and a combination of three specially designed reflecting surfaces are used to provide (a) complete omnidirectional vision with no frontal blind zone and (b) stereo range within a frontal field. Important features are (i) the use of a single camera, which, apart from minimising cost, eliminates the need for alignment and calibration of multiple cameras; and (ii) a novel approach to stereoscopic range computation that uses a single camera and a circular baseline to overcome potential aperture problems.

W. Stürzl, M. V. Srinivasan
Stereo Vision Based Reconstruction of Huge Urban Areas from an Airborne Pushbroom Camera (HRSC)

This paper considers the application of capturing urban terrain with an airborne pushbroom camera (e.g. the High Resolution Stereo Camera). The resulting images as well as disparity ranges are expected to be huge. A slightly non-linear flight path and small orientation changes are anticipated, which result in curved epipolar lines. These images cannot be geometrically corrected for matching purposes such that epipolar lines are exactly straight and parallel to each other. The proposed novel processing solution explicitly calculates epipolar lines to reduce the disparity search range to a minimum. This is a necessary prerequisite for using an accurate but memory-intensive semi-global stereo matching method that is based on pixelwise matching. It is shown that the proposed approach performs accurate matching of urban terrain and is efficient on huge images.

Heiko Hirschmüller, Frank Scholten, Gerd Hirzinger
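
The memory demand of semi-global matching comes from storing a cost for every pixel and every candidate disparity, which is why bounding the disparity range matters. The sketch below shows the standard SGM cost aggregation along a single path direction for one scanline, with a small penalty P1 for disparity changes of one and a larger penalty P2 for larger jumps. It assumes a precomputed cost array; the paper's handling of curved epipolar lines and the summation over several path directions are not shown, and the function name is hypothetical.

```python
import numpy as np

def aggregate_path(cost, p1, p2):
    """Aggregate matching costs along one scanline, left to right.
    cost: array (W, D) of pixelwise costs C(x, d) for W pixels and D disparities."""
    w, d = cost.shape
    L = np.empty_like(cost, dtype=float)
    L[0] = cost[0]
    for x in range(1, w):
        prev = L[x - 1]
        prev_min = prev.min()
        minus = np.concatenate([[np.inf], prev[:-1]]) + p1   # from disparity d - 1
        plus = np.concatenate([prev[1:], [np.inf]]) + p1     # from disparity d + 1
        jump = np.full(d, prev_min + p2)                      # any larger jump
        best = np.minimum(np.minimum(prev, minus), np.minimum(plus, jump))
        # Subtracting prev_min keeps the aggregated values bounded.
        L[x] = cost[x] + best - prev_min
    return L
```

For example, a single pixel whose raw cost slightly prefers disparity 0 inside a region that consistently prefers disparity 2 is pulled back to disparity 2 after aggregation, because the jump penalties outweigh the small cost difference.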
Calibration–Free Hand–Eye Calibration: A Structure–from–Motion Approach

The paper presents an extended hand-eye calibration approach that, in contrast to the standard method, does not require a calibration pattern for determining camera position and orientation. Instead, a structure-from-motion algorithm is applied for obtaining the eye-data that is necessary for computing the unknown hand-eye transformation. Different ways of extending the standard algorithm are presented, which mainly involve the estimation of a scale factor in addition to rotation and translation. The proposed methods are experimentally compared using data obtained from an optical tracking system that determines the pose of an endoscopic camera. The approach is of special interest in our clinical setup, as the use of an unsterile calibration pattern is difficult in a sterile environment.

Jochen Schmidt, Florian Vogt, Heinrich Niemann
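
Why a scale factor enters can be seen from the translation part of the hand-eye equation A_i X = X B_i: it reads (R_A - I) t_X + t_A = R_X t_B, and with structure from motion t_B is only known up to an unknown scale lam. Once the rotation R_X is known, t_X and lam can be found by one linear least-squares solve. This is a minimal sketch of one such linear extension under that assumption (rotation already estimated, noise-free motions); it is not necessarily one of the variants compared in the paper, and all names are hypothetical.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues formula: rotation matrix for an axis and an angle in radians."""
    k = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

def hom(R, t):
    """4x4 homogeneous transform from a rotation and a translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def solve_translation_and_scale(hand_motions, eye_motions, R_x):
    """Least-squares t_X and scale lam from A_i X = X B_i, given R_X.
    Eye translations are assumed known only up to the scale lam."""
    rows, rhs = [], []
    for A, B in zip(hand_motions, eye_motions):
        # (R_A - I) t_X - lam * R_X t_B = -t_A
        rows.append(np.hstack([A[:3, :3] - np.eye(3), -(R_x @ B[:3, 3])[:, None]]))
        rhs.append(-A[:3, 3])
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return sol[:3], sol[3]
```

At least two hand motions with non-parallel rotation axes are needed so that the stacked system determines all four unknowns.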

Invited Paper

Simple Solvers for Large Quadratic Programming Tasks

This paper describes solvers for specific quadratic programming (QP) tasks. The QP tasks in question appear in numerous problems, e.g., classifier learning and probability density estimation. The QP task becomes challenging when a large number of variables has to be optimized, which is a common case in practice. We propose QP solvers which are simple to implement and still able to cope with problems having hundreds of thousands of variables.

Vojtěch Franc, Václav Hlaváč
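
As an example of how simple such a solver can be, the sketch below minimises 1/2 x^T H x + f^T x subject to x >= 0 by sequential coordinate-wise updates: each step solves exactly for one variable and updates the gradient incrementally, so a step costs O(n) and only one column of H is touched. It assumes H is symmetric positive definite; the problem form, the stopping rule and the function name are assumptions and need not match the specific QP tasks and solvers of the paper.

```python
import numpy as np

def sca_nnqp(H, f, iters=1000, tol=1e-10):
    """Hypothetical coordinate-wise solver for min 0.5 x^T H x + f^T x, x >= 0."""
    H = np.asarray(H, float)
    f = np.asarray(f, float)
    x = np.zeros(len(f))
    g = f.copy()                       # gradient H x + f at x = 0
    for _ in range(iters):
        max_change = 0.0
        for i in range(len(f)):
            # Exact minimisation over x_i, clipped to the feasible set x_i >= 0.
            new = max(0.0, x[i] - g[i] / H[i, i])
            delta = new - x[i]
            if delta != 0.0:
                g += delta * H[:, i]   # incremental gradient update, O(n)
                x[i] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return x
```

At the solution the Karush-Kuhn-Tucker conditions hold: x >= 0, the gradient H x + f is non-negative, and each x_i is zero wherever its gradient component is positive.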

Segmentation and Grouping

Voxel-Wise Gray Scale Invariants for Simultaneous Segmentation and Classification

3D volumetric microscopical techniques (e.g. confocal laser scanning microscopy) have become a standard tool in biomedical applications to record three-dimensional objects with highly anisotropic morphology. To analyze these data in high-throughput experiments, reliable, easy-to-use and generally applicable pattern recognition tools are required. The major problem of nearly all existing applications is their high specialization to exactly one problem and their time-consuming adaptation to new problems, which has to be done by pattern recognition experts. We therefore search for a tool that can be adapted to new problems just by an interactive training process. Our main idea is to combine object segmentation and recognition into one step by computing voxel-wise gray scale invariants (using nonlinear kernel functions and Haar-integration) on the volumetric multi-channel data set and classifying each voxel using support vector machines.

After the selection of an appropriate set of nonlinear kernel functions (which allows prior knowledge to be integrated, but still needs some expertise), this approach allows a biologist to adapt the recognition system to his problem just by interactively selecting several voxels as training points for each class of objects. Based on these points the classification result is computed, and the biologist may refine it by selecting additional training points until the result meets his needs. In this paper we present the theoretical background and a fast approximate algorithm using FFTs for computing Haar-integrals over the very rich class of nonlinear 3-point-kernel functions. The approximation still fulfils the invariance conditions. The experimental application for the recognition of different cell cores of the chorioallantoic membrane is presented in the accompanying paper [1] and in the technical report [2].

Olaf Ronneberger, Janis Fehr, Hans Burkhardt
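
The principle of Haar integration, averaging a kernel function over all transformations of a group, and its link to FFTs can be shown with a toy 1D analogue. Here the group is the set of cyclic shifts of a signal and the kernel is the 2-point product f(s) = s[0] * s[k]; the group average is invariant to shifts and equals the circular autocorrelation at lag k, which an FFT delivers for all k at once. This is a deliberately simplified illustration with hypothetical names; the paper uses 3-point kernels on 3D volumes and Euclidean motions.

```python
import numpy as np

def haar_invariant_direct(s, k):
    """Average of the kernel f(s) = s[0] * s[k] over all cyclic shifts of s."""
    n = len(s)
    total = 0.0
    for g in range(n):                       # one term per group element
        shifted = np.roll(s, g)
        total += shifted[0] * shifted[k % n]
    return total / n

def haar_invariant_fft(s, k):
    """Same value via the circular autocorrelation computed with FFTs."""
    S = np.fft.fft(s)
    acorr = np.fft.ifft(S * np.conj(S)).real / len(s)
    return acorr[k % len(s)]
```

The direct version costs O(n) per lag and O(n^2) for all lags, while the FFT version computes every lag in O(n log n), which mirrors the motivation for the FFT-based approximation described above.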
Automatic Foreground Propagation in Image Sequences for 3D Reconstruction

In this paper we introduce a novel method for automatic propagation of foreground objects in image sequences. Our method is based on a combination of the mean-shift operator with the well-known intelligent scissors technique. It is effective because the images are captured with high overlap, resulting in highly redundant scene information. The algorithm requires an initial segmentation of one image of the sequence as input. In each subsequent image, the segmentation of the previous image is taken as an initialization, and the propagation procedure proceeds in four major steps. Each step refines the segmentation of the foreground object, and the algorithm continues until all images of the sequence are processed. We demonstrate the effectiveness of our approach on several datasets.

Mario Sormann, Christopher Zach, Joachim Bauer, Konrad Karner, Horst Bischof
Agglomerative Grouping of Observations by Bounding Entropy Variation

An information-theoretic framework for grouping observations is proposed. The entropy change incurred by new observations is analyzed using the Kalman filter update equations. It is found that the entropy variation is caused by a positive similarity term and a negative proximity term. By bounding the similarity term in the spirit of the minimum description length principle and the proximity term in the spirit of maximum entropy inference, a robust and efficient grouping procedure is devised. Some of its properties are demonstrated for the exemplary task of edgel grouping.

Christian Beder
Three-Dimensional Shape Knowledge for Joint Image Segmentation and Pose Estimation

This paper presents the integration of 3D shape knowledge into a variational model for level set based image segmentation and tracking. Having a 3D surface model of an object that is visible in the image of a calibrated camera, the object contour stemming from the segmentation is applied to estimate the 3D pose parameters, whereas the object model projected to the image plane helps in a top-down manner to improve the extraction of the contour and the region statistics. The present approach clearly states all model assumptions in a single energy functional. This keeps the model manageable and allows further extensions for the future. While common alternative segmentation approaches that integrate 2D shape knowledge face the problem that an object can look very different from various viewpoints, a 3D free form model ensures that for each view the model can perfectly fit the data in the image. Moreover, one solves the higher level problem of determining the object pose including its distance to the camera. Experiments demonstrate the performance of the method.

Thomas Brox, Bodo Rosenhahn, Joachim Weickert
Goal-Directed Search with a Top-Down Modulated Computational Attention System

In this paper we present VOCUS: a robust computational attention system for goal-directed search. A standard bottom-up architecture is extended by a top-down component, enabling the weighting of features depending on previously learned weights. The weights are derived from both target (excitation) and background properties (inhibition). A single system is used for bottom-up saliency computations, learning of feature weights, and goal-directed search. Detailed performance results for artificial and real-world images are presented, showing that a target is typically among the first 3 focused regions. VOCUS represents a robust and time-saving front-end for object recognition since by selecting regions of interest it significantly reduces the amount of data to be processed by a recognition system.

Simone Frintrop, Gerriet Backer, Erich Rome
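
The learning of feature weights from target and background can be sketched as follows. Each feature map receives the ratio of its mean inside the target region to its mean in the background; maps with a weight above 1 are used for excitation, maps with a weight below 1 (inverted) for inhibition, and the top-down map is their difference. This formulation follows the general idea described above but its exact form, and all names, are assumptions rather than a reproduction of VOCUS.

```python
import numpy as np

def learn_weights(feature_maps, target_mask):
    """Hypothetical weight per feature map: mean on target / mean on background."""
    eps = 1e-12
    return np.array([m[target_mask].mean() / (m[~target_mask].mean() + eps)
                     for m in feature_maps])

def top_down_map(feature_maps, weights):
    """Excite maps with weight > 1, inhibit maps with weight < 1."""
    excitation = sum(w * m for w, m in zip(weights, feature_maps) if w > 1)
    inhibition = sum((1.0 / w) * m for w, m in zip(weights, feature_maps) if 0 < w < 1)
    return excitation - inhibition
```

For instance, if one feature is strong on the target and another is strong on the background, the first gets a weight above 1 and the second a weight below 1, so the maximum of the top-down map falls inside the target region.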

Automatic Speech Understanding

Telephone-Based Speech Dialog Systems

In this contribution we look back on recent years in the history of telephone-based speech dialog systems. We start in 1993, when the world's first natural language understanding dialog system using a mixed-initiative approach was made accessible to the public: the well-known EVAR system from the Chair for Pattern Recognition of the University of Erlangen-Nuremberg. Then we discuss certain requirements we consider necessary for the successful application of dialog systems. Finally, we present trends and developments in the area of telephone-based dialog systems.

Jürgen Haas, Florian Gallwitz, Axel Horndasch, Richard Huber, Volker Warnke
Robust Parallel Speech Recognition in Multiple Energy Bands

In this paper we investigate the performance of TRAP features on clean and noisy data. Multiple feature sets are evaluated on a corpus which was recorded in clean and noisy environments. In addition, the clean version was reverberated artificially. The feature sets are assembled from selected energy bands. In this manner, multiple recognizers are trained using different energy bands. The outputs of all recognizers are combined with ROVER in order to achieve a single recognition result. This system is compared to a baseline recognizer that uses Mel frequency cepstrum coefficients (MFCC). We show that the use of artificial reverberation leads to greater robustness to noise in general. Furthermore, most TRAP-based features excel in phone recognition. While MFCC features prove to be better in a matched training/test situation, TRAP features clearly outperform them in a mismatched training/test situation: when we train on clean data and evaluate on noisy data, the word accuracy (WA) can be raised by 173 % relative (from 12.0 % to 32.8 % WA).

Andreas Maier, Christian Hacker, Stefan Steidl, Elmar Nöth, Heinrich Niemann
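
The relative figure quoted above follows from the two absolute word accuracies: the gain of 20.8 percentage points divided by the baseline of 12.0 % gives about 173 %. A short check (the function name is hypothetical):

```python
def relative_gain(baseline, improved):
    """Relative improvement in percent: (improved - baseline) / baseline * 100."""
    return (improved - baseline) / baseline * 100.0

print(f"{relative_gain(12.0, 32.8):.1f} %")  # 173.3 %
```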
Pronunciation Feature Extraction

Automatic pronunciation scoring makes novel applications for computer-assisted language learning possible. In this paper we concentrate on the feature extraction. A relatively large feature vector with 28 sentence-level and 33 word-level features has been designed. At the word level, correctly pronounced and mispronounced words are classified; at the sentence level, utterances are rated with 5 discrete marks. The features are evaluated on two databases with non-native adults' and children's speech, respectively. A class-wise averaged recognition rate of up to 72 % is achieved for 2 classes; the result of the 5-class problem can be interpreted as an 80 % recognition rate.

Christian Hacker, Tobias Cincarek, Rainer Gruhn, Stefan Steidl, Elmar Nöth, Heinrich Niemann
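
The class-wise averaged recognition rate is the unweighted mean of the per-class recognition rates (per-class recall), so a frequent class cannot dominate the score the way it does in plain overall accuracy. A minimal sketch computing it from a confusion matrix, with rows as reference classes and columns as decisions (the function name and example counts are illustrative):

```python
import numpy as np

def classwise_averaged_rate(confusion):
    """Mean of per-class recognition rates (rows = reference, columns = decision)."""
    confusion = np.asarray(confusion, float)
    return float(np.mean(np.diag(confusion) / confusion.sum(axis=1)))
```

For the matrix [[90, 10], [20, 30]] the per-class rates are 0.9 and 0.6, so the class-wise average is 0.75, whereas the overall accuracy is 120 / 150 = 0.8.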
Multi-lingual and Multi-modal Speech Processing and Applications

Over the last decade, voice technologies for telephony and embedded solutions have become much more mature, resulting in applications that provide mobile access to digital information from anywhere. Both a growing demand for voice-driven applications in many languages and the need for improved usability and user experience now drive the exploration of multi-lingual speech processing techniques for recognition, synthesis and conversational dialog management. In this overview article we discuss our recent activities on multi-lingual voice technologies and describe the benefits of multi-lingual modeling for the creation of multi-modal mobile and telephony applications.

Jozef Ivanecky, Julia Fischer, Marion Mast, Siegfried Kunzmann, Thomas Ross, Volker Fischer

3D View Registration and Surface Modeling

Cluster-Based Point Cloud Analysis for Rapid Scene Interpretation

A histogram-based method for the interpretation of three-dimensional (3D) point clouds is introduced, where point clouds represent the surface of a scene of multiple objects and background. The proposed approach relies on a pose-invariant object representation that describes the distribution of surface point-pair relations as a model histogram. The models of the objects used are trained beforehand and stored in a database. The paper introduces an algorithm that divides a large number of randomly drawn surface points into sets of potential candidates for each object model. Then clusters are established in every model-specific point set. Each cluster contains a local subset of points, which is evaluated in six refinement steps. In the refinement steps, point-pairs are built and the distribution of their relationships is used to select and merge reliable clusters or to delete them in the case of uncertainty. In the end, the algorithm provides local subsets of surface points, labeled as an object. In the experimental section the approach shows the capability for scene interpretation in terms of high classification rates and fast processing times for both synthetic and real data.

Eric Wahl, Gerd Hirzinger
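
A pose-invariant model histogram can be built from relations between pairs of oriented surface points that do not change under rotation and translation. The sketch uses only two such relations, the distance between the points and the angle between their normals, and accumulates them in a normalised 2D histogram over randomly drawn pairs. This simplified feature pair, the bin settings and the function names are assumptions for illustration; the paper's exact point-pair parameters and its cluster refinement are not reproduced.

```python
import numpy as np

def point_pair_features(points, normals, pairs):
    """Two pose-invariant relations per point pair: distance and normal angle."""
    i, j = pairs[:, 0], pairs[:, 1]
    dist = np.linalg.norm(points[j] - points[i], axis=1)
    cos_angle = np.clip(np.sum(normals[i] * normals[j], axis=1), -1.0, 1.0)
    return dist, np.arccos(cos_angle)

def pair_histogram(points, normals, n_pairs=2000, bins=(8, 8), max_dist=2.0, seed=0):
    """Hypothetical model histogram over randomly drawn point pairs."""
    rng = np.random.default_rng(seed)
    pairs = rng.integers(0, len(points), size=(n_pairs, 2))
    dist, angle = point_pair_features(points, normals, pairs)
    hist, _, _ = np.histogram2d(dist, angle, bins=bins,
                                range=[[0, max_dist], [0, np.pi]])
    return hist / hist.sum()
```

Applying the same rotation to points and normals, and any translation to the points, leaves the histogram unchanged, which is what allows a model histogram trained once to be compared against scene data in any pose.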
A Novel Parameter Decomposition Approach to Faithful Fitting of Quadric Surfaces

This paper addresses a common problem in dealing with range images. We propose a novel method to fit surfaces of known types via a parameter decomposition approach. This approach is faithful, and it strongly reduces the risk of getting trapped in local minima during iterative optimization. Therefore, it increases the tolerance in selecting the initializations. Moreover, it reduces the computation time and increases the fitting accuracy compared to former approaches. We present methods for fitting cylinders, cones, and tori to 3D data points. They share the fundamental idea of decomposing the set of parameters into two parts: one part has to be solved by some traditional optimization method and another part can be solved either analytically or directly. We experimentally compare our method with a fitting algorithm recently reported in the literature. The results demonstrate that our method has superior performance in accuracy and speed.

Xiaoyi Jiang, Da-Chuan Cheng
3D Surface Reconstruction by Combination of Photopolarimetry and Depth from Defocus

In this paper we present a novel image-based 3D surface reconstruction technique that incorporates reflectance, polarisation, and defocus information into a variational framework. Our technique is especially suited for the difficult task of 3D reconstruction of rough metallic surfaces. An error functional composed of several error terms related to the measured reflectance and polarisation properties is minimised by means of an iterative scheme in order to obtain a 3D reconstruction of the surface. This iterative algorithm is initialised with the result of depth from defocus analysis. By evaluating the algorithm on synthetic ground truth data, we show that the combined approach strongly improves the accuracy of the surface reconstruction result compared to techniques based on either reflectance or polarisation alone. Furthermore, we report 3D reconstruction results for a raw forged iron surface. A comparison of our method to independently measured ground truth data yields an accuracy of about one third of the pixel resolution.

Pablo d’Angelo, Christian Wöhler
Vision-Based 3D Object Localization Using Probabilistic Models of Appearance

The ability to accurately localize objects in an observed scene is regarded as an important precondition for many practical applications including automatic manufacturing, quality assurance, or human-robot interaction. A popular method to recognize three-dimensional objects in two-dimensional images is to apply so-called view-based approaches. In this paper, we present an approach that uses a probabilistic view-based object recognition technique for 3D localization of rigid objects. Our system generates a set of views for each object to learn an object model which is applied to identify the 6D pose of the object in the scene. In practical experiments carried out with real image data as well as rendered images, we demonstrate that our approach is robust against changing lighting conditions and high amounts of clutter.

Christian Plagemann, Thomas Müller, Wolfram Burgard
Projective Model for Central Catadioptric Cameras Using Clifford Algebra

A new method for describing the equivalence of catadioptric and stereographic projections is presented. This method produces a simple projection usable in all central catadioptric systems. A projective model for the sphere is constructed in such a way that it allows the effective use of Clifford algebra in the description of the geometrical entities on the spherical surface.

Antti Tolvanen, Christian Perwass, Gerald Sommer
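
The equivalence between central catadioptric and stereographic projection can be made concrete with the widely used unified sphere model: a 3D point is first projected onto the unit sphere and then perspectively from the point (0, 0, -xi) onto a plane. For xi = 0 this is an ordinary pinhole camera, and for xi = 1 (the parabolic-mirror case) it is exactly the stereographic projection from the south pole. The sketch uses plain vector algebra rather than the Clifford-algebra formulation of the paper; the function name is hypothetical.

```python
import numpy as np

def unified_projection(X, xi):
    """Project 3D points X (N, 3) with the unified central catadioptric model:
    onto the unit sphere, then perspectively from the point (0, 0, -xi)."""
    Xs = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xs[:, :2] / (Xs[:, 2:3] + xi)
```

Because the first step normalises each point, all points on the same ray through the centre project to the same image point, which is the defining property of a central camera.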
A New Methodology for Determination and Correction of Lens Distortion in 3D Measuring Systems Using Fringe Projection

A new methodology for the determination and correction of lens distortion in fringe projection systems for 3D object measurement is introduced. The calibration of the distortion is performed in the device ready for measurement based on the simultaneous determination of both projector and (remaining) camera distortion. The application of the algorithm allows a reduction of distortion errors up to 0.02 pixels in the projector chip and also in the camera chip.

Christian Bräuer-Burchardt
A Method for Fast Search of Variable Regions on Dynamic 3D Point Clouds

The paper addresses the region search problem in three-dimensional (3D) space. The data used is a dynamically growing point cloud as it is typically gathered with a 3D-sensing device like a laser range-scanner. An encoding of space in combination with a new region search algorithm is introduced. The algorithm allows for fast access to spherical subsets of variable size. An octree based and a balanced binary tree based implementation are discussed. Finally, experiments concerning processing time are shown.

Eric Wahl, Gerd Hirzinger
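
Fast spherical region queries on a growing point cloud can be illustrated with a spatial encoding that supports cheap insertion. The sketch swaps in a uniform voxel hash (not the octree or balanced binary tree encodings discussed in the paper): each new point is appended to its cell in O(1), and a sphere query only visits the cells overlapping the sphere's bounding box before an exact distance test. The class and its cell size are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Hypothetical voxel-hash index for a dynamically growing 3D point cloud."""

    def __init__(self, cell):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, p):
        return tuple(math.floor(c / self.cell) for c in p)

    def insert(self, p):
        self.cells[self._key(p)].append(p)     # O(1) per new point

    def query_sphere(self, center, radius):
        """All stored points within `radius` of `center` (variable radius)."""
        lo = self._key([c - radius for c in center])
        hi = self._key([c + radius for c in center])
        r2 = radius * radius
        hits = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    for p in self.cells.get((i, j, k), ()):
                        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2:
                            hits.append(p)
        return hits
```

The cell size trades off the two operations: small cells make small queries tight but force large queries to visit many cells, which is one motivation for hierarchical encodings such as octrees.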
6D-Vision: Fusion of Stereo and Motion for Robust Environment Perception

Obstacle avoidance is one of the most important challenges for mobile robots as well as future vision-based driver assistance systems. This task requires a precise extraction of depth and the robust and fast detection of moving objects. In order to reach these goals, this paper considers vision as a process in space and time. It presents a powerful fusion of depth and motion information for image sequences taken from a moving observer. 3D position and 3D motion for a large number of image points are estimated simultaneously by means of Kalman filters. There is no need for a prior, error-prone segmentation. Thus, one obtains a rich 6D representation that allows the detection of moving obstacles even in the presence of partial occlusion of foreground or background.

Uwe Franke, Clemens Rabe, Hernán Badino, Stefan Gehrig
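
The idea of estimating 3D position and 3D motion jointly per image point can be sketched with a linear Kalman filter over a 6D state (X, Y, Z, Vx, Vy, Vz) and a constant-velocity model. To stay linear, this sketch assumes the measurement is a 3D position already triangulated from stereo; the paper instead works with image positions and disparities, which calls for a nonlinear (extended) filter. Noise levels and function names are assumptions.

```python
import numpy as np

def make_model(dt, q=1e-4, r=1e-2):
    """Constant-velocity model; only the 3D position is measured (assumption)."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                   # position += dt * velocity
    H = np.hstack([np.eye(3), np.zeros((3, 3))])
    return F, H, q * np.eye(6), r * np.eye(3)

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/correct cycle of a linear Kalman filter."""
    x, P = F @ x, F @ P @ F.T + Q                # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (z - H @ x)                      # correct with the measured position
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Even though velocity is never measured directly, it becomes observable through the sequence of positions, so after a number of frames the velocity part of the state converges to the true 3D motion of the point.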
A Method for Determining Geometrical Distortion of Off-The-Shelf Wide-Angle Cameras

In this work we present a method for calibrating and removing the nonlinear geometric distortion of an imaging device. The topic is important since most reasoning in projective geometry requires the projection to be strictly line-preserving. The model of radially symmetric pincushion or barrel distortion is generally not sufficient to compensate for all non-linearities of the projection; this is especially true for wide-angle cameras. Therefore we applied a more complex parametric model to compensate for the non-central distortion effects. The only a-priori knowledge that is used is the straightness of some edges in the recorded image. In our experiments we could show that the method is especially applicable to off-the-shelf cameras with medium-quality optics.

Helmut Zollner, Robert Sablatnig

Motion and Tracking

A System for Marker-Less Human Motion Estimation

In this contribution we present a silhouette-based human motion estimation system. The system components comprise silhouette extraction based on level sets, a correspondence module, which relates image data to model data, and a pose estimation module. Experiments are done in a four-camera setup, and we estimate the model components with 21 degrees of freedom at two frames per second. Finally, we compare the motion estimation system with a marker-based tracking system to perform a quantitative error analysis. The results show the applicability of the system for marker-less sports movement analysis.

B. Rosenhahn, U. G. Kersting, A. W. Smith, J. K. Gurney, T. Brox, R. Klette
A Fast Algorithm for Statistically Optimized Orientation Estimation

Filtering a signal with a finite impulse response (FIR) filter introduces dependencies between the errors in the filtered image due to overlapping filter masks. If the filtering only serves as a first step in a more complex estimation problem (e.g. orientation estimation), then these correlations can turn out to impair estimation quality.

The aim of this paper is twofold. First, we show that orientation estimation (with estimation of optical flow being an important special case for space-time volumes) is a Total Least Squares (TLS) problem, $\mathbf{A}\mathbf{p} \approx \mathbf{0}$, with sought parameter vector $\mathbf{p}$ and given TLS data matrix $\mathbf{A}$ whose statistical properties can be described with a covariance tensor. In the second part, we will show how to improve TLS estimates given this statistical information.

Matthias Mühlich, Rudolf Mester
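
As a baseline for what the paper improves upon, a homogeneous TLS problem can be solved with the singular value decomposition: the unit vector p minimising ||A p|| is the right singular vector of the smallest singular value (equivalently, the eigenvector of A^T A, the structure tensor, with the smallest eigenvalue). For optical flow, each row of A holds the space-time gradient (g_x, g_y, g_t) of one pixel, and p is proportional to (u, v, 1). This plain, unweighted TLS sketch does not use the error covariance information that is the subject of the paper; the function name is hypothetical.

```python
import numpy as np

def tls_direction(A):
    """Unit vector p minimising ||A p||: last right singular vector of A."""
    _, _, vt = np.linalg.svd(A)
    return vt[-1]
```

Usage: stack one gradient row per pixel of a neighbourhood, call `tls_direction`, and read off the flow as u = p[0] / p[2] and v = p[1] / p[2].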
A Direct Method for Real-Time Tracking in 3-D Under Variable Illumination

3-D tracking of free-moving objects has to deal with brightness variations induced by the shape of the tracked surface. Pixel-based tracking techniques, though versatile, are particularly affected by such variations. Here, we evaluate two illumination-adaptive methods for a novel efficient pixel-based 3-D tracking approach. Brightness adaptation by means of an illumination basis is compared with a template update strategy with respect to both robustness and accuracy of tracking in 6 degrees of freedom.

Wolfgang Sepp
Efficient Combination of Histograms for Real-Time Tracking Using Mean-Shift and Trust-Region Optimization

Histogram-based real-time object tracking methods, like the Mean-Shift tracker of Comaniciu/Meer or the Trust-Region tracker of Liu/Chen, have been presented recently. The main advantage is that a suitable histogram allows for very fast and accurate tracking of a moving object even in the case of partial occlusions and for a moving camera. The open question is which histogram should be used in which situation. In this paper we extend the framework of histogram-based tracking. As a consequence we are able to formulate a tracker that uses a weighted combination of histograms of different features. We compare our approach with two previously proposed histogram-based trackers for different histograms on large, publicly available test sequences. The algorithms run in real time on standard PC hardware.

F. Bajramovic, Ch. Gräßl, J. Denzler
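
A weighted combination of histograms of different features can be expressed as a combined similarity between a target model and a candidate region. The sketch uses the Bhattacharyya coefficient, the similarity measure commonly used in mean-shift tracking, per feature and combines the coefficients with normalised weights. How the weights are chosen and how the combination enters the mean-shift or trust-region optimisation in the paper is not shown; the weighting scheme and names are assumptions.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalised histograms (1 = identical)."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def combined_similarity(model_hists, candidate_hists, weights):
    """Hypothetical weighted combination of per-feature similarities."""
    w = np.asarray(weights, float)
    w = w / w.sum()                             # weights normalised to sum to 1
    return float(sum(wk * bhattacharyya(p, q)
                     for wk, p, q in zip(w, model_hists, candidate_hists)))
```

With normalised weights the combined similarity stays in [0, 1] and equals 1 only when every feature histogram of the candidate matches the model.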
Spiders as Robust Point Descriptors

This paper introduces a new operator to characterize a point in an image in a distinctive and invariant way. The robust recognition of points is a key technique in computer vision: algorithms for stereo correspondence, motion tracking and object recognition rely heavily on this type of operator. The goal of this paper is to characterize a salient point by a constellation of surrounding anchor points. Salient points are the most reliably localized points extracted by an interest point operator. The anchor points are multiple interest points in a visually homogeneous segment surrounding the salient point. Because of its appearance, this constellation is called a spider. With a prototype of the spider operator, results in this paper demonstrate how a point can be recognized in spite of significant image noise, inhomogeneous changes in illumination and altered perspective. For an example that requires high performance close to object/background boundaries, the prototype yields better results than David Lowe's SIFT operator.

Adam Stanski, Olaf Hellwich
A Comparative Evaluation of Template and Histogram Based 2D Tracking Algorithms

In this paper, we compare and evaluate five contemporary, data-driven, real-time 2D object tracking methods: the region tracker by Hager et al., the Hyperplane tracker, the CONDENSATION tracker, and the Mean Shift and Trust Region trackers. The first two are classical template-based methods, while the latter three belong to the more recently proposed class of histogram-based trackers. All trackers are evaluated for pure translation tracking as well as for tracking translation plus scaling. For the evaluation, we use a publicly available, labeled data set consisting of surveillance videos of humans in public spaces. This data set contains occlusions, changes in object appearance, and scaling.

B. Deutsch, Ch. Gräßl, F. Bajramovic, J. Denzler
Bayesian Method for Motion Segmentation and Tracking in Compressed Videos

This contribution presents a statistical method for the segmentation and tracking of moving regions in compressed videos. Using a Bayesian estimation framework, the technique efficiently analyses and tracks motion segments directly from the compression-oriented motion fields. For each motion field, the algorithm initialises a partition that is compared and associated with its tracked counterpart. Because the two hypotheses may be incompatible, the algorithm applies a conflict resolution technique so that the partition inherits relevant characteristics from both hypotheses as far as possible. Each tracked region is further classified as background or foreground object based on an approximation of its logical mass, momentum, and impulse. Experiments on standard test sequences show promising results.

Siripong Treetasanatavorn, Uwe Rauschenbach, Jörg Heuer, André Kaup
Nonlinear Body Pose Estimation from Depth Images

This paper focuses on real-time markerless motion capture. The body pose of a person is estimated from depth images using an Iterative Closest Point algorithm. We present a very efficient approach that estimates up to 28 degrees of freedom from 1000 data points at 4 Hz. This is achieved by nonlinear optimization techniques using an analytically derived Jacobian and a highly optimized correspondence search.

Daniel Grest, Jan Woetzel, Reinhard Koch
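The core ingredients named above, ICP correspondences plus a nonlinear least-squares solver with an analytically derived Jacobian, can be illustrated on a much smaller problem. The sketch below aligns two 2D point sets under a rigid transform (angle `theta`, translation `tx`, `ty`) using brute-force closest-point matching and Gauss-Newton steps. It does not reproduce the paper's articulated 28-DOF body model or its optimized correspondence search, and all names are illustrative.

```python
import math

def apply_rigid(params, pts):
    """Rotate 2D points by theta, then translate by (tx, ty)."""
    theta, tx, ty = params
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

def icp_gauss_newton(src, dst, iters=20):
    """Rigid 2D ICP: closest-point matching + Gauss-Newton with an analytic Jacobian."""
    params = [0.0, 0.0, 0.0]
    for _ in range(iters):
        moved = apply_rigid(params, src)
        c, s = math.cos(params[0]), math.sin(params[0])
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for (x, y), (mx, my) in zip(src, moved):
            # brute-force closest point in the target set
            qx, qy = min(dst, key=lambda q: (q[0] - mx) ** 2 + (q[1] - my) ** 2)
            rx, ry = mx - qx, my - qy
            # analytic Jacobian rows of the x and y residuals w.r.t. (theta, tx, ty)
            jx = (-s * x - c * y, 1.0, 0.0)
            jy = (c * x - s * y, 0.0, 1.0)
            for i in range(3):
                jtr[i] += jx[i] * rx + jy[i] * ry
                for j in range(3):
                    jtj[i][j] += jx[i] * jx[j] + jy[i] * jy[j]
        # normal equations: (J^T J) delta = -J^T r
        delta = solve3(jtj, [-v for v in jtr])
        params = [p + d for p, d in zip(params, delta)]
    return params
```

Because correspondences are recomputed in every iteration, this scheme only converges when the initial pose is close enough for most closest-point matches to be correct.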

Computational Learning

Conservative Visual Learning for Object Detection with Minimal Hand Labeling Effort

We present a novel framework for unsupervised training of an object detection system. The basic idea is (1) to exploit a huge amount of unlabeled video data by being very conservative in selecting training examples, and (2) to start with a very simple object detection system and use generative and discriminative classifiers in an iterative co-training fashion to arrive at increasingly better object detectors. We demonstrate the framework on a surveillance task where we learn a person detector. We start with a simple moving object classifier and proceed with robust PCA (on shape and appearance) as a generative classifier, which in turn generates a training set for a discriminative AdaBoost classifier. The results obtained by AdaBoost are again filtered by PCA, which produces an even better training set. We demonstrate that this approach avoids hand-labeling of training data while still achieving a state-of-the-art detection rate.

Peter Roth, Helmut Grabner, Danijel Skočaj, Horst Bischof, Aleš Leonardis
Rapid Online Learning of Objects in a Biologically Motivated Recognition Architecture

We present an approach for the supervised online learning of object representations based on a biologically motivated architecture of visual processing. We use the output of a recently developed topographical feature hierarchy to provide a view-based representation of three-dimensional objects using a dynamical vector quantization approach. For a simple short-term object memory model we demonstrate real-time online learning of 50 complex-shaped objects within three hours. Additionally we propose some modifications of learning vector quantization algorithms that are especially adapted to the task of online learning and capable of effectively reducing the representational effort in a transfer from short-term to long-term memory.

Stephan Kirstein, Heiko Wersing, Edgar Körner
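The abstract does not spell out the proposed modifications of learning vector quantization, so the sketch below shows only the standard LVQ1 update they start from, plus a hypothetical rule that opens a new prototype whenever an object label has not been seen before (a crude stand-in for the dynamical vector quantization mentioned above, not the authors' method).

```python
def lvq1_update(prototypes, labels, x, y, lr=0.1):
    """Standard LVQ1 step: pull the nearest prototype towards x if its label is y,
    otherwise push it away. Returns the index of the updated prototype."""
    def sq_dist(p):
        return sum((pi - xi) ** 2 for pi, xi in zip(p, x))
    k = min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i]))
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [pi + sign * lr * (xi - pi) for pi, xi in zip(prototypes[k], x)]
    return k

def online_learn(prototypes, labels, x, y, lr=0.1):
    """Hypothetical online rule: start a new prototype for an unseen label,
    otherwise apply an LVQ1 update."""
    if y not in labels:
        prototypes.append(list(x))
        labels.append(y)
        return len(prototypes) - 1
    return lvq1_update(prototypes, labels, x, y, lr)
```

Feature vectors would come from the topographical feature hierarchy in the paper; here they are plain lists of floats.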
Semidefinite Clustering for Image Segmentation with A-priori Knowledge

Graph-based clustering methods have been successfully applied to computer vision and machine learning problems. In this paper we demonstrate how to introduce a-priori knowledge on class membership in a systematic and principled way: starting from a convex relaxation of the graph-based clustering problem, we integrate information about class membership by adding linear constraints to the resulting semidefinite program. With our method, there is no need to modify the original optimization criterion, ensuring that the algorithm will always converge to a high-quality clustering or image segmentation.

Matthias Heiler, Jens Keuchel, Christoph Schnörr
Separable Linear Discriminant Classification

Linear discriminant analysis is a popular technique in computer vision, machine learning and data mining. It has been successfully applied to various problems, and there are numerous variations of the original approach. This paper introduces the idea of separable LDA. Addressing binary classification for visual object recognition, we derive an algorithm for training separable discriminant classifiers. Our approach provides rapid training and runtime behavior and also tackles the small sample size problem. Experimental results show that the method performs robustly and allows for online learning.

Christian Bauckhage, John K. Tsotsos
Improving a Discriminative Approach to Object Recognition Using Image Patches

In this paper we extend a method that uses image patch histograms and discriminative training to recognize objects in cluttered scenes. The method generalizes and performs well for different tasks, e.g. radiograph recognition and recognition of objects in cluttered scenes. Here, we further investigate this approach and propose several extensions. Most importantly, the method is substantially improved by adding multi-scale features, so that it better accounts for objects of different sizes. Other extensions tested include the use of Sobel features, the generalization of histograms, a method to account for varying image brightness in the PCA domain, and SVMs for classification. The results improve significantly: on average we obtain a 59% relative reduction of the error rate, and we reach a new best error rate of 1.1% on the Caltech motorbikes task.

Thomas Deselaers, Daniel Keysers, Hermann Ney
Comparison of Multiclass SVM Decomposition Schemes for Visual Object Recognition

We consider multiclass decomposition schemes for Support Vector Machines with linear, polynomial and RBF kernels. Our aim is to compare and discuss popular multiclass decomposition approaches: One versus the Rest (OvR), One versus One (OvO), Decision Directed Acyclic Graphs (DDAG), Tree Structured, and Error Correcting Output Codes (ECOC). We conducted our experiments on benchmark datasets consisting of camera images of 3D objects. We found that, for nonlinear kernels, all multiclass decomposition schemes for SVMs performed comparably well, with no statistically significant differences. For linear kernels, the OvR, OvO and DDAG schemes outperform Tree Structured and ECOC.

Laine Kahsay, Friedhelm Schwenker, Günther Palm
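Two of the compared schemes differ only in how the same pairwise classifiers are combined: One versus One lets all K(K-1)/2 binary classifiers vote, while a Decision DAG evaluates only K-1 of them, eliminating one candidate class per node. The sketch below shows both combination rules; the pairwise classifiers are hypothetical nearest-centroid stand-ins for trained binary SVMs.

```python
from itertools import combinations

def ovo_predict(x, classes, pairwise):
    """One versus One: every pairwise classifier votes; the class with most votes wins.
    pairwise[(a, b)](x) returns the winning label, a or b."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

def ddag_predict(x, classes, pairwise):
    """Decision DAG: test first vs. last remaining class and drop the loser,
    so only K-1 pairwise classifiers are evaluated."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        key = (a, b) if (a, b) in pairwise else (b, a)
        winner = pairwise[key](x)
        remaining.remove(b if winner == a else a)
    return remaining[0]

# Hypothetical stand-ins for trained binary SVMs: nearest of two 1D class centroids.
centroids = {"A": 0.0, "B": 5.0, "C": 10.0}
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b)
    for a, b in combinations(sorted(centroids), 2)
}
```

For example, `ddag_predict(4.0, ["A", "B", "C"], pairwise)` first rejects C (A vs. C), then rejects A (A vs. B), and returns "B", the same label OvO voting yields.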
Shape Priors and Online Appearance Learning for Variational Segmentation and Object Recognition in Static Scenes

We present an integrated two-level approach to computationally analyzing image sequences of static scenes by variational segmentation. At the top level, estimated models of object appearance and background are probabilistically fused to obtain an a-posteriori probability for the occupancy of each pixel. The data-association strategy handles object occlusions explicitly.

At the lower level, object models are inferred by variational segmentation based on image data and statistical shape priors. Shape priors make it possible to distinguish between recognition of known objects and segmentation of unknown objects. The object models are sufficiently flexible to enable the integration of general cues like advanced shape distances. At the same time, they are highly constrained from the optimization viewpoint: the globally optimal parameters can be computed at each time instant by dynamic programming.

The novelty of our approach is the integration of state-of-the-art variational segmentation into a probabilistic framework for static scene analysis that combines both on-line learning and prior knowledge of various aspects of object appearance.

Martin Bergtholdt, Christoph Schnörr
Over-Complete Wavelet Approximation of a Support Vector Machine for Efficient Classification

In this paper, we present a novel algorithm for reducing the runtime computational complexity of a Support Vector Machine classifier. This is achieved by approximating the Support Vector Machine decision function by an over-complete Haar wavelet transformation. This provides a set of classifiers of increasing complexity that can be used in a cascaded fashion, yielding excellent runtime performance. This over-complete transformation finds the optimal approximation of the Support Vectors by a set of rectangles with constant gray-level values (enabling an Integral Image based evaluation). A major feature of our training algorithm is that it is fast, simple and, in contrast to the Viola & Jones classifier, does not require complicated tuning by an expert. The paradigm of our method is that, instead of trying to estimate a classifier that is jointly accurate and fast (such as the Viola & Jones detector), we first build a classifier that is proven to have optimal generalization capabilities; the focus then becomes runtime efficiency while maintaining the classifier’s optimal accuracy. We apply our algorithm to the problem of face detection in images, but it can also be used for other image-based classifications. We show that, for comparable accuracy, our algorithm provides a 15-fold speed-up over the Reduced Support Vector Machine and a 530-fold speed-up over the Support Vector Machine, enabling face detection at 25 fps on a standard PC.

Matthias Rätsch, Sami Romdhani, Gerd Teschke, Thomas Vetter
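Rectangle-based approximations are cheap to evaluate because of the integral image: after one pass over the image, the sum over any axis-aligned rectangle costs four lookups. The sketch below computes the inner product between an image patch and a support vector approximated by constant-valued rectangles. The `(value, top, left, bottom, right)` encoding is hypothetical, and the wavelet fitting step that produces the rectangles is not shown.

```python
def integral_image(img):
    """ii[r][c] = sum of img[0..r-1][0..c-1] (one extra row/column of zeros)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def approx_inner_product(ii, rects):
    """<z, x> where the support vector z is approximated by constant rectangles.
    rects: list of (value, top, left, bottom, right), a hypothetical encoding."""
    return sum(v * rect_sum(ii, t, l, b, r) for v, t, l, b, r in rects)
```

The cost of `approx_inner_product` grows with the number of rectangles rather than the number of pixels, which is what makes a cascade of increasingly fine approximations attractive.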
Regularization on Discrete Spaces

We consider the classification problem on a finite set of objects. Some of them are labeled, and the task is to predict the labels of the remaining unlabeled ones. Such an estimation problem is generally referred to as transductive inference. It is well known that many meaningful inductive or supervised methods can be derived from a regularization framework, which minimizes a loss function plus a regularization term. In the same spirit, we propose a general discrete regularization framework defined on finite object sets, which can be thought of as a discrete analogue of classical regularization theory. A family of transductive inference schemes is then systematically derived from the framework, including our earlier algorithm for transductive inference, with which we obtained encouraging results on many practical classification problems. The discrete regularization framework is built on discrete analysis and geometry that we developed ourselves, in which a number of discrete differential operators of various orders are constructed; these can be thought of as discrete analogues of their counterparts in the continuous case.

Dengyong Zhou, Bernhard Schölkopf
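The transductive setting can be illustrated with a normalized graph label-propagation iteration, F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2). This sketch assumes that this iteration corresponds to the authors' earlier algorithm mentioned above (local and global consistency); whether it matches any specific scheme derived in the paper is an assumption. It also assumes a symmetric affinity matrix without isolated nodes.

```python
import math

def propagate_labels(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.

    W: symmetric non-negative affinity matrix (n x n, no isolated nodes assumed)
    Y: n x c initial labels (one-hot rows for labeled points, zero rows otherwise)
    Returns the predicted class index for every point.
    """
    n, c = len(W), len(Y[0])
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n)) + (1 - alpha) * Y[i][j]
              for j in range(c)] for i in range(n)]
    return [max(range(c), key=lambda j: F[i][j]) for i in range(n)]
```

On two chains joined by a weak edge, with one labeled point at each end, the labels spread along each chain but barely cross the weak edge.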
Recognition of 3D Objects by Learning from Correspondences in a Sequence of Unlabeled Training Images

This paper proposes an approach for the unsupervised learning of object models from local image feature correspondences. The object models are learned from an unlabeled sequence of training images showing one object after the other. The obtained object models enable the recognition of these objects in cluttered scenes, under occlusion, in-plane rotation and scale change. Maximally stable extremal regions are used as local image features and two different types of descriptors characterising the appearance and shape of the regions allow a robust matching. Experiments with real objects show the recognition performance of the presented approach under various conditions.

Raimund Leitner, Horst Bischof
Self-learning Segmentation and Classification of Cell-Nuclei in 3D Volumetric Data Using Voxel-Wise Gray Scale Invariants

We introduce and discuss a new method for the segmentation and classification of cells in 3D tissue probes. The anisotropic 3D volumetric data of fluorescently marked cell nuclei is recorded by a confocal laser scanning microscope (LSM). Voxel-wise gray scale features (see accompanying papers [1][2]), invariant to 3D rotation of their neighborhood, are extracted from the original data by integrating over the 3D rotation group with non-linear kernels.

In an interactive process, support-vector machine models are trained for each cell type using user relevance feedback. With this reference database at hand, segmentation and classification can be achieved in one step, simply by classifying each voxel and performing a connected component labelling, automatically and without further human interaction. This general approach easily allows adaptation to other cell types or tissue structures simply by adding new training samples and retraining the model. Experiments with datasets from chicken chorioallantoic membrane show encouraging results.

Janis Fehr, Olaf Ronneberger, Haymo Kurz, Hans Burkhardt
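The second half of the one-step pipeline, turning per-voxel class decisions into individual cells, can be sketched as connected component labelling on a 3D boolean mask. The sketch assumes the SVM voxel classifier has already produced the mask; the choice of 6-connectivity is an assumption.

```python
from collections import deque

def label_components_3d(mask):
    """Label 6-connected foreground components in a 3D boolean volume.

    mask[z][y][x] is True where the voxel classifier predicted the cell type.
    Returns (labels, count); labels[z][y][x] is 0 for background, 1..count otherwise.
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    count = 0
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if mask[z][y][x] and labels[z][y][x] == 0:
                    count += 1  # start a new component and flood-fill it (BFS)
                    labels[z][y][x] = count
                    queue = deque([(z, y, x)])
                    while queue:
                        cz, cy, cx = queue.popleft()
                        for dz, dy, dx in neighbours:
                            qz, qy, qx = cz + dz, cy + dy, cx + dx
                            if (0 <= qz < nz and 0 <= qy < ny and 0 <= qx < nx
                                    and mask[qz][qy][qx] and labels[qz][qy][qx] == 0):
                                labels[qz][qy][qx] = count
                                queue.append((qz, qy, qx))
    return labels, count
```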


License Plate Character Segmentation Using Hidden Markov Chains

We propose a method for segmenting a line of characters in a noisy, low-resolution image of a car license plate. Hidden Markov Chains are used to model the stochastic relation between an input image and the corresponding character segmentation. The segmentation problem is expressed as maximum a posteriori estimation over a set of admissible segmentations. The proposed method exploits specific prior knowledge available for the application at hand: the number of characters is known, and it is also known that the characters can be segmented into sectors of equal but unknown width. An efficient estimation algorithm based on dynamic programming is derived. The proposed method was successfully tested on data from a real-life license plate recognition system.

Vojtěch Franc, Václav Hlaváč
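The flavour of the dynamic-programming step can be conveyed by a simplified stand-in: choose the boundaries of a known number of characters so that the summed per-column boundary score is maximal. This sketch is not the paper's Hidden Markov Chain estimator. It replaces the image likelihood with a hypothetical per-column `gap_score` and relaxes the equal-width constraint to a width range `[w_min, w_max]`.

```python
def segment_columns(gap_score, n_chars, w_min, w_max):
    """Choose n_chars + 1 boundary columns maximising the summed gap score,
    with every character width in [w_min, w_max], by dynamic programming.

    gap_score[c] is a hypothetical score for "a boundary lies at column c".
    Returns the list of boundary columns.
    """
    n_cols = len(gap_score)
    NEG = float("-inf")
    # best[k][c]: best total score with the k-th boundary placed at column c
    best = [[NEG] * n_cols for _ in range(n_chars + 1)]
    back = [[-1] * n_cols for _ in range(n_chars + 1)]
    best[0] = list(gap_score)
    for k in range(1, n_chars + 1):
        for c in range(n_cols):
            for w in range(w_min, w_max + 1):
                p = c - w
                if p >= 0 and best[k - 1][p] > NEG:
                    cand = best[k - 1][p] + gap_score[c]
                    if cand > best[k][c]:
                        best[k][c] = cand
                        back[k][c] = p
    # trace back from the best final boundary
    c = max(range(n_cols), key=lambda i: best[n_chars][i])
    bounds = [c]
    for k in range(n_chars, 0, -1):
        c = back[k][c]
        bounds.append(c)
    return bounds[::-1]
```

Enforcing strictly equal widths would amount to setting `w_min == w_max` and repeating the search for each candidate width.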
Digital Subtraction CT Lung Perfusion Image Based on 3D Affine Registration

We propose a fast and robust registration technique for accurately imaging lung perfusion and efficiently detecting pulmonary embolism in chest CT angiography. To register a pair of CT scans, a suitable geometrical transformation is found in the following steps. First, an initial registration using an optimal cube is performed to correct the gross translational mismatch. Second, the initial alignment is refined by iterative surface registration; for fast and robust convergence of the distance measure to its optimal value, a 3D distance map is generated by narrow-band distance propagation. Third, enhanced vessels are visualized by subtracting registered pre-contrast images from post-contrast images. To facilitate visualization of parenchymal enhancement, color-coded mapping and image fusion are used. Our method has been successfully applied to pre- and post-contrast chest CT angiography images of ten patients. Experimental results show that our method is very promising compared with a conventional method in terms of visual inspection, accuracy and processing time.

Helen Hong, Jeongjin Lee
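Precomputing a distance map turns each evaluation of a surface distance measure into table lookups. The sketch below uses a 2D two-pass city-block distance transform as a simple stand-in for the paper's 3D narrow-band distance propagation (dimensionality and metric are simplifications), and a hypothetical `mean_distance` cost that an iterative registration could minimise.

```python
def city_block_distance_map(feature):
    """Exact two-pass city-block (L1) distance transform of a 2D boolean grid.

    feature[r][c] is True on surface pixels; every output cell holds the L1
    distance to the nearest surface pixel.
    """
    h, w = len(feature), len(feature[0])
    INF = h + w  # larger than any L1 distance inside the grid
    d = [[0 if feature[r][c] else INF for c in range(w)] for r in range(h)]
    for r in range(h):  # forward pass: propagate from top and left neighbours
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    for r in range(h - 1, -1, -1):  # backward pass: from bottom and right
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d

def mean_distance(dmap, points):
    """Hypothetical registration cost: mean map value at transformed surface points."""
    return sum(dmap[r][c] for r, c in points) / len(points)
```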
Combination of Tangent Distance and an Image Distortion Model for Appearance-Based Sign Language Recognition

In this paper, we employ a zero-order local deformation model to model the visual variability of video streams of American Sign Language (ASL) words. We discuss two possible ways of combining the model with the tangent distance, which is used to compensate for global affine transformations. Integrating the deformation model into our recognition system reduces the error rate on a database of ASL words from 22.2% to 17.2%.

Morteza Zahedi, Daniel Keysers, Thomas Deselaers, Hermann Ney
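A zero-order local deformation model can be sketched as an image distortion model (IDM) distance: each pixel of the test frame may match the best pixel within a small neighbourhood of the same position in the reference frame. The warp range and the plain pixel-wise squared difference are assumptions, and the combination with tangent distance discussed in the paper is not shown.

```python
def idm_distance(test, ref, warp=1):
    """Zero-order image distortion model: each test pixel is compared with the best
    reference pixel inside a (2*warp+1) x (2*warp+1) window around its position."""
    h, w = len(test), len(test[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            total += min(
                (test[r][c] - ref[rr][cc]) ** 2
                for rr in range(max(0, r - warp), min(h, r + warp + 1))
                for cc in range(max(0, c - warp), min(w, c + warp + 1))
            )
    return total

def euclidean_distance(test, ref):
    """Squared Euclidean distance, i.e. the IDM with warp = 0."""
    return sum((a - b) ** 2 for row_t, row_r in zip(test, ref) for a, b in zip(row_t, row_r))
```

A vertical bar shifted by one pixel has a non-zero Euclidean distance to the original but an IDM distance of zero when `warp=1`, which is the kind of local variability the deformation model is meant to absorb.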
Volumetric Analysis of a Sinter Process in Time

We present the first fully three-dimensional analysis of the sinter process of copper using particles of a realistic size. This has been made possible through the use of μCT. A 3D image processing chain was applied to each time step of this 4D dataset, followed by image registration and particle matching steps. This allows the motion of individual particles to be tracked during the sintering process, which provides a large amount of information for understanding this process.

Oliver Wirjadi, Andreas Jablonski, Katja Schladitz, Michael Nöthe
Network Snakes-Supported Extraction of Field Boundaries from Imagery

A fully automatic method to extract field boundaries from imagery is described in this paper. The fields are represented, together with additional prior knowledge in the form of GIS data, in a semantic model. The approach consists of two main steps. First, a segmentation is carried out at a coarse scale, resulting in preliminary field boundaries. Second, network snakes are used to improve the geometrical correctness of the preliminary boundaries, taking topological constraints into account while exploiting the local image information. Focusing on the network snakes and their particular properties, the results demonstrate the potential of the proposed solution.

Matthias Butenuth, Christian Heipke
Structure Features for Content-Based Image Retrieval

The geometric structure of an image carries fundamental information. Various structure-based feature extraction methods have been developed and successfully applied to image processing problems. In this paper we introduce a geometric structure-based feature generation method, called line-structure recognition (LSR), and apply it to content-based image retrieval. The algorithm is adapted from line segment coherences, which incorporate inter-relational structure knowledge encoded by hierarchical agglomerative clustering, resulting in features that are robust to changes in illumination, scale and rotation. We have conducted comprehensive tests and analyzed the results in detail. The results were obtained on a subset of 6000 images from the Corel image database. Moreover, we compare the performance of LSR with that of Gabor wavelet features.

Gerd Brunner, Hans Burkhardt
Blind Background Subtraction in Dental Panoramic X-Ray Images: An Application Approach

Dental panoramic X-ray images have complex content, because several layers of tissue, bone, fat, etc. are superimposed. Non-uniform illumination from the X-ray source adds extra modulation to the image, resulting in spatially varying X-ray photon density. The interaction of the X-ray photons with the density of matter causes a spatially coherent, varying noise contribution. Many pixel-based or global algorithms exist to compensate for background effects. However, if the image is contaminated by a non-negligible amount of noise, which is usually non-Gaussian, these methods cannot approximate the background efficiently. In this paper, a dedicated approach to background subtraction is presented that operates blindly, i.e. it separates a set of independent signals from a set of mixed signals with little or no a priori information about the nature of the signals, and that uses the à trous multiresolution transform to alleviate this problem. The new method estimates the background bias from a reference scan taken without a patient. The background values are rescaled by a polynomial compensation factor determined by a mean-square-error criterion, so that subtracting the background does not produce additional artifacts in the image. The energy of the background estimate is subtracted from the energy of the mixture. The method is also capable of removing spatially varying noise, given an appropriate spatial noise estimate. The approach has been tested on 50 images from a database of panoramic X-ray images, and the results were cross-validated by medical experts.

Peter Michael Goebel, Nabil Ahmed Belbachir, Michael Truppe
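The multiresolution step can be illustrated with the undecimated à trous wavelet transform. The sketch is 1D for brevity and uses a B3-spline kernel and mirror boundaries; all three choices are assumptions rather than details given in the abstract. Its key property is that the detail planes and the final smooth residual add back up to the input, so components can be manipulated per scale and then recombined.

```python
def a_trous_1d(signal, levels):
    """Undecimated 'a trous' wavelet transform with the B3-spline kernel.

    Returns (details, residual) such that signal == sum(details) + residual
    element-wise. Boundaries use mirror reflection (an assumption).
    """
    kernel = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
    n = len(signal)

    def mirror(i):
        # reflect an out-of-range index back into [0, n - 1]
        period = 2 * (n - 1)
        i = abs(i) % period if period else 0
        return period - i if i >= n else i

    current = list(signal)
    details = []
    for j in range(levels):
        step = 2 ** j  # the holes ("trous") between kernel taps double at each scale
        smooth = [sum(k * current[mirror(i + (t - 2) * step)] for t, k in enumerate(kernel))
                  for i in range(n)]
        details.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return details, current
```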
Robust Head Detection and Tracking in Cluttered Workshop Environments Using GMM

A vision-based head tracking approach is presented that combines foreground information with an elliptical head model based on the integration of gradient and skin-color information. The system has been developed to detect and robustly track a human head in cluttered workshop environments with changing illumination conditions. A foreground map based on Gaussian Mixture Models (GMM) is used to segment a person from the background and to eliminate unwanted background cues. To overcome known problems of adaptive background models, a high-level feedback module prevents regions of interest from becoming background over time. To obtain robust and reliable detection and tracking results, several extensions of the GMM update mechanism have been developed.

Alexander Barth, Rainer Herpers
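A per-pixel adaptive GMM of the Stauffer-Grimson type, the usual basis for such foreground maps, can be sketched for a single grey-value pixel as follows. All parameter values are illustrative, and the `learn=False` flag is a hypothetical hook in the spirit of the feedback module described above: pixels inside a tracked region are classified but not absorbed into the background. The paper's own extensions of the update mechanism are not reproduced.

```python
import math

class PixelGMM:
    """Adaptive Gaussian mixture model for one grey-value pixel
    (Stauffer-Grimson style; all parameter values are illustrative)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0, t=0.7):
        self.k, self.alpha, self.t = k, alpha, t
        self.init_var, self.min_var = init_var, min_var
        self.w, self.mu, self.var = [], [], []

    def _background(self):
        # rank components by w / sigma; the first ones whose weights exceed T are background
        order = sorted(range(len(self.w)),
                       key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)
        chosen, acc = set(), 0.0
        for i in order:
            chosen.add(i)
            acc += self.w[i]
            if acc > self.t:
                break
        return chosen

    def update(self, x, learn=True):
        """Return True if x is foreground; adapt the model only if learn is True
        (learn=False is a hypothetical high-level feedback hook)."""
        match = next((i for i in range(len(self.w))
                      if abs(x - self.mu[i]) <= 2.5 * math.sqrt(self.var[i])), None)
        is_foreground = match is None or match not in self._background()
        if not learn:
            return is_foreground
        self.w = [(1 - self.alpha) * wi for wi in self.w]
        if match is None:
            if len(self.w) < self.k:  # add a new, wide component centred on x
                self.w.append(self.alpha)
                self.mu.append(float(x))
                self.var.append(self.init_var)
            else:  # replace the weakest component
                j = min(range(self.k), key=lambda i: self.w[i])
                self.w[j], self.mu[j], self.var[j] = self.alpha, float(x), self.init_var
        else:
            self.w[match] += self.alpha
            # simplified learning rate rho = alpha for mean and variance
            self.mu[match] += self.alpha * (x - self.mu[match])
            d2 = (x - self.mu[match]) ** 2
            self.var[match] = max(self.min_var, self.var[match] + self.alpha * (d2 - self.var[match]))
        total = sum(self.w)
        self.w = [wi / total for wi in self.w]
        return is_foreground
```

Without the feedback hook, a person standing still long enough is eventually absorbed into the background, which is exactly the failure mode the feedback module is meant to prevent.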

Uncertainty and Robustness

Stability and Local Feature Enhancement of Higher Order Nonlinear Diffusion Filtering

This paper discusses the extension of nonlinear diffusion filters to higher derivative orders. While such processes can be useful in practice, their theoretical properties are only partly understood so far. We establish important results concerning $L^2$-stability and forward-backward diffusion properties, which are related to well-posedness questions. Stability in the $L^2$-norm is proven for nonlinear diffusion filtering of arbitrary order. In the case of fourth-order filtering, a qualitative description of the filtering behaviour in terms of forward and backward diffusion is given and compared to second-order nonlinear diffusion. This description shows that curvature enhancement is possible with fourth-order nonlinear diffusion, in contrast to second-order filters, where only edges can be enhanced.

Stephan Didas, Joachim Weickert, Bernhard Burgeth
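A minimal 1D fourth-order nonlinear diffusion of the form u_t = -(g(u_xx²) u_xx)_xx can be sketched with an explicit scheme. The diffusivity g(s) = 1 / (1 + s/λ²), the periodic boundaries and the time step are assumptions, not the paper's exact setting. For g in (0, 1] and the periodic second difference L (eigenvalues in [-4, 0]), one step changes the squared norm by at most -(2τ - 16τ²)·<Lu, G Lu>, so τ ≤ 1/8 keeps the discrete L2 norm from increasing, and the divergence form preserves the average grey value.

```python
def fourth_order_diffusion_1d(u, tau=0.05, lam=1.0, steps=100):
    """Explicit scheme for u_t = -(g(u_xx^2) u_xx)_xx on a periodic 1D grid.

    Assumed diffusivity g(s) = 1 / (1 + s / lam^2), which lies in (0, 1];
    tau <= 1/8 then keeps the discrete L2 norm non-increasing.
    """
    n = len(u)
    u = list(u)

    def second_diff(v):
        # periodic central second difference
        return [v[(i - 1) % n] - 2 * v[i] + v[(i + 1) % n] for i in range(n)]

    for _ in range(steps):
        uxx = second_diff(u)
        flux = [d / (1.0 + d * d / lam ** 2) for d in uxx]  # g(u_xx^2) * u_xx
        lap_flux = second_diff(flux)
        u = [ui - tau * li for ui, li in zip(u, lap_flux)]
    return u
```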
Estimation of Geometric Entities and Operators from Uncertain Data

In this text we show how points, point pairs, lines, planes, circles, spheres, and rotation, translation and dilation operators and their uncertainty can be evaluated from uncertain data in a unified manner using the Geometric Algebra of conformal space. This extends previous work by Förstner et al. [3] from points, lines and planes to non-linear entities and operators, while keeping the linearity of the estimation method. We give a theoretical description of our approach and show the results of some synthetic experiments.

Christian Perwass, Christian Gebken, Gerald Sommer
Wiener Channel Smoothing: Robust Wiener Filtering of Images

In this paper, we combine the well-established technique of Wiener filtering with an efficient method for robust smoothing: channel smoothing. The main parameters to choose in channel smoothing are the number of channels and the averaging filter. Whereas the number of channels has a natural lower bound given by the noise level and should, for the sake of speed, be as small as possible, the averaging filter is a less obvious choice. Based on the linear behavior of channel smoothing for inlier noise, we derive a Wiener filter applicable for averaging the channels of an image. We show in some experiments that our method compares favorably with established methods.

Michael Felsberg
Signal and Noise Adapted Filters for Differential Motion Estimation

Differential motion estimation in image sequences is based on measuring the orientation of local structures in spatio-temporal signal volumes. For this purpose, discrete filters that yield estimates of the local gradient are applied to the image sequence. Whereas previous approaches to filter optimization concentrate on reducing the systematic error of filters and motion models, the method presented in this paper is based on the statistical characteristics of the data. We present a method for adapting linear shift-invariant filters to image sequences or whole classes of image sequences. We show how to optimize derivative filters simultaneously with respect to systematic and statistical errors.

Kai Krajsek, Rudolf Mester
Variational Deblurring of Images with Uncertain and Spatially Variant Blurs

We consider the problem of deblurring images that have been blurred for various reasons during image acquisition. We propose a variational approach that admits spatially variant and irregularly shaped point-spread functions. By using robust data terms, it achieves high robustness, particularly with respect to imprecisions in the estimation of the point-spread function. Good restoration of image features is ensured by non-convex regularisers and a strategy of reducing the regularisation weight. Experiments with irregular spatially invariant as well as spatially variant point-spread functions demonstrate the good quality of the method and its stability under noise.

Martin Welk, David Theis, Joachim Weickert
Energy Tensors: Quadratic, Phase Invariant Image Operators

In this paper we briefly review a lesser-known quadratic, phase-invariant image processing operator, the energy operator, and describe its tensor-valued generalization, the energy tensor. We present relations to the real-valued and complex-valued energy operators and discuss properties of the three operators. We then focus on the discrete implementation for estimating the tensor based on Teager’s algorithm and frame theory. The kernels of the real-valued and tensor-valued operators are formally derived. In a simple experiment we compare the energy tensor to other operators for orientation estimation. The paper concludes with a short outlook on future work.

Michael Felsberg, Erik Jonsson
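Teager's discrete energy operator, on which the tensor estimate builds, is short enough to state directly. The 1D sketch below is the classical real-valued operator only, not the energy tensor. For a sampled cosine A cos(Ωn + φ) it returns A² sin²(Ω) at every sample, independent of the phase φ, because cos²(a) - cos(a - Ω)cos(a + Ω) = sin²(Ω); this is the phase invariance referred to in the title.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for n = 1 .. len(x) - 2, since both neighbours are needed.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For example, `teager_energy([1.0, 2.0, 3.0])` evaluates 2² - 1·3 and returns `[1.0]`.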

Invited Paper

Object Categorization and the Need for Many-to-Many Matching

Object recognition systems have their roots in the AI community, and originally addressed the problem of object categorization. These early systems, however, were limited by their inability to bridge the representational gap between low-level image features and high-level object models, hindered by the assumption of one-to-one correspondence between image and model features. Over the next thirty years, the mainstream recognition community moved steadily in the direction of exemplar recognition while narrowing the representational gap. The community is now returning to the categorization problem, and faces the same representational gap as its predecessors did. We review the evolution of object recognition systems and argue that bridging this representational gap requires an ability to match image and model features many-to-many. We review three formulations of the many-to-many matching problem as applied to model acquisition and object recognition.

Sven Dickinson, Ali Shokoufandeh, Yakov Keselman, Fatih Demirci, Diego Macrini
Pattern Recognition
Walter G. Kropatsch
Robert Sablatnig
Allan Hanbury
Springer Berlin Heidelberg