About this book

This book constitutes the refereed proceedings of the 36th German Conference on Pattern Recognition, GCPR 2014, held in Münster, Germany, in September 2014. The 58 revised full papers and 8 short papers were carefully reviewed and selected from 153 submissions. The papers are organized in topical sections on variational models for depth and flow, reconstruction, bio-informatics, deep learning and segmentation, feature computation, video interpretation, segmentation and labeling, image processing and analysis, human pose and people tracking, interpolation and inpainting.

Table of contents

Frontmatter

Variational Models for Depth and Flow

Frontmatter

Scene Flow Estimation from Light Fields via the Preconditioned Primal-Dual Algorithm

In this paper we present a novel variational model to jointly estimate geometry and motion from a sequence of light fields captured with a plenoptic camera. The proposed model uses the so-called sub-aperture representation of the light field. Sub-aperture images represent images with slightly different viewpoints, which can be extracted from the light field. The sub-aperture representation allows us to formulate a convex global energy functional, which enforces multi-view geometry consistency, and piecewise smoothness assumptions on the scene flow variables. We optimize the proposed scene flow model by using an efficient preconditioned primal-dual algorithm. Finally, we also present synthetic and real world experiments.

Stefan Heber, Thomas Pock

Introducing More Physics into Variational Depth–from–Defocus

Given an image stack that captures a static scene with different focus settings, variational depth–from–defocus methods aim at jointly estimating the underlying depth map and the sharp image. We show how one can improve existing approaches by incorporating important physical properties. Most formulations are based on an image formation model (forward operator) that explains the varying amount of blur depending on the depth. We present a novel forward operator: It approximates the thin–lens camera model from physics better than previous ones used for this task, since it preserves the maximum–minimum principle w.r.t. the unknown image intensities. This operator is embedded in a variational model that is minimised with a multiplicative variant of the Euler–Lagrange formalism. This offers two advantages: Firstly, it guarantees that the solution remains in the physically plausible positive range. Secondly, it allows a stable gradient descent evolution without the need to adapt the relaxation parameter. Experiments with synthetic and real–world images demonstrate that our model is highly robust under different initialisations. Last but not least, the experiments show that the physical constraints are essential for obtaining more accurate solutions, especially in the presence of strong depth changes.

Nico Persch, Christopher Schroers, Simon Setzer, Joachim Weickert

Reconstruction

Frontmatter

High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth

We present a structured lighting system for creating high-resolution stereo datasets of static indoor scenes with highly accurate ground-truth disparities. The system includes novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion. Combining disparity estimates from multiple projector positions we are able to achieve a disparity accuracy of 0.2 pixels on most observed surfaces, including in half-occluded regions. We contribute 33 new 6-megapixel datasets obtained with our system and demonstrate that they present new challenges for the next generation of stereo algorithms.

Daniel Scharstein, Heiko Hirschmüller, York Kitajima, Greg Krathwohl, Nera Nešić, Xi Wang, Porter Westling

Semi-Global Matching: A Principled Derivation in Terms of Message Passing

Semi-global matching, originally introduced in the context of dense stereo, is a very successful heuristic to minimize the energy of a pairwise multi-label Markov Random Field defined on a grid. We offer the first principled explanation of this empirically successful algorithm, and clarify its exact relation to belief propagation and tree-reweighted message passing. One outcome of this new connection is an uncertainty measure for the MAP label of a variable in a Markov Random Field.

Amnon Drory, Carsten Haubold, Shai Avidan, Fred A. Hamprecht
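
As an illustration of the heuristic discussed in the abstract above, the following minimal sketch shows the path-wise cost aggregation with which semi-global matching is commonly described. It covers a single left-to-right path only and does not reproduce the paper's message-passing derivation. The function name `aggregate_path` and the penalty names `p1` and `p2` are assumptions made for this example.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one left-to-right path.

    costs[x][d] is the matching cost of label (disparity) d at pixel x.
    p1 penalises a label change of exactly 1, p2 any larger change.
    """
    num_labels = len(costs[0])
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_labels):
            # Same label, or any label at the large penalty p2.
            candidates = [prev[d], prev_min + p2]
            # Neighbouring labels at the small penalty p1.
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < num_labels:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded and does not
            # change which label is cheapest.
            row.append(costs[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


if __name__ == "__main__":
    print(aggregate_path([[0, 5], [5, 0], [0, 5]], p1=1, p2=3))
```

In the usual formulation, the aggregated costs of several path directions are summed per pixel and the label with the lowest total is selected.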

Submap-Based Bundle Adjustment for 3D Reconstruction from RGB-D Data

The key contribution of this paper is a novel submapping technique for RGB-D-based bundle adjustment. Our approach significantly speeds up 3D object reconstruction with respect to full bundle adjustment while generating visually compelling 3D models of high metric accuracy. While submapping has been explored previously for mono and stereo cameras, we are the first to transfer and adapt this concept to RGB-D sensors and to provide a detailed analysis of the resulting gain. In our approach, we partition the input data uniformly into submaps to optimize them individually by minimizing the 3D alignment error. Subsequently, we fix the interior variables and optimize only over the separator variables between the submaps. As we demonstrate in this paper, our method reduces the runtime of full bundle adjustment by 32 % on average while still being able to deal with real-world noise of cheap commodity sensors. We evaluated our method on a large number of benchmark datasets, and found that we outperform several state-of-the-art approaches both in terms of speed and accuracy. Furthermore, we present highly accurate 3D reconstructions of various objects to demonstrate the validity of our approach.

Robert Maier, Jürgen Sturm, Daniel Cremers

Bio-informatics

Frontmatter

A Hierarchical Bayesian Approach for Unsupervised Cell Phenotype Clustering

We propose a hierarchical Bayesian model, the wordless Hierarchical Dirichlet Processes-Hidden Markov Model (wHDP-HMM), to tackle the problem of unsupervised cell phenotype clustering during the mitosis stages. Our model combines the unsupervised clustering capabilities of the HDP model with the temporal modeling aspect of the HMM. Furthermore, to model cell phenotypes effectively, our model uses a variant of the HDP, giving preference to morphology over co-occurrence. This is then used to model individual cell phenotype time series and cluster them according to the stage of mitosis they are in. We evaluate our method using two publicly available time-lapse microscopy video datasets and demonstrate that the performance of our approach is generally better than the state-of-the-art.

Mahesh Venkata Krishna, Joachim Denzler

Information Bottleneck for Pathway-Centric Gene Expression Analysis

While DNA microarrays enable us to conveniently measure expression profiles in the scope of thousands of genes, the subsequent association studies typically suffer from a tremendous imbalance between number of variables (genes) and observations (subjects). Even more so, each gene is heavily perturbed by noise which prevents any meaningful analysis on the single-gene level [6]. Hence, the focus shifted to pathways as groups of functionally related genes [4], in the hope that aggregation potentiates the underlying signal. Technically, this leads to a problem of feature extraction which was previously tackled by principal component analysis [5]. We reformulate the task using an extension of the Meta-Gaussian Information Bottleneck method as a means to compress a gene set while preserving information about a relevance variable. This opens up new possibilities, enabling us to make use of clinical side information in order to uncover hidden characteristics in the data.

David Adametz, Mélanie Rey, Volker Roth

Deep Learning and Segmentation

Frontmatter

Convolutional Decision Trees for Feature Learning and Segmentation

Most computer vision tasks, and segmentation tasks in particular, require extracting features that represent the local appearance of patches. Relevant features can be further processed by learning algorithms to infer posterior probabilities that pixels belong to an object of interest. Deep Convolutional Neural Networks (CNN) define a particularly successful class of learning algorithms for semantic segmentation, although they proved to be very slow to train even when employing special purpose hardware. We propose, for the first time, a general purpose segmentation algorithm to extract the most informative and interpretable features as convolution kernels while simultaneously building a multivariate decision tree. The algorithm trains several orders of magnitude faster than regular CNNs and achieves state-of-the-art results in processing quality on benchmark datasets.

Dmitry Laptev, Joachim M. Buhmann

A Deep Variational Model for Image Segmentation

In this paper we introduce a novel model that combines Deep Convolutional Neural Networks with a global inference model. Our model is derived from a convex variational relaxation of the minimum s-t cut problem on graphs, which is frequently used for the task of image segmentation. We treat the outputs of Convolutional Neural Networks as the unary and pairwise potentials of a graph and derive a smooth approximation to the minimum s-t cut problem. During training, this approximation facilitates the adaptation of the Convolutional Neural Network to the smoothing that is induced by the global model. The training algorithm can be understood as a modified backpropagation algorithm that explicitly takes the global inference layer into account.

We illustrate our approach on the task of supervised figure-ground segmentation. In contrast to competing approaches we train directly on the raw pixels of the input images and do not rely on hand-crafted features. Despite its generality, simplicity and complete lack of hand-crafted features, our approach is able to yield competitive performance on the Graz02 and Weizmann Horses datasets.

René Ranftl, Thomas Pock

Feature Computation

Frontmatter

Robust PCA: Optimization of the Robust Reconstruction Error Over the Stiefel Manifold

It is well known that Principal Component Analysis (PCA) is strongly affected by outliers and a lot of effort has been put into robustification of PCA. In this paper we present a new algorithm for robust PCA minimizing the trimmed reconstruction error. By directly minimizing over the Stiefel manifold, we avoid deflation as often used by projection pursuit methods. In contrast to other methods for robust PCA, our method has no free parameter and is computationally very efficient. We illustrate the performance on various datasets including an application to background modeling and subtraction. Our method performs better than or comparably to current state-of-the-art methods while being faster.

Anastasia Podosinnikova, Simon Setzer, Matthias Hein
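
To make the notion of a trimmed reconstruction error from the abstract above concrete, here is a minimal sketch that only evaluates such an objective for a one-dimensional subspace. It is not the paper's optimization over the Stiefel manifold. The function name and the `keep` argument are assumptions of this example; the abstract does not state how the number of retained points is fixed.

```python
def trimmed_reconstruction_error(points, direction, keep):
    """Sum of the `keep` smallest squared reconstruction errors.

    points: list of equal-length coordinate lists (assumed already centred).
    direction: vector spanning a one-dimensional subspace (any non-zero length).
    keep: number of best-fitting points that enter the sum (hypothetical input).
    """
    norm_sq = sum(u * u for u in direction)
    errors = []
    for x in points:
        # Coefficient of the orthogonal projection of x onto the direction.
        coeff = sum(a * u for a, u in zip(x, direction)) / norm_sq
        residual = [a - coeff * u for a, u in zip(x, direction)]
        errors.append(sum(r * r for r in residual))
    # Trimming: the largest errors (likely outliers) are left out.
    return sum(sorted(errors)[:keep])


if __name__ == "__main__":
    data = [[1.0, 0.0], [2.0, 0.0], [-3.0, 0.0], [0.0, 10.0]]
    print(trimmed_reconstruction_error(data, [1.0, 0.0], keep=3))
    print(trimmed_reconstruction_error(data, [1.0, 0.0], keep=4))
```

With `keep=3` the single outlier is ignored, whereas the untrimmed sum (`keep=4`) is dominated by it.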

An $\mathcal{O}(n \log n)$ Cutting Plane Algorithm for Structured Output Ranking

In this work, we consider ranking as a training strategy for structured output prediction. Recent work has begun to explore structured output prediction in the ranking setting, but has mostly focused on the special case of bipartite preference graphs. The bipartite special case is computationally efficient as there exists a linear time cutting plane training strategy for hinge loss bounded regularized risk, but it is unclear how to feasibly extend the approach to complete preference graphs. We develop here a highly parallelizable $\mathcal{O}(n \log n)$ algorithm for cutting plane training with complete preference graphs that is scalable to millions of samples on a single core. We explore theoretically and empirically the relationship between slack rescaling and margin rescaling variants of the hinge loss bound to structured losses, showing that the slack rescaling variant has better stability properties and empirical performance with no additional computational cost per cutting plane iteration. We further show generalization bounds based on uniform convergence. Finally, we demonstrate the effectiveness of the proposed family of approaches on the problem of object detection in computer vision.

Matthew B. Blaschko, Arpit Mittal, Esa Rahtu

Exemplar-Specific Patch Features for Fine-Grained Recognition

In this paper, we present a new approach for fine-grained recognition or subordinate categorization, tasks where an algorithm needs to reliably differentiate between visually similar categories, e.g., different bird species. While previous approaches aim at learning a single generic representation and models with increasing complexity, we propose an orthogonal approach that learns patch representations specifically tailored to every single test exemplar. Since we query a constant number of images similar to a given test image, we obtain very compact features and avoid large-scale training with all classes and examples. Our learned mid-level features are built on shape and color detectors estimated from discovered patches reflecting small highly discriminative structures in the queried images. We evaluate our approach for fine-grained recognition on the CUB-2011 birds dataset and show that high recognition rates can be obtained by model combination.

Alexander Freytag, Erik Rodner, Trevor Darrell, Joachim Denzler

Video Interpretation

Frontmatter

Motion Segmentation with Weak Labeling Priors

Motions of organs or extremities are important features for clinical diagnosis. However, tracking and segmentation of complex, quickly changing motion patterns is challenging, particularly in the presence of occlusions. Neither state-of-the-art tracking nor motion segmentation approaches are able to deal with such cases. Thus far, motion capture systems or the like were needed, which are complicated to handle and interfere with the movements. We propose a solution based on a single video camera, which is not only far less intrusive, but also a lot cheaper. The limitations of tracking and motion segmentation are overcome by a new approach to integrate prior knowledge in the form of weak labeling into motion segmentation. Using the example of Cerebral Palsy detection, we segment motion patterns of infants into the different body parts by analyzing body movements. Our experimental results show that our approach outperforms current motion segmentation and tracking approaches.

Hodjat Rahmati, Ralf Dragon, Ole Morten Aamo, Luc van Gool, Lars Adde

Object-Level Priors for Stixel Generation

This paper presents a stereo vision-based scene model for traffic scenarios. Our approach effectively couples bottom-up image segmentation with object-level knowledge in a sound probabilistic fashion. The relevant scene structure, i.e. obstacles and freespace, is encoded using individual Stixels as building blocks that are computed bottom-up from dense disparity images. We present a principled way to additionally integrate top-down prior information about object location and shape that arises from independent system modules, ranging from geometric cues up to highly confident object detections. This results in an efficient exploration of orthogonal image-based cues, such as disparity and gray-level intensity data, combined in a consistent scene representation. The overall segmentation problem is modeled as a Markov Random Field and solved efficiently through Dynamic Programming.

We demonstrate superior segmentation accuracy compared to state-of-the-art superpixel algorithms regarding obstacles and freespace in the scene, evaluated on a large dataset captured in real-world traffic.

Marius Cordts, Lukas Schneider, Markus Enzweiler, Uwe Franke, Stefan Roth

Coherent Multi-sentence Video Description with Variable Level of Detail

Humans can easily describe what they see in a coherent way and at varying levels of detail. However, existing approaches for automatic video description focus on generating only single sentences and are not able to vary the descriptions’ level of detail. In this paper, we address both of these limitations: for a variable level of detail we produce coherent multi-sentence descriptions of complex videos. To understand the difference between detailed and short descriptions, we collect and analyze a video description corpus of three levels of detail. We follow a two-step approach where we first learn to predict a semantic representation (SR) from video and then generate natural language descriptions from it. For our multi-sentence descriptions we model across-sentence consistency at the level of the SR by enforcing a consistent topic. Human judges rate our descriptions as more readable, correct, and relevant than related work.

Anna Rohrbach, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Manfred Pinkal, Bernt Schiele

Segmentation and Labeling

Frontmatter

Asymmetric Cuts: Joint Image Labeling and Partitioning

For image segmentation, recent advances in optimization make it possible to combine noisy region appearance terms with pairwise terms which can not only discourage, but also encourage label transitions, depending on boundary evidence. These models have the potential to overcome problems such as the shrinking bias. However, with the ability to encourage label transitions comes a different problem: strong boundary evidence can overrule weak region appearance terms to create new regions out of nowhere. Some label classes exhibit strong internal boundaries, such as the background class, which is the pool of objects. Other label classes, meanwhile, should be modeled as a single region, even if some internal boundaries are visible.

We therefore propose in this work to treat label classes asymmetrically: for some classes, we allow a further partitioning into their constituent objects as supported by boundary evidence; for other classes, further partitioning is forbidden. In our experiments, we show where such a model can be useful for both 2D and 3D segmentation.

Thorben Kroeger, Jörg H. Kappes, Thorsten Beier, Ullrich Koethe, Fred A. Hamprecht

Mind the Gap: Modeling Local and Global Context in (Road) Networks

We propose a method to label roads in aerial images and extract a topologically correct road network. Three factors make road extraction difficult: (i) high intra-class variability due to clutter like cars, markings, shadows on the roads; (ii) low inter-class variability, because some non-road structures are made of similar materials; and (iii) most importantly, a complex structural prior: roads form a connected network of thin segments, with slowly changing width and curvature, often bordered by buildings, etc. We model this rich, but complicated contextual information at two levels. Locally, the context and layout of roads is learned implicitly, by including multi-scale appearance information from a large neighborhood in the per-pixel classifier. Globally, the network structure is enforced explicitly: we first detect promising stretches of road via shortest-path search on the per-pixel evidence, and then select pixels on an optimal subset of these paths by energy minimization in a CRF, where each putative path forms a higher-order clique. The model outperforms several baselines on two challenging data sets, both in terms of precision/recall and w.r.t. topological correctness.

Javier A. Montoya-Zegarra, Jan D. Wegner, Ľubor Ladický, Konrad Schindler

Image Processing and Analysis

Frontmatter

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging

In this paper, we augment multi-frame super-resolution with the concept of guided filtering for simultaneous upsampling of 3-D range data and complementary photometric information in hybrid range imaging. Our guided super-resolution algorithm is formulated as joint maximum a-posteriori estimation to reconstruct high-resolution range and photometric data. In order to exploit local correlations between both modalities, guided filtering is employed for regularization of the proposed joint energy function. For fast and robust image reconstruction, we employ iteratively re-weighted least squares minimization embedded into a cyclic coordinate descent scheme. The proposed method was evaluated on synthetic datasets and real range data acquired with Microsoft’s Kinect. Our experimental evaluation demonstrates that our approach outperforms state-of-the-art range super-resolution algorithms while it also provides super-resolved photometric data.

Florin C. Ghesu, Thomas Köhler, Sven Haase, Joachim Hornegger

Image Descriptors Based on Curvature Histograms

Descriptors based on orientation histograms are widely used in computer vision. The spatial pooling involved in these representations provides important invariance properties, yet it is also responsible for the loss of important details. In this paper, we suggest a way to preserve the details described by the local curvature. We propose a descriptor that comprises the direction and magnitude of curvature and naturally expands classical orientation histograms like SIFT and HOG. We demonstrate the general benefit of this expansion on image classification, object detection, and descriptor matching.

Philipp Fischer, Thomas Brox

Human Pose and People Tracking

Frontmatter

Test-Time Adaptation for 3D Human Pose Estimation

In this paper we consider the task of articulated 3D human pose estimation in challenging scenes with dynamic background and multiple people. Initial progress on this task has been achieved building on discriminatively trained part-based models that deliver a set of 2D body pose candidates that are subsequently refined by reasoning in 3D [1, 4, 5]. The performance of such methods is limited by the performance of the underlying 2D pose estimation approaches. In this paper we explore a way to boost the performance of 2D pose estimation based on the output of the 3D pose reconstruction process, thus closing the loop in the pose estimation pipeline. We build our approach around a component that is able to identify true positive pose estimation hypotheses with high confidence. We then either retrain 2D pose estimation models using such highly confident hypotheses as additional training examples, or we use similarity to these hypotheses as a cue for 2D pose estimation. We consider a number of features that can be used for assessing the confidence of the pose estimation results. The strongest feature in our comparison corresponds to the ensemble agreement on the 3D pose output. We evaluate our approach on two publicly available datasets improving over state of the art in each case.

Sikandar Amin, Philipp Müller, Andreas Bulling, Mykhaylo Andriluka

Efficient Multiple People Tracking Using Minimum Cost Arborescences

We present a new global optimization approach for multiple people tracking based on a hierarchical tracklet framework. A new type of tracklets is introduced, which we call tree tracklets. They contain bifurcations to naturally deal with ambiguous tracking situations. Difficult decisions are postponed to a later iteration of the hierarchical framework, when more information is available. We cast the optimization problem as a minimum cost arborescence problem in an acyclic directed graph, where a tracking solution can be obtained in linear time. Experiments on six publicly available datasets show that the method performs well when compared to state-of-the-art tracking algorithms.

Roberto Henschel, Laura Leal-Taixé, Bodo Rosenhahn
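
The linear-time claim in the abstract above can be illustrated with a general property of acyclic directed graphs: no choice of edges can close a cycle, so picking the cheapest incoming edge for each node independently already yields a minimum cost set of arborescences. The sketch below shows only this generic step, not the paper's tree tracklet construction; the function name and the edge format are assumptions of this example.

```python
def min_cost_arborescence_dag(edges):
    """Pick the cheapest incoming edge for every node that has one.

    edges: list of (source, target, cost) tuples of an ACYCLIC directed graph.
    Returns {target: (source, cost)}; nodes missing from the result are roots.
    """
    best = {}
    for source, target, cost in edges:  # a single pass over all edges
        if target not in best or cost < best[target][1]:
            best[target] = (source, cost)
    return best


if __name__ == "__main__":
    graph = [("a", "b", 2.0), ("a", "c", 1.0), ("b", "c", 0.5), ("c", "d", 3.0)]
    print(min_cost_arborescence_dag(graph))
```

On a graph with cycles this greedy choice is not sufficient, which is why the acyclicity assumption matters here.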

Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

Hand motion capture has been an active research topic, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.

Dimitrios Tzionas, Abhilash Srikantha, Pablo Aponte, Juergen Gall

Interpolation and Inpainting

Frontmatter

Flow and Color Inpainting for Video Completion

We propose a framework for temporally consistent video completion. To this end we generalize the exemplar-based inpainting method of Criminisi et al. [7] to video inpainting. Specifically we address two important issues: Firstly, we propose a color and optical flow inpainting to ensure temporal consistency of inpainting even for complex motion of foreground and background. Secondly, rather than requiring the user to hand-label the inpainting region in every single image, we propose a flow-based propagation of user scribbles from the first to subsequent video frames which drastically reduces the user input. Experimental comparisons to state-of-the-art video completion methods demonstrate the benefits of the proposed approach.

Michael Strobel, Julia Diebold, Daniel Cremers

Spatial and Temporal Interpolation of Multi-view Image Sequences

We propose a simple and effective framework for multi-view image sequence interpolation in space and time. For spatial view point interpolation we present a robust feature-based matching algorithm that allows for wide-baseline camera configurations. To this end, we introduce two novel filtering approaches for outlier elimination and a robust approach for match extrapolations at the image boundaries. For small-baseline and temporal interpolations we rely on an established optical flow based approach. We perform a quantitative and qualitative evaluation of our framework and present applications and results. Our method has a low runtime and results can compete with state-of-the-art methods.

Tobias Gurdan, Martin R. Oswald, Daniel Gurdan, Daniel Cremers

Pose Normalization for Eye Gaze Estimation and Facial Attribute Description from Still Images

Our goal is to obtain an eye gaze estimation and a face description based on attributes (e.g. glasses, beard or thick lips) from still images. An attribute-based face description reflects human vocabulary and is therefore adequate as face description. Head pose and eye gaze play an important role in human interaction and are a key element to extract interaction information from still images. Pose variation is a major challenge when analyzing them. Most current approaches for facial image analysis are not explicitly pose-invariant. To obtain a pose-invariant representation, we have to account for the three-dimensional nature of a face. A 3D Morphable Model (3DMM) of faces is used to obtain a dense 3D reconstruction of the face in the image. This Analysis-by-Synthesis approach provides model parameters which contain an explicit face description and a dense model-to-image correspondence. However, the fit is restricted to the model space and cannot explain all variations. Our model only contains straight gaze directions and lacks high detail textural features. To overcome these limitations, we use the obtained correspondence in a discriminative approach. The dense correspondence is used to extract a pose-normalized version of the input image. The warped image contains all information from the original image and preserves gaze and detailed textural information. On the pose-normalized representation we train a regression function to obtain gaze estimation and attribute description. We provide results for pose-invariant gaze estimation on still images on the UUlm Head Pose and Gaze Database and attribute description on the Multi-PIE database. To the best of our knowledge, this is the first pose-invariant approach to estimate gaze from unconstrained still images.

Bernhard Egger, Sandro Schönborn, Andreas Forster, Thomas Vetter

Posters

Frontmatter

Probabilistic Progress Bars

Predicting the time at which the integral over a stochastic process reaches a target level is of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour that confuses the user. It also does not provide a distribution prediction (risk values), which is crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

Martin Kiefel, Christian Schuler, Philipp Hennig

Wide Base Stereo with Fisheye Optics: A Robust Approach for 3D Reconstruction in Driving Assistance

We propose a new approach to achieve 3D environment reconstruction based on automotive surround view systems with fisheye cameras. In particular, we demonstrate that stereo vision techniques can be applied in overlapping areas of adjacent cameras, which are up to 90 degrees per camera pair in the current setup. Lateral limitations are mainly due to the present system configuration and can be extended. No time accumulation is required; therefore, the update rate of the range information is given by the frame rate of the imager. We show by means of experimental results that our approach is capable of delivering 3D information from a pair of images under the described configuration.

Jose Esparza, Michael Helmle, Bernd Jähne

Detection and Segmentation of Clustered Objects by Using Iterative Classification, Segmentation, and Gaussian Mixture Models and Application to Wood Log Detection

There have recently been advances in the area of fully automatic detection of clustered objects in color images. State of the art methods combine detection with segmentation. In this paper we show that these methods can be significantly improved by introducing a new iterative classification, statistical modeling, and segmentation procedure. The proposed method uses a detect-and-merge algorithm, which iteratively finds and validates new objects and subsequently updates the statistical model, while converging in very few iterations.

Our new method does not require any a priori information or user input and works fully automatically on desktop computers and mobile devices, such as smartphones and tablets. We evaluate three different kinds of classifiers, which are used to substantially reduce the number of false positive matches, from which current state of the art methods suffer. Experiments are performed on a challenging database depicting wood log piles, with objects of inhomogeneous sizes and shapes. In all cases our method outperforms the current state of the art algorithms with a detection rate above 99 % and a false positive rate of less than 0.4 %.

Christopher Herbon, Klaus Tönnies, Bernd Stock

Tracking-Based Visibility Estimation

Assessing atmospheric visibility conditions is a challenging and increasingly important task not only in the context of video-based driver assistance systems. As a commonly used quantity, meteorological visibility describes the visual range for observations through scattering and absorbing aerosols such as fog or smog.

We present a novel algorithm for estimating meteorological visibility based on object tracks in camera images. To achieve this, we introduce a likelihood objective function based on Koschmieder’s model for horizontal vision to derive the atmospheric extinction coefficient from the objects’ luminances and distances provided by the tracking. To make this algorithm applicable for real-time purposes, we propose an easy-to-implement and extremely fast minimization method which clearly outperforms classical methods such as Levenberg-Marquardt. Our approach is tested with promising results on real-world sequences recorded with a commercial driver assistance camera as well as on artificial images generated by Monte-Carlo simulations.

Stephan Lenor, Johannes Martini, Bernd Jähne, Ulrich Stopper, Stefan Weber, Florian Ohr
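As a rough illustration of the quantities named in the abstract above, the following sketch writes Koschmieder's model in its textbook form, L(d) = L0·exp(−K·d) + L_air·(1 − exp(−K·d)), and converts an extinction coefficient K into a meteorological visibility using the common 5 % contrast threshold. The function names are hypothetical, and the closed-form estimate from two noise-free observations with a known air light is an assumption made for illustration only; it is not the likelihood-based minimization proposed in the paper.

```python
import math


def koschmieder_luminance(l0, l_air, k, d):
    """Apparent luminance of an object with intrinsic luminance l0 seen at distance d."""
    transmission = math.exp(-k * d)
    return l0 * transmission + l_air * (1.0 - transmission)


def meteorological_visibility(k, contrast_threshold=0.05):
    """Distance at which the contrast of a black object drops to the threshold."""
    return -math.log(contrast_threshold) / k


def extinction_from_two_observations(l1, d1, l2, d2, l_air):
    """Illustrative closed form for K from two noise-free points of one object track.

    L(d) - l_air = (l0 - l_air) * exp(-k * d), so the ratio of the two
    differences cancels the unknown intrinsic luminance l0.
    Assumes l_air is known and d1 != d2.
    """
    return math.log((l1 - l_air) / (l2 - l_air)) / (d2 - d1)
```

With real, noisy tracks a single pair of observations is unreliable, which is why a fit over many observations, as described in the abstract, would be needed in practice.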

Predicting the Influence of Additional Training Data on Classification Performance for Imbalanced Data

It is desirable to predict the influence of additional training data on classification performance because the generation of samples is often costly. Current methods can only predict performance as measured by accuracy, which is not suitable if one class is much rarer than another. We propose an approach which is able to also predict other measures such as G-mean and F-measure, which are used in cases of imbalanced data. We show that our method leads to more correct decisions whether to generate more training samples or not using a highly imbalanced real-world dataset of scanning electron microscopy images of nanoparticles.

Stephen Kockentiedt, Klaus Tönnies, Erhardt Gierke
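To make the point about accuracy on imbalanced data concrete, the sketch below computes accuracy, G-mean, and F-measure from a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fp, fn, tn, beta=1.0):
    """Return accuracy, G-mean and F-measure for a binary confusion matrix."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    g_mean = math.sqrt(recall * specificity)
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1.0 + b2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical rare-class example: 10 positives among 1000 samples.
metrics = imbalance_metrics(tp=5, fp=10, fn=5, tn=980)
```

In this made-up example the accuracy is 0.985 even though half of the rare class is missed, whereas the F-measure (0.4) and the G-mean (about 0.70) expose the weakness.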

Signal/Background Classification of Time Series for Biological Virus Detection

This work proposes translation-invariant features based on a wavelet transform that are used to classify time series as containing either relevant signals or noisy background. Due to the translation-invariant property, signals appearing at arbitrary locations in time have similar representations in feature space. Classification is carried out by a condensed $$k$$-Nearest-Neighbors classifier trained on these features, i.e. the training set is reduced for faster classification. This reduction is conducted by a $$k$$-means clustering of the original training set and using the obtained cluster centers as a new training set. The coreset-technique BICO is employed to accelerate this initial clustering for big datasets. The resulting feature extraction and classification pipeline is applied successfully in the context of biological virus detection. Data from Plasmon Assisted Microscopy of Nano-size Objects (PAMONO) is classified, achieving accuracy $$0.999$$ for the most important classification task.

Dominic Siedhoff, Hendrik Fichtenberger, Pascal Libuschewski, Frank Weichert, Christian Sohler, Heinrich Müller
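The condensation step described in the abstract above (replace the training set by k-means cluster centers, then classify with k-nearest neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions: clustering each class separately so that every center inherits a label is one plausible reading rather than a detail confirmed by the abstract, the plain Lloyd iteration stands in for the BICO-accelerated clustering, and all function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iteration; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def condense(train, k_per_class):
    """Replace each class of (features, label) pairs by its k-means centers."""
    condensed = []
    for label in sorted({y for _, y in train}):
        pts = [x for x, y in train if y == label]
        for center in kmeans(pts, min(k_per_class, len(pts))):
            condensed.append((center, label))
    return condensed


def knn_predict(condensed, x, k=1):
    """Majority vote among the k nearest condensed training points."""
    nearest = sorted(condensed, key=lambda item: math.dist(x, item[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

The query cost then scales with the number of centers instead of the size of the original training set, which is the motivation for condensing.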

Efficient Hierarchical Triplet Merging for Camera Pose Estimation

This paper deals with efficient means for camera pose estimation for difficult scenes. Particularly, we speed up the combination of image triplets to image sets by hierarchical merging and a reduction of the number of merged points. By image sets we denote a generalization of image sequences where images can be linked in multiple directions, i.e., they can form a graph. To obtain reliable results for triplets, we use large numbers of corresponding points. For a high-quality and yet efficient merging of the triplets we propose strategies for the reduction of the number of points. The strategies are evaluated based on statistical measures employing the full covariance information for the camera poses from bundle adjustment. We show that to obtain a statistically sound result, intuitively appealing deterministic reduction strategies are problematic and that a simple reduction strategy based on random deletion was evaluated best. We also discuss the benefits of the evaluation measures for finding conceptual and implementation weaknesses. The paper is illustrated with a number of experiments giving standard deviations for all values.

Helmut Mayer

Lens-Based Depth Estimation for Multi-focus Plenoptic Cameras

Multi-focus portable plenoptic camera devices provide a reasonable tradeoff between spatial and angular resolution while enlarging the depth of field of a standard camera. Many applications using the data captured by these camera devices require or benefit from correspondences established between the single microlens images. In this work we propose a lens-based depth estimation scheme based on a novel adaptive lens selection strategy. Coarse depth estimates serve as indicators for suitable target lenses. The selection criterion accounts for lens overlap and the amount of defocus blur between the reference and possible target lenses. The depth maps are regularized using a semi-global strategy. For insufficiently textured scenes, we further incorporate a semi-global coarse regularization with respect to the lens-grid. In contrast to algorithms operating on the complete lightfield, our algorithm has a low memory footprint. The resulting per-lens dense depth maps are well suited for volumetric surface reconstruction techniques. We show that our selection strategy achieves similar error rates as selection strategies with a fixed number of lenses, while being computationally less time consuming. Results are presented for synthetic as well as real-world datasets.

Oliver Fleischmann, Reinhard Koch

Efficient Metropolis-Hasting Image Analysis for the Location of Vascular Entity

In this paper we present a novel approach for probabilistically exploring and modeling vascular networks in 3D angiograms. For modeling the vascular morphology and topology a graph-like particle model is used. Each particle represents the intrinsic properties of a small fraction of a vessel including position, orientation and scale. Explicit connections between particles determine the network topology. In evaluation using simulated as well as real X-ray and time-of-flight MRI angiograms the proposed method was able to accurately model the vascular network.

Henrik Skibbe, Marco Reisert, Shin Ishii

Automatic Determination of Anatomical Correspondences for Multimodal Field of View Correction

In spite of a huge body of work in medical image registration, there seems to be very little effort in Field of View (FOV) correction or anatomical overlap estimation, especially for multi-modal studies. This is a key step for most registration algorithms to work on image volumes of different coverages. In this work, we consider the FOV correction problem between Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) image volumes for the same patient. A novel algorithm composed of a cascade of (a) symmetry-based gross rotation/translation correction, (b) a multi-modal feature descriptor, and (c) a matching scheme using dynamic programming is presented. This combination deals with the challenges of multi-modal studies, namely intensity differences, inhomogeneity, and gross patient movement. Validation and comparison of the proposed algorithm are shown quantitatively on $$\mathbf {73}$$ CT-MRI pairs, with promising results.

Hima Patel, Karthik Gurumoorthy, Seshadri Thiruvenkadam

Encoding Spatial Arrangements of Visual Words for Rotation-Invariant Image Classification

Incorporating the spatial information of visual words enhances the performance of the well-known bag-of-visual words (BoVWs) model for problems like object category recognition. However, object images can undergo various in-plane rotations, so the spatial information must be added to the BoVWs model in a rotation-invariant manner. We present a novel approach to integrate the spatial information into the BoVWs model in a rotation-invariant way by encoding the triangular relationship among the positions of identical visual words in the $$2D$$ image space. Our proposed BoVWs model is based on densely sampled local features for which the dominant orientations are calculated. Thus we achieve rotation-invariance both globally and locally. We validate our proposed method for rotation-invariance on datasets of ancient coins and butterflies and achieve better performance than the conventional BoVWs model.

Hafeez Anwar, Sebastian Zambanini, Martin Kampel

Accurate Detection in Volumetric Images Using Elastic Registration Based Validation

In this paper, we propose a method for accurate detection and segmentation of cells in dense plant tissue of Arabidopsis Thaliana. We build upon a system that uses a top down approach to yield the cell segmentations: A discriminative detection is followed by an elastic alignment of a cell template. While this works well for cells with a distinct appearance, it fails once the detection step cannot produce reliable initializations for the alignment. We propose a validation method for the aligned cell templates and show that we can thereby increase the average precision substantially.

Dominic Mai, Jasmin Dürr, Klaus Palme, Olaf Ronneberger

A Human Factors Study of Graphical Passwords Using Biometrics

One mode of authentication used in modern computing systems is graphical passwords. Graphical passwords are becoming more popular because touch-sensitive and pen-sensitive technologies are becoming ubiquitous. In this paper, we construct the “BioSketch” database, which is a general database of sketch-based passwords (SkPWs) with pressure information used as a biometric property. The BioSketch database is created so that recognition approaches may be commensurable with the benchmark performances. Using this database, we are also able to study the human-computer interaction (HCI) process for SkPWs. In this paper, we compare a generalized SKS recognition algorithm with the Fréchet distance in terms of the intra/inter-class variations and performances. The results show that the SKS-based approach achieves as much as a 7 % and 17 % reduction in equal error rate (EER) for random and skilled forgeries respectively.

Benjamin S. Riggan, Wesley E. Snyder, Xiaogang Wang, Jing Feng
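The abstract above names the Fréchet distance as the baseline for comparing sketched curves but does not say which variant is used. As an assumption for illustration, the sketch below implements the discrete Fréchet distance between two non-empty polylines with the standard dynamic program; the function name is hypothetical, and pressure values could be appended as an extra coordinate of each point.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences."""
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance of the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]
```

Unlike a pointwise comparison, this measure respects the order in which the points of a stroke are drawn, which is why it is a natural baseline for sketch-based passwords.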

Pedestrian Orientation Estimation

This paper addresses the task of estimating the orientation of pedestrians from monocular images provided by an automotive camera. From an initial detection of a pedestrian, we analyze the area within their bounding box and give an estimation of the orientation. Using ground truth mocap data, we define the orientations as a direction and a rough human pose. A random forest classifier trained on this data using HOG features assigns each detected pedestrian to their orientation cluster. Evaluation of the method is performed on a new dataset and on a publicly available dataset showing improved results.

Joe Lallemand, Alexandra Ronge, Magdalena Szczot, Slobodan Ilic

Distance-Based Descriptors and Their Application in the Task of Object Detection

In this paper, we propose an efficient way to encode the shape of objects. Many state-of-the-art descriptors (e.g. HOG, Haar, LBP) are based on the fact that the shape of objects can be described by brightness differences inside the image, i.e. the descriptors encode the gradient or intensity differences inside the image (edges). If the edges are very thin, the edge information can be difficult to obtain, and the dimensionality of the feature vector (without a reduction method) is typically large and contains redundant information. These drawbacks motivate the proposed method, in which the edges need not be hit directly; the input brightness function is transformed using an appropriate image distance function. After this transformation, the values of the distance function inside objects and backgrounds differ, and these values can be used to describe object appearance. We demonstrate the properties of the method on the problem of face detection using the classical sliding window technique.

Radovan Fusek, Eduard Sojka

Hough Forests Revisited: An Approach to Multiple Instance Tracking from Multiple Cameras

Tracking multiple objects in parallel is a difficult task, especially if instances are interacting and occluding each other. To alleviate the arising problems, multiple camera views can be taken into account, which, however, increases the computational effort. Because this calls for very efficient methods, rather simple approaches such as background subtraction are often applied, which tend to fail in more difficult scenarios. Thus, in this work, we introduce a powerful multi-instance tracking approach building on Hough Forests. By adequately refining the time-consuming building blocks, we can drastically reduce their computational complexity without a significant loss in accuracy. In fact, we show that the test time can be reduced by one to two orders of magnitude, which makes it possible to efficiently process the large amount of image data coming from multiple cameras. Furthermore, we adapt the pre-trained generic forest model in an online manner to train an instance-specific model, making it well suited for multi-instance tracking. Our experimental evaluations show the effectiveness of the proposed efficient Hough Forests for object detection as well as for the actual task of multi-camera tracking.

Georg Poier, Samuel Schulter, Sabine Sternig, Peter M. Roth, Horst Bischof

Graph-Based and Variational Minimization of Statistical Cost Functionals for 3D Segmentation of Aortic Dissections

The objective of this contribution consists in segmenting dissected aortas in computed tomography angiography (CTA) data in order to obtain morphological specifics of each patient’s vessel. Custom-designed stent-grafts represent the only possibility to enable minimally invasive endovascular techniques concerning Type A dissections, which emerge within the ascending aorta (AA). The localization of cross-sectional aortic boundaries within planes orthogonal to a rough aortic centerline relies on a multicriterial 3D graph-based method. In order to consider the often non-circular shape of the dissected aortic cross-sections, the initial circular contour detected in the localization step undergoes a deformation process in 2D, steered by either local or global statistical distribution metrics. The automatic segmentation provided by our novel approach, which widely applies for the delineation of tubular structures of variable shapes and heterogeneous intensities, is compared with ground truth provided by a vascular surgeon for 11 CTA datasets.

Cosmin Adrian Morariu, Tobias Terheiden, Daniel Sebastian Dohle, Konstantinos Tsagakis, Josef Pauli

Mask-Specific Inpainting with Deep Neural Networks

Most inpainting approaches require a good image model to infer the unknown pixels. In this work, we directly learn a mapping from image patches, corrupted by missing pixels, onto complete image patches. This mapping is represented as a deep neural network that is automatically trained on a large image data set. In particular, we are interested in the question whether it is helpful to exploit the shape information of the missing regions, i.e. the masks, which is something commonly ignored by other approaches. In comprehensive experiments on various images, we demonstrate that our learning-based approach is able to use this extra information and can achieve state-of-the-art inpainting results. Furthermore, we show that training with such extra information is useful for blind inpainting, where the exact shape of the missing region might be uncertain, for instance due to aliasing effects.

Rolf Köhler, Christian Schuler, Bernhard Schölkopf, Stefan Harmeling

Detection of Clustered Objects in Sparse Point Clouds Through 2D Classification and Quadric Filtering

A novel approach for detecting single objects in large clusters is presented. The proposed method is designed to work with structure from motion data, which typically includes a set of input images, a very sparse point cloud and camera poses. We use provided objects of interest from 2D classification, which are then projected to three dimensional space.

The main contribution of this paper is an algorithm that accurately detects the objects of interest and approximates their locations in three-dimensional space by using 2D classification data and quadric filtering. Optionally, a partly dense reconstructed mesh containing only the objects of interest is computed, without the need to apply patch-based multiple view stereo algorithms first. Experiments are performed on a challenging database containing images of wood log piles with a known ground truth number of objects, provided by timber processing companies. The average true positive rate exceeds 98.0 % in every case, and we show how to reduce the false positive rate to less than 0.5 %.

Christopher Herbon, Benjamin Otte, Klaus Tönnies, Bernd Stock

On the Second Order Statistics of Essential Matrix Elements

In this paper, we investigate the second order statistics of essential matrix elements. Using the Taylor expansion for a rotation matrix up to second order terms and considering relatively high uncertainties for the rotation angles and translation parameters, a covariance matrix is obtained which includes the second order statistics of essential matrix elements. The covariance matrix is utilized along with the coplanarity equations and acts as a regularization term. Using the regularization term brings considerable improvements in the recovery of camera motion which will be proven based on simulation and different real image sequences.

M. Hossein Mirabdollah, Bärbel Mertsching

Geometric Reasoning for Uncertain Observations of Man-Made Structures

Observations of man-made structures in terms of digital images, laser scans or sketches are inherently uncertain due to the acquisition process. Thus reverse engineering has to be applied to obtain topologically consistent and geometrically correct model instances by feature aggregation. The corresponding spatial reasoning process usually implies the detection of adjacencies, the generation and testing of hypotheses, and finally the enforcement of the detected relations. We present a complete and general work-flow for geometric reasoning that takes the uncertainty of the observations and of the derived low-level features into account. Thereby we exploit algebraic projective geometry to ease the formulation of geometric constraints. As this comes at the expense of an over-parametrization, we introduce an adjustment model which stringently incorporates uncertainty and copes with singular covariance matrices. The size of the resulting normal equation system depends only on the number of established constraints which paves the way to efficient solutions. We demonstrate the usefulness and the feasibility of the approach with results for the automatic analysis of a sketch and for a building reconstruction based on an airborne laser scan.

Jochen Meidow

Locality Sensitive Hashing Using GMM

We propose a new approach for locality sensitive hashes (LSH) solving the approximate nearest neighbor problem. A well known LSH family uses linear projections to place the samples of a dataset into different buckets. We extend this idea and, instead of using equally spaced buckets, use a Gaussian mixture model to build a data dependent mapping.

Fabian Schmieder, Bin Yang
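For context, the well-known LSH family with linear projections and equally spaced buckets that the abstract above takes as its starting point can be sketched as h(x) = floor((a·x + b) / w). The sketch below covers only this baseline; the data-dependent Gaussian mixture mapping proposed in the paper is not reproduced, and the function and parameter names are hypothetical.

```python
import math
import random


def make_hash(dim, width, seed=0):
    """Build one hash function h(x) = floor((a . x + b) / width)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, width)                    # random bucket offset

    def h(x):
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / width)

    return h


def build_table(data, h):
    """Group sample indices by their bucket id."""
    table = {}
    for index, x in enumerate(data):
        table.setdefault(h(x), []).append(index)
    return table
```

A query is hashed with the same function and compared only against the samples in its bucket; in practice several such functions and tables are combined to trade recall against query time.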

Coded Aperture Flow

Real cameras have a limited depth of field. The resulting defocus blur is a valuable cue for estimating the depth structure of a scene. Using coded apertures, depth can be estimated from a single frame. For optical flow estimation between frames, however, the depth dependent degradation can introduce errors. These errors are most prominent when objects move relative to the focal plane of the camera. We incorporate coded aperture defocus blur into optical flow estimation and allow for piecewise smooth 3D motion of objects. With coded aperture flow, we can establish dense correspondences between pixels in succeeding coded aperture frames. We compare several approaches to compute accurate correspondences for coded aperture images showing objects with arbitrary 3D motion.

Anita Sellent, Paolo Favaro

Kernel Density Estimation for Post Recognition Score Analysis

Post-processing pattern recognition results has long been an effective way to reduce false recognitions by rejecting results that are deemed wrong by a verification system. Recent work laid down a theoretical foundation for a specific post-recognition approach. This approach was termed Meta Recognition by its inventors and is based on a statistical outlier detection that makes use of the Weibull distribution. Using distance or similarity scores that are generated at recognition time, Meta Recognition automatically classifies a recognition result as correct or incorrect. In this paper we present a novel approach to Meta Recognition using a kernel density estimation. We show that this approach can outperform the aforementioned post-processing technique in different scenarios.

Sebastian Sudholt, Leonard Rothacker, Gernot A. Fink

Recognizing Scene Categories of Historical Postcards

The recognition of visual scene categories is a challenging issue in computer vision. It has many applications like organizing and tagging private or public photo collections. While most approaches are focused on web image collections, some of the largest unorganized image collections are historical images from archives and museums. In this paper the problem of recognizing categories in historical images is considered. More specifically, a new dataset is presented that addresses the analysis of a challenging collection of postcards from the period of World War I delivered by the German military postal service. The categorization of these postcards is of great interest for historians in order to gain insights into the society during these years. For computer vision research the postcards pose various new challenges such as high degradations, varying visual domains like sketches, photographs or colorization, and incorrect orientations due to an image-in-image problem. The incorrect orientation is addressed by a pre-processing step that classifies the images as portrait or landscape. In order to cope with the different visual domains, an ensemble that incorporates global feature representations and features that are derived from detection results is used. The experiments on a development set and a large unexplored test set show that the proposed methods improve the recognition on the historical postcards compared to a Bag-of-Features based scene categorization.

Rene Grzeszick, Gernot A. Fink

A Stochastic Late Fusion Approach to Human Action Recognition in Unconstrained Images and Videos

Recognizing human actions in unconstrained videos and still images has attracted considerable interest in recent research. An increasingly popular trend is to use ensembles of multiple features and classifiers in order to cope with different aspects such as motion, scene, pose and context. It has been observed that late fusion of predictions from individual classifiers offers more robustness than the early fusion of feature descriptors. In this paper, we present a novel framework for the late fusion of probabilistic predictions of different classifiers which is based on formulating and solving constrained quadratic optimization problems. In contrast to late fusion methods such as the sum-rule and the linear weighting, our approach binds constraints on mixture coefficients such that they represent the posterior of every participating classifier for each class. Further, unlike fusion by Bayesian inference, the proposed approach minimizes an error function that also considers correlations among different models. Experiments on three video and image action datasets show that our approach outperforms other late fusion techniques. In particular we report 6 %–8 % improvement compared to previously published results on two benchmark datasets.

Muhammad Shahzad Cheema, Abdalrahman Eweiwi, Christian Bauckhage
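The two baseline late fusion schemes that the abstract above contrasts with, the sum-rule and linear weighting, can be sketched in a few lines. This illustrates only those baselines; the constrained quadratic optimization proposed in the paper is not shown, and the function names and example numbers are hypothetical.

```python
def sum_rule(predictions):
    """Average the class posteriors of all classifiers with equal weight."""
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / len(predictions)
            for c in range(n_classes)]


def linear_weighting(predictions, weights):
    """Weighted average with one global weight per classifier."""
    n_classes = len(predictions[0])
    total = sum(weights)
    return [sum(w * p[c] for w, p in zip(weights, predictions)) / total
            for c in range(n_classes)]


# Hypothetical posteriors of two classifiers over three action classes.
motion = [0.7, 0.2, 0.1]
pose = [0.1, 0.6, 0.3]
fused_equal = sum_rule([motion, pose])
fused_weighted = linear_weighting([motion, pose], [3.0, 1.0])
```

Both baselines apply the same weight to every class of a given classifier, whereas the abstract describes mixture coefficients that are tied to each classifier and each class.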

A Dense Pipeline for 3D Reconstruction from Image Sequences

We propose a novel pipeline for 3D reconstruction from image sequences that solely relies on dense methods. At no point are sparse features required. As input we only need a sequence of color images capturing a static scene while following a continuous path. Furthermore, we assume that an intrinsic camera calibration is known. Our pipeline comprises three steps: (1) First, we jointly estimate correspondences and stereo geometry for each two consecutive images. (2) Subsequently, we connect the individual pairwise estimates and globally refine them through bundle adjustment. As a result, all camera poses are merged into a consistent global model. This allows us to create accurate depth maps. (3) Finally, these depth maps are merged using variational range image integration techniques. Experiments show that our dense pipeline is an interesting alternative to sparse approaches. It yields accurate camera poses as well as 3D reconstructions.

Timm Schneevoigt, Christopher Schroers, Joachim Weickert

Active Online Learning for Interactive Segmentation Using Sparse Gaussian Processes

We present an active learning framework for image segmentation with user interaction. Our system uses a sparse Gaussian Process classifier (GPC) trained on manually labeled image pixels (user scribbles) and refined in every active learning round. As a special feature, our method uses a very efficient online update rule to compute the class predictions in every round. The final segmentation of the image is computed via convex optimization. Results on a standard benchmark data set show that our algorithm is better than a recent state-of-the-art method. We also show that the queries made by the algorithm are more informative compared to randomly increasing the training data, and that our online version is much faster than the standard offline GPC inference.

Rudolph Triebel, Jan Stühmer, Mohamed Souiai, Daniel Cremers

Multi-view Tracking of Multiple Targets with Dynamic Cameras

We propose a new tracking-by-detection algorithm for multiple targets from multiple dynamic, unlocalized and unconstrained cameras. In the past tracking has either been done with multiple static cameras, or single and stereo dynamic cameras. We register several moving cameras using a given 3D model from Structure from Motion (SfM), and initialize the tracking given the registration. The camera uncertainty estimate can be efficiently incorporated into a flow-network formulation for tracking. As this is a novel task in the tracking domain, we evaluate our method on a new challenging dataset for tracking with multiple moving cameras and show that our tracking method can effectively deal with independently moving cameras and camera registration noise.

Till Kroeger, Ralf Dragon, Luc Van Gool

Quality Based Information Fusion in Fully Automatized Celiac Disease Diagnosis

Up to now, for most endoscopic computer-aided celiac disease diagnosis approaches, image regions showing discriminative features have to be manually extracted by the physicians prior to their automatized classification. This is necessary to obtain idealized and reliable data that is free from strong image degradations. On the one hand, such human interaction during endoscopy is subjective, expensive and tedious; on the other hand, state-of-the-art fully automatized selection yields lower classification accuracies than selection by experienced human experts. In this work, a fully automatized approach is introduced which exploits the availability of a significant number of subimages within one original endoscopic image. A weighted decision-level and a weighted feature-level fusion method are introduced and investigated with respect to the achieved classification accuracies. The outcomes are compared with simple decision-level and feature-level fusion methods and with manual and automatized patch selection. Finally, we show that the proposed feature-level fusion method outperforms all other automatized methods and comes close to manual patch selection.

Michael Gadermayr, Andreas Uhl, Andreas Vécsei

Fine-Grained Activity Recognition with Holistic and Pose Based Features

Holistic methods based on dense trajectories [29, 30] are currently the de facto standard for recognition of human activities in video. Whether holistic representations will sustain or will be superseded by higher level video encoding in terms of body pose and motion is the subject of an ongoing debate [12]. In this paper we aim to clarify the underlying factors responsible for good performance of holistic and pose-based representations. To that end we build on our recent dataset [2] leveraging the existing taxonomy of human activities. This dataset includes $$24,920$$ video snippets covering $$410$$ human activities in total. Our analysis reveals that holistic and pose-based methods are highly complementary, and their performance varies significantly depending on the activity. We find that holistic methods are mostly affected by the number and speed of trajectories, whereas pose-based methods are mostly influenced by viewpoint of the person. We observe striking performance differences across activities: for certain activities results with pose-based features are more than twice as accurate compared to holistic features, and vice versa. The best performing approach in our comparison is based on the combination of holistic and pose-based approaches, which again underlines their complementarity.

Leonid Pishchulin, Mykhaylo Andriluka, Bernt Schiele

Obtaining 2D Surface Characteristics from Specular Surfaces

Today’s surface appearance measures often ignore the inherent two-dimensionality. This paper proposes a method to acquire and assess the appearance of larger specular surfaces in 2D. First, we describe a deflectometric setup to obtain a gradient field of the surface microstructure. Hence, we propose an areal measure based on the angular power spectrum, as defined in ISO 25178, to characterize the waviness of coated surfaces in relevant scales. To verify the validity of this measure, we compare it with a 1D industry standard appearance measurement system (wave-scan). While our method shows the same characteristics when mapped to the wave-scan values, we observed differences between both systems. These are mainly caused by the different measurement principles and the resulting information of the surface.

Mathias Ziebarth, Markus Vogelbacher, Sabine Olawsky, Jürgen Beyerer

Learning Must-Link Constraints for Video Segmentation Based on Spectral Clustering

In recent years it has been shown that clustering and segmentation methods can greatly benefit from the integration of prior information in terms of must-link constraints. Very recently the use of such constraints has been integrated in a rigorous manner also in graph-based methods such as normalized cut. On the other hand spectral clustering as relaxation of the normalized cut has been shown to be among the best methods for video segmentation. In this paper we merge these two developments and propose to learn must-link constraints for video segmentation with spectral clustering. We show that the integration of learned must-link constraints not only improves the segmentation result but also significantly reduces the required runtime, making the use of costly spectral methods possible for today’s high quality video.

Anna Khoreva, Fabio Galasso, Matthias Hein, Bernt Schiele

Young Researcher Forum

Frontmatter

Automatic 3D Reconstruction of Indoor Manhattan World Scenes Using Kinect Depth Data

This paper discusses a system to reconstruct indoor scenes automatically and evaluates its accuracy and applicability. The focus is on the realization of a simple, quick and inexpensive way to map empty or slightly furnished rooms. The data is acquired with a Kinect sensor mounted onto a pan-tilt head. The Manhattan world assumption is used to approximate the environment. The approach for determining the wall, floor and ceiling planes of the rooms is based on a plane sweep method. The floor plan is reconstructed from the detected planes using an iterative flood fill algorithm. Furthermore, the developed method makes it possible to detect doors and windows, generate 3D models of the measured rooms, and merge multiple scans.

Dominik Wolters

Can Cosegmentation Improve the Object Detection Quality?

Training an object detector usually requires a large annotated dataset, which is expensive and cumbersome to acquire. In this paper the task of collecting these annotations is automated to a large extent by cosegmentation, i.e. the simultaneous segmentation of multiple images. This way only weak requirements on the input must be met: The respective object must occur in every image exactly once and has to be at least slightly salient. Obviously, this facilitates the collection of an appropriate training set. On the cosegmentation’s result a straightforward object detector is trained for the underlying object. Both steps, cosegmentation and detection, share the representation of regions. Experiments show competitive results on cosegmentation datasets and indicate that detection actually benefits from a prior cosegmentation.

Timo Lüddecke

Multi-atlas Based Segmentation of Corpus Callosum on MRIs of Multiple Sclerosis Patients

In this work, a supervised automatic multi-atlas based segmentation method for the corpus callosum (CC) in magnetic resonance images (MRIs) of multiple sclerosis (MS) patients is presented. Due to atrophy, the shape of a disease-affected CC differs distinctly from that of a healthy one. Therefore, atlases are used that are built from the underlying dataset and do not originate from atlas datasets of healthy brains. The atlases are constructed by clustering the patient images into subgroups of similar images and building a mean image from each cluster. The optimal number of atlases and the best label fusion method are analyzed. The method is evaluated on 100 T1-weighted brain MRIs from MS patients. Accuracy is assessed by comparing the overlap of the segmentations from the developed method against manual segmentations obtained by a medical student.

Anneke Meyer
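
The abstract above refers to label fusion and to an overlap-based accuracy measure without naming specific variants. A minimal sketch, assuming majority voting as the fusion rule and the Dice coefficient as the overlap measure (both are common choices, but neither is confirmed by the abstract), might look like this. The toy 1D label maps are made up for illustration.

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse binary label maps: a voxel is foreground if most atlases say so."""
    stacked = np.stack(label_maps).astype(float)
    return stacked.mean(axis=0) > 0.5

def dice(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

# Three propagated atlas segmentations of a 4-voxel toy "image".
atlas_labels = [np.array([1, 1, 0, 0]),
                np.array([1, 0, 0, 0]),
                np.array([1, 1, 1, 0])]
fused = majority_vote(atlas_labels)

manual = np.array([1, 1, 1, 0])   # hypothetical manual segmentation
score = dice(fused, manual)
```

In a real pipeline each atlas label map would first be registered to the target image; that step is omitted here.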

Committees of Deep Feedforward Networks Trained with Few Data

Deep convolutional neural networks are known to give good results on image classification tasks. In this paper we present a method to improve the classification result by combining multiple such networks in a committee. We use the STL-10 dataset, which has very few training examples, and show that our method can achieve results that are better than the state of the art. The networks are trained layer-wise and no backpropagation is used. We also explore the effects of dataset augmentation by mirroring, rotation, and scaling.

Bogdan Miclut
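
The abstract above does not say how the committee combines its members. A simple and widely used rule is to average the class probabilities of all members and pick the class with the highest mean. The sketch below assumes that rule; the function name `committee_predict` and the probability values are illustrative only.

```python
import numpy as np

def committee_predict(prob_list):
    """Average per-member class probabilities and return the winning class.

    prob_list : list of (n_samples, n_classes) arrays, one per network.
    """
    mean_probs = np.mean(np.stack(prob_list), axis=0)
    return mean_probs.argmax(axis=1)

# Three hypothetical networks, two samples, three classes.
member_probs = [
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]),
    np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]),
]
predictions = committee_predict(member_probs)
```

For the second sample the first network alone would choose class 1, while the committee average favours class 2, which shows how individual errors can be outvoted.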

Gas Bubble Shape Measurement and Analysis

This work focuses on the precise quantification of bubble streams from underwater gas seeps. The performance of a snake-based method and of ellipse fitting with the CMA-ES non-linear optimization algorithm is evaluated. A novel, improved snake-based method is presented and the optimal choice of snake parameters is studied. A Kalman filter is used for bubble tracking. The measured flux deviates from that of a calibrated flux meter by 4 % for small and 9 % for larger bubbles. This work will enable better data gathering on marine gas seeps for future climatology and marine research.

Claudius Zelenka
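
The abstract above states that a Kalman filter is used for bubble tracking but gives no model details. The sketch below is a generic constant-velocity Kalman filter for a single coordinate (for example, the vertical position of one bubble across frames). The motion model, the noise levels `q` and `r`, and the function name `kalman_track` are assumptions chosen for illustration.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Track position and velocity from noisy 1D position measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])              # only the position is observed
    Q = q * np.eye(2)                       # process noise (assumed)
    R = np.array([[r]])                     # measurement noise (assumed)
    x = np.array([measurements[0], 0.0])    # initial state: first position, zero velocity
    P = np.eye(2)
    estimates = []
    for z in measurements[1:]:
        # Predict the next state and its uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the new measurement.
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# A bubble rising 2 units per frame, observed without noise for simplicity.
positions = [2.0 * t for t in range(10)]
track = kalman_track(positions)
```

Each row of `track` holds the estimated position and velocity after one frame; the velocity estimate approaches the true rise speed as more frames arrive.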

Scene Segmentation in Adverse Vision Conditions

Semantic road labeling is a key component of systems that aim at assisted or even autonomous driving. Because such systems operate continuously in the real world, unforeseen conditions not represented in any conceivable training procedure are likely to occur on a regular basis. To equip systems with the ability to cope with such situations, we would like to enable adaptation to new situations and conditions at runtime. We study the effect of changing test conditions on scene labeling methods using a new, diverse street scene dataset. We propose a novel approach that can operate in such conditions and is based on a sequential Bayesian model update in order to robustly integrate the arriving images into the adaptation procedure.

Evgeny Levinkov

Learning Multi-scale Representations for Material Classification

The recent progress in sparse coding and deep learning has made unsupervised feature learning methods a strong competitor to hand-crafted descriptors. In computer vision, success stories of learned features have been predominantly reported for object recognition tasks. In this paper, we investigate if and how feature learning can be used for material recognition. We propose two strategies to incorporate scale information into the learning procedure resulting in a novel multi-scale coding procedure. Our results show that our learned features for material recognition outperform hand-crafted descriptors on the FMD and the KTH-TIPS2 material classification benchmarks.

Wenbin Li

Casting Random Forests as Artificial Neural Networks (and Profiting from It)

While Artificial Neural Networks (ANNs) are highly expressive models, they are hard to train from limited data. Formalizing a connection between Random Forests (RFs) and ANNs allows exploiting the former to initialize the latter. Further parameter optimization within the ANN framework yields models that are intermediate between RF and ANN, and achieve performance better than RF and ANN on the majority of the UCI datasets used for benchmarking.

Johannes Welbl
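
To make the connection between trees and networks in the abstract above concrete, the sketch below rewrites one tiny decision tree as a network with two hidden layers: one neuron per split node, then one neuron per leaf. This is a well-known style of construction and is shown under assumptions; the toy tree, the array names, and the hard-threshold activations are illustrative and not taken from the paper.

```python
import numpy as np

# Hypothetical toy tree (not from the paper):
#   node 0: x[0] <= 0.5 ? go to node 1 : leaf C
#   node 1: x[1] <= 0.3 ? leaf A       : leaf B
split_feature = np.array([0, 1])
split_threshold = np.array([0.5, 0.3])

# Rows = leaves A, B, C; columns = split nodes.
# -1: leaf lies in the left ("<=") branch, +1: right branch, 0: node not on the path.
path = np.array([[-1, -1],
                 [-1, +1],
                 [+1,  0]])
leaf_value = np.array([10.0, 20.0, 30.0])

def tree_as_network(x):
    # Hidden layer 1: one neuron per split node, output +1 (right) or -1 (left).
    h1 = np.where(x[split_feature] > split_threshold, 1.0, -1.0)
    # Hidden layer 2: one neuron per leaf; it fires only if every split on
    # the leaf's path went the required way.
    path_len = np.abs(path).sum(axis=1)
    h2 = (path @ h1 - path_len + 0.5 > 0).astype(float)
    # Output layer: linear read-out of the single active leaf.
    return float(leaf_value @ h2)

out_a = tree_as_network(np.array([0.2, 0.1]))
out_b = tree_as_network(np.array([0.2, 0.9]))
out_c = tree_as_network(np.array([0.8, 0.1]))
```

With hard thresholds the network reproduces the tree exactly. Replacing them with smooth activations would make the weights trainable by gradient descent, which is one way to read the "further parameter optimization" mentioned in the abstract.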

Backmatter
