
2013 | Book

Image Analysis

18th Scandinavian Conference, SCIA 2013, Espoo, Finland, June 17-20, 2013. Proceedings

Edited by: Joni-Kristian Kämäräinen, Markus Koskela

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 18th Scandinavian Conference on Image Analysis, SCIA 2013, held in Espoo, Finland, in June 2013.

The 67 revised full papers presented were carefully reviewed and selected from 132 submissions. The papers are organized in topical sections on feature extraction and segmentation, pattern recognition and machine learning, medical and biomedical image analysis, faces and gestures, object and scene recognition, matching, registration, and alignment, 3D vision, color and multispectral image analysis, motion analysis, systems and applications, human-centered computing, and video and multimedia analysis.

Table of Contents

Frontmatter

Feature Extraction and Segmentation

Texture Description with Completed Local Quantized Patterns

Local binary patterns (LBP) have been very successful in a number of areas, including texture analysis and face analysis. Recently, local quantized patterns (LQP) were proposed, using vector quantization to code complicated patterns with a large number of neighbors and several quantization levels, and a lookup-table technique to map patterns to the corresponding indices. In this paper, we propose completed local quantized patterns (CLQP) to improve the performance of LQP. Firstly, we find that LQP only considers the sign-based difference and thus misses some discriminative information. We therefore propose to use the magnitude-based and orientation-based differences to complement the sign-based difference, and we use vector quantization to learn three separate codebooks for local sign, magnitude and orientation patterns, respectively. Secondly, we observe that LQP uses random initialization in vector quantization, which ignores the distribution of local patterns and costs much computational time. To reduce this unnecessary initialization cost, we use preselected dominant patterns as the initialization. Our experimental results show that CLQP outperforms well-established features including LBP, LTP, CLBP and LQP on a range of challenging texture classification problems and an infant pain detection problem.
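
For readers unfamiliar with the sign/magnitude decomposition that CLQP builds on, the sketch below computes a basic 8-neighbour LBP sign code together with the magnitude differences for a single 3x3 patch. The neighbour ordering and the toy patch are illustrative assumptions; the CLQP codebook learning itself is not reproduced here.

```python
import numpy as np

def lbp_sign_magnitude(patch):
    """Decompose a 3x3 patch into an LBP sign code and magnitude differences.

    Returns the standard 8-bit LBP code (sign component) and the absolute
    differences (magnitude component) that CLBP/CLQP-style descriptors
    quantize separately. Illustrative sketch only, not CLQP itself.
    """
    center = patch[1, 1]
    # Clockwise 8-neighbourhood starting at the top-left corner
    # (an ordering chosen here for illustration; conventions vary).
    neighbours = np.array([patch[0, 0], patch[0, 1], patch[0, 2],
                           patch[1, 2], patch[2, 2], patch[2, 1],
                           patch[2, 0], patch[1, 0]], dtype=float)
    diffs = neighbours - center
    signs = (diffs >= 0).astype(int)                 # sign-based differences
    code = int(np.sum(signs * (2 ** np.arange(8))))  # standard LBP code
    magnitudes = np.abs(diffs)                       # magnitude-based differences
    return code, magnitudes

patch = np.array([[10, 20, 30],
                  [40, 25, 15],
                  [ 5, 50, 60]], dtype=float)
print(lbp_sign_magnitude(patch))
```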

Xiaohua Huang, Guoying Zhao, Xiaopeng Hong, Matti Pietikäinen, Wenming Zheng
A Local Image Descriptor Robust to Illumination Changes

In this paper we address the problem of building a local image descriptor that is insensitive to the complex appearance changes induced by illumination variations on non-flat objects. The presented descriptor is based on multi-scale and multi-oriented even Gabor filters and is constructed in such a way that typical effects of illumination variations, like changes of edge polarity or spatially varying brightness changes, are taken into account for illumination insensitivity. For evaluation, a dataset of textured as well as textureless objects is used, which poses a greater challenge for evaluating robustness against illumination changes than the conventional datasets used in the past. The experiments show the superiority of our descriptor over existing ones under illumination changes.

Sebastian Zambanini, Martin Kampel
Detection of Curvilinear Structures by Tensor Voting Applied to Fiber Characterization

The paper presents a framework for the detection of curvilinear objects in images. Such objects are difficult to describe with a geometrical model, and although they appear in a number of applications, the problem of detecting curvilinear objects has drawn limited attention. The proposed approach starts with an edge detection algorithm, after which the task of object detection becomes a problem of edge linking. A state-of-the-art local linking approach called tensor voting is used to estimate the edge point saliency, describing the likelihood of a point belonging to a curve, and to extract the end points and junction points of these curves. After the tensor voting, the curves are grown from high-saliency seed points using a linking method proposed in this paper. In the experimental part of the work, the method was systematically tested on pulp suspension images to characterize fibers based on their length and curl index. The fiber length was estimated with an accuracy of 71.5% and the fiber curvature with an accuracy of 70.7%.

Nataliya Strokina, Tatiana Kurakina, Tuomas Eerola, Lasse Lensu, Heikki Kälviäinen
Simple-Graphs Fusion in Image Mosaic: Application to Automated Cell Files Identification in Wood Slices

Aggregating results by merging disjoint graphs is potentially a good alternative to image stitching. When processing image mosaics, it avoids the radiometric and geometric corrections inherent in image fusion. We have studied and developed a generic method for merging disjoint graphs to track cell alignments in image mosaics of wood.

Guilhem Brunel, Philippe Borianne, Gérard Subsol, Marc Jaeger
Unsupervised Segmentation of Anomalies in Sequential Data, Images and Volumetric Data Using Multiscale Fourier Phase-Only Analysis

This paper presents an unsupervised method to detect and segment anomalies and novel patterns in sequential data, images and volumetric data. The proposed Multiscale Phase-Only Transformation (MPHOT) addresses the case when no prior knowledge about the data or even its dimensionality is provided. It is based on the fusion of the Phase-Only Transform (PHOT) in scale space using only one adaptive sensitivity parameter. The PHOT uses the Discrete Fourier Transform (DFT) to remove all regularities while it detects small defects and pattern boundaries. The proposed multiscale extension allows the precise segmentation of large anomalies as well. We present experiments on synthetic and measured data in the fields of time series analysis, image processing and volumetric data segmentation to show the universality of our approach.
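
As a point of reference for the single-scale building block, the Phase-Only Transform can be sketched in a few lines: the DFT magnitude is flattened so that only the phase is kept, which suppresses regular structure and highlights irregularities. The multiscale fusion and the adaptive sensitivity parameter of the paper are not modelled here; the synthetic stripe image is an illustrative assumption.

```python
import numpy as np

def phase_only_transform(image, eps=1e-8):
    """Phase-Only Transform (PHOT) of a 2-D array.

    The DFT magnitude is flattened to one so that only the phase remains;
    repetitive structure is suppressed and irregularities stand out in the
    back-transformed result. Single-scale sketch only.
    """
    spectrum = np.fft.fft2(image)
    phase_only = spectrum / (np.abs(spectrum) + eps)
    return np.real(np.fft.ifft2(phase_only))

# Regular stripes with a single defect: the defect dominates the PHOT output.
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))
img[30, 40] += 3.0
response = np.abs(phase_only_transform(img))
print(np.unravel_index(np.argmax(response), response.shape))  # typically the defect at (30, 40)
```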

Fabian Bürger, Josef Pauli
Extended 3D Line Segments from RGB-D Data for Pose Estimation

We propose a method for the extraction of complete and rich symbolic line segments in 3D based on RGB-D data. Edges are detected by combining cues from the RGB image and the aligned depth map. 3D line segments are then reconstructed by back-projecting 2D line segments and intersecting this with local surface patches computed from the 3D point cloud. Different edge types are classified using the new enriched representation and the potential of this representation for the task of pose estimation is demonstrated.

Anders Glent Buch, Jeppe Barsøe Jessen, Dirk Kraft, Thiusius Rajeeth Savarimuthu, Norbert Krüger
Incorporating Texture Intensity Information into LBP-Based Operators

In this paper, we aim to improve the accuracy of LBP-based operators by including texture image intensity characteristics in the operator. We utilize a shifted step function to reduce the quantization error of the standard step function and obtain more discriminative operators. Features obtained from the shifted step function are simply fused together to form the final histogram. This model is general and can be integrated with other existing LBP variants to reduce the quantization error of the step function for texture classification. The proposed method is integrated with multiple LBP-based feature descriptors and evaluated on publicly available texture databases (Outex_TC_00012 and KTH-TIPS2b) for texture classification. Experimental results demonstrate that it not only improves the performance of the operators it is integrated with but also achieves higher accuracy compared to the state of the art in texture classification.

M. Ghahramani, Guoying Zhao, Matti Pietikäinen
Bayesian Non-parametric Image Segmentation with Markov Random Field Prior

In this paper, a Bayesian framework for non-parametric density estimation with spatial smoothness constraints is presented for image segmentation. Unlike common parametric methods such as mixtures of Gaussians, the proposed method does not make strict assumptions about the shape of the density functions and can thus handle complex structures. The multiclass kernel density estimation is considered as an unsupervised learning problem. A Dirichlet compound multinomial (DCM) prior is used to model the class label prior probabilities, and a Markov random field (MRF) is exploited to impose spatial smoothness and to control the confidence in the Dirichlet hyper-parameters. The proposed model results in a closed-form solution using an expectation-maximization (EM) algorithm for maximum a posteriori (MAP) estimation. This provides a huge advantage over other models which utilize more complex and time-consuming methods such as Markov chain Monte Carlo (MCMC) or variational approximation methods. Several experiments on natural images are performed to demonstrate the performance of the proposed model compared to other parametric approaches.

Ehsan Amid
Evaluating Local Feature Detectors in Salient Region Detection

In this work, we study local feature extraction methods and evaluate their performance in detecting local features from the salient regions of images. In order to measure the detectors’ performance, we compared the detected regions to gaze fixations obtained from the eye movement recordings of human participants viewing two types of images: natural images (photographs) and abstract/surreal images. The results indicate that all of the six evaluated local feature detectors perform clearly above chance level. The Hessian-Affine detector performs the best and almost reaches the performance level of state-of-the-art saliency detection methods.

Teemu Kinnunen, Mari Laine-Hernandez, Pirkko Oittinen
Forest Stand Delineation Using a Hybrid Segmentation Approach Based on Airborne Laser Scanning Data

Forest stand delineation is an important task for forest management. Traditional manual stand delineation based on aerial color-infrared images is a labor-intensive process and its results are partially subjective. These images are also highly affected by weather conditions and imaging parameters. In this work, we applied a hybrid segmentation approach to Airborne Laser Scanning (ALS) data to delineate forest stands. The ALS data was first pre-processed to extract a three-band feature image containing tree height, density, and species information, respectively. The image was then segmented by the mean shift algorithm to generate raw stands, which were refined by the Spectral Clustering (SC) algorithm in the following stage. In the SC algorithm, we also estimated the number of stands based on the eigengap heuristic. We tested our method on real ALS data acquired at Juuka in Finland, and compared the results visually and numerically with a manual delineation as well as with results from previous methods. The experimental results showed that our method worked well for forest stand delineation based on ALS data and returned better results than previous methods in most cases.
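
The eigengap heuristic used to estimate the number of stands can be sketched as follows on a generic affinity matrix; the toy block-structured affinity is an illustrative assumption, and the ALS feature extraction and mean-shift stages are not reproduced.

```python
import numpy as np

def estimate_num_clusters(affinity, max_k=10):
    """Eigengap heuristic for choosing the number of spectral clusters.

    Builds the symmetric normalized Laplacian of an affinity matrix and
    returns the k at which the gap between consecutive eigenvalues is
    largest. A simplified illustration of the heuristic, not the
    stand-delineation pipeline itself.
    """
    degrees = affinity.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
    laplacian = np.eye(len(affinity)) - d_inv_sqrt @ affinity @ d_inv_sqrt
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))[:max_k]
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1  # estimated number of clusters

# Two obvious blocks -> the heuristic should return 2.
block = np.ones((5, 5))
affinity = np.block([[block, 0.01 * np.ones((5, 5))],
                     [0.01 * np.ones((5, 5)), block]])
np.fill_diagonal(affinity, 0.0)
print(estimate_num_clusters(affinity))
```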

Zhengzhe Wu, Ville Heikkinen, Markku Hauta-Kasari, Jussi Parkkinen, Timo Tokola
Mean Shift with Flatness Constraints

Mean shift remains one of the intensively developed image-segmentation methods. Appropriately setting the so-called bandwidth, which is richly discussed in the literature, seems to be one of its problems. If the bandwidth is too small, the results suffer from over-segmentation. If it is too big, edges may not be preserved sufficiently and details can be lost. In this paper, we also address the problem of over-segmentation and edge preservation in mean shift. However, we do not aim at proposing yet another method for determining the bandwidth. Instead, we modify the mean-shift method itself. We show that the problems with over-segmentation are inherent to mean shift and follow from its theoretical essence. We also show that the mean-shift process can be seen as a process of solving a certain Euler-Lagrange equation and as a process of maximising a certain functional. In contrast with other known functional approaches, however, only the fidelity term is present in it. Other usual terms, e.g., the term requiring a short length of boundaries between the segments or the term requiring the flatness (in intensity) of the corresponding filtered image, are not present, which explains the behaviour of mean shift. On the basis of this knowledge, we solve the problems with mean shift by modifying the functional. We show how the new functional can be maximised in practice, and we also show that the usual mean-shift algorithm can be regarded as a special case of the method we propose. Experimental results are also presented.
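
For context, a plain fidelity-only mean-shift iteration (the baseline behaviour analysed above, before the proposed modification of the functional) looks like this; the Gaussian kernel, bandwidth and toy 2-D data are illustrative assumptions.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, n_iter=50, tol=1e-5):
    """Plain mean-shift mode seeking with a Gaussian kernel.

    Each starting point drifts to a density mode determined solely by the
    bandwidth, with no additional flatness or boundary-length term.
    Baseline sketch, not the modified functional proposed in the paper.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        diffs = points - x
        weights = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * bandwidth ** 2))
        new_x = weights @ points / weights.sum()  # weighted mean of neighbours
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x

rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.3, size=(100, 2))
cluster_b = rng.normal(5.0, 0.3, size=(100, 2))
points = np.vstack([cluster_a, cluster_b])
print(mean_shift_mode(points, start=[0.8, 0.8], bandwidth=1.0))  # converges near (0, 0)
```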

Eduard Sojka, Milan Šurkala, Jan Gaura
Constructing Local Binary Pattern Statistics by Soft Voting

In this paper we propose a novel method for constructing Local Binary Pattern (LBP) statistics for image appearance description. The method is inspired by kernel density estimation, which is designed for estimating the underlying probability function of a random variable. An essential part of the proposed method is the use of the Hamming distance. Compared to the standard LBP histogram statistics, where one labeled pixel always contributes to one bin of the histogram, the proposed method exploits a kernel-like similarity function to determine weighted votes contributing to several possible pattern types in the statistic. As a result, the method yields a more reliable estimate of the underlying LBP distribution of the given image. Overall, the method is easy to implement and outperforms the standard LBP histogram description in texture classification and in biometrics-related face verification. We demonstrate that the method has great potential in problems where the number of pixels is limited. This makes the method very promising, for example, in low-resolution image description and the description of interest regions. Another interesting property of the proposed method is that it can be easily integrated with many existing LBP variants that use label statistics as descriptors.
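
A minimal sketch of the soft-voting idea, assuming a Gaussian-shaped kernel over the Hamming distance between 8-bit LBP codes (the exact weighting used in the paper may differ):

```python
import numpy as np

def soft_lbp_histogram(codes, n_bits=8, sigma=1.0):
    """Soft-voted LBP statistic using a Hamming-distance kernel.

    Instead of incrementing a single bin per pixel, each observed code
    casts weighted votes on all 2**n_bits pattern bins, with weights that
    decay with the Hamming distance between patterns. Kernel shape and
    sigma are illustrative assumptions.
    """
    n_patterns = 2 ** n_bits
    all_codes = np.arange(n_patterns)
    # Hamming distance between every observed code and every pattern bin.
    xor = codes[:, None] ^ all_codes[None, :]
    hamming = np.array([[bin(v).count("1") for v in row] for row in xor])
    weights = np.exp(-(hamming ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # each pixel votes a total of 1
    return weights.sum(axis=0)

codes = np.array([0b00001111, 0b00001110, 0b11110000])
hist = soft_lbp_histogram(codes)
print(hist.shape, hist.sum())  # (256,) and ~3.0
```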

Juha Ylioinas, Xiaopeng Hong, Matti Pietikäinen

Pattern Recognition and Machine Learning

Cascaded Random Forest for Fast Object Detection

A Random Forest consists of several independent decision trees arranged in a forest. A majority vote over all trees leads to the final decision. In this paper we propose a Random Forest framework which incorporates a cascade structure consisting of several stages together with a bootstrap approach. By introducing the cascade, 99% of the test images can be rejected by the first and second stages with minimal computational effort, leading to a massively sped-up detection framework. Three different cascade voting strategies are implemented and evaluated. Additionally, the training and classification speed-up is analyzed. Several experiments on publicly available datasets for pedestrian detection, lateral car detection and unconstrained face detection demonstrate the benefit of our contribution.
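
The early-rejection mechanism of a cascade can be sketched generically as below; the stage functions and thresholds are placeholders rather than the trained Random Forest stages and voting strategies of the paper.

```python
import numpy as np

def cascade_predict(stages, thresholds, windows):
    """Evaluate a cascade of stage classifiers with early rejection.

    `stages` is a list of functions mapping a batch of windows to scores;
    a window is rejected as soon as its accumulated score falls below the
    stage threshold, so later (more expensive) stages only see survivors.
    """
    alive = np.ones(len(windows), dtype=bool)
    scores = np.zeros(len(windows))
    for stage, thr in zip(stages, thresholds):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        scores[idx] += stage(windows[idx])
        alive[idx] = scores[idx] >= thr   # reject windows below the stage threshold
    return alive, scores

# Toy example: windows are 1-D "features"; positives have large values.
windows = np.array([[0.1], [0.9], [2.0], [-0.5]])
stages = [lambda w: w[:, 0], lambda w: 2.0 * w[:, 0]]
thresholds = [0.0, 1.5]
print(cascade_predict(stages, thresholds, windows))
```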

Florian Baumann, Arne Ehlers, Karsten Vogt, Bodo Rosenhahn
Multiplicative Updates for Learning with Stochastic Matrices

Stochastic matrices are arrays whose elements are discrete probabilities. They are widely used in techniques such as Markov Chains, probabilistic latent semantic analysis, etc. In such learning problems, the learned matrices, being stochastic matrices, are non-negative and all or part of the elements sum up to one. Conventional multiplicative updates, which have been widely used for nonnegative learning, cannot accommodate the stochasticity constraint. Simply normalizing the nonnegative matrix at each learning step may have an adverse effect on the convergence of the optimization algorithm. Here we discuss and compare two alternative ways of developing multiplicative update rules for stochastic matrices. One reparameterizes the matrices before applying the multiplicative update principle, and the other employs relaxation with Lagrangian multipliers such that the updates jointly optimize the objective and steer the estimate towards the constraint manifold. We compare the new methods against the conventional normalization approach on two applications, parameter estimation of Hidden Markov Chain Models and Information-Theoretic Clustering. Empirical studies on both synthetic and real-world datasets demonstrate that the algorithms using the new methods perform more stably and efficiently than the conventional ones.

Zhanxing Zhu, Zhirong Yang, Erkki Oja
Validating the Visual Saliency Model

Bottom-up attention models suggest that human eye movements can be predicted by algorithms that calculate the difference between a region and its surround at different image scales: the more a region differs from its surround, the more salient it is and hence the more it will attract fixations. Recent studies have, however, demonstrated that a dummy classifier which assigns more weight to the center region of the image outperforms the best saliency algorithm, calling into doubt the validity of the saliency algorithms and their associated bottom-up attention models. In this paper, we performed an experiment using linear discriminant analysis to try to separate the values obtained from the saliency algorithm for regions that have been fixated from those that have not. Our working hypothesis was that being able to separate the regions would constitute a proof of the validity of the saliency model. Our results show that the saliency model performs well in predicting non-salient regions and highly salient regions, but that it performs no better than a random classifier in the middle range of saliency.

Ali Alsam, Puneet Sharma
Bundle Methods for Structured Output Learning — Back to the Roots

Discriminative methods for learning structured output classifiers have been gaining popularity in recent years due to their successful applications in fields like computer vision, natural language processing, etc. Learning structured output classifiers leads to solving a convex minimization problem that is still hard to solve by standard algorithms in real-life settings. A significant effort has been put into the development of specialized solvers, among which the Bundle Method for Risk Minimization (BMRM) [1] is one of the most successful. The BMRM is a simplified variant of the bundle methods well known in the field of non-smooth optimization. In this paper, we propose two speed-up improvements of the BMRM: i) using the adaptive prox-term known from the original bundle methods, ii) starting the optimization from a non-trivial initial solution. We combine both improvements with the multiple cutting plane model approximation [2]. Experiments on real-life data show consistently faster convergence, achieving a speedup of up to a factor of 9.7.

Michal Uřičář, Vojtěch Franc, Václav Hlaváč
Continuous-Space Gaussian Process Regression and Generalized Wiener Filtering with Application to Learning Curves

Gaussian process regression is a machine learning paradigm, where the regressor functions are modeled as realizations from an a priori Gaussian process model. We study abstract continuous-space Gaussian regression problems where the training set covers the whole input space instead of consisting of a finite number of distinct points. The model can be used for analyzing theoretical properties of Gaussian process regressors. In this paper, we present the general continuous-space Gaussian process regression equations and discuss their close connection with Wiener filtering. We apply the results to estimation of learning curves as functions of training set size and input dimensionality.
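
For reference, the familiar finite-sample counterparts of the Gaussian process regression equations are sketched below; the RBF covariance and its hyperparameters are illustrative assumptions, and the continuous-space and Wiener-filtering analysis of the paper is not reproduced.

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, length_scale=1.0, noise_var=0.1):
    """Finite-sample GP regression with an RBF covariance (unit signal variance).

    Returns the posterior mean and variance at the test inputs, i.e. the
    discrete counterpart of the continuous-space regression equations.
    """
    def rbf(a, b):
        sq = (a[:, None] - b[None, :]) ** 2
        return np.exp(-0.5 * sq / length_scale ** 2)

    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = rbf(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_star @ alpha
    # Prior variance is 1.0 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, var

x_train = np.linspace(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + 0.1 * np.random.default_rng(1).normal(size=20)
x_test = np.array([1.0, 4.0])
print(gp_posterior(x_train, y_train, x_test))
```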

Simo Särkkä, Arno Solin
Approximations of Gaussian Process Uncertainties for Visual Recognition Problems

Gaussian processes offer the advantage of calculating the classification uncertainty in terms of predictive variance associated with the classification result. This is especially useful to select informative samples in active learning and to spot samples of previously unseen classes known as novelty detection. However, the Gaussian process framework suffers from high computational complexity leading to computation times too large for practical applications. Hence, we propose an approximation of the Gaussian process predictive variance leading to rigorous speedups. The complexity of both learning and testing the classification model regarding computational time and memory demand decreases by one order with respect to the number of training samples involved. The benefits of our approximations are verified in experimental evaluations for novelty detection and active learning of visual object categories on the datasets C-Pascal of Pascal VOC 2008, Caltech-256, and ImageNet.

Paul Bodesheim, Alexander Freytag, Erik Rodner, Joachim Denzler
Topology-Preserving Dimension-Reduction Methods for Image Pattern Recognition

In this paper, we experimentally evaluate the validity of topology-preserving dimension-reduction methods for image pattern recognition. Image pattern recognition applies pattern recognition techniques to the classification of image data. For numerical processing, images are sampled using an array of pixels; this sampling procedure derives vectors in a higher-dimensional metric space from image patterns. For accurate pattern recognition, dimension reduction of the data vectors is an essential methodology, since the time and space complexities of data processing depend on the dimension of the data. However, dimension reduction causes a loss of information about the geometrical and topological features of image patterns. The desired dimension-reduction method selects an appropriate low-dimensional subspace that preserves the topological information of the classification space.

Hayato Itoh, Tomoya Sakai, Kazuhiko Kawamoto, Atsushi Imiya
Efficient Boosted Weak Classifiers for Object Detection

This paper accelerates boosted nonlinear weak classifiers in the boosting framework for object detection. Although conventional nonlinear classifiers are usually more powerful than linear ones, few existing methods integrate them into the boosting framework as weak classifiers owing to their high computational cost. To address this problem, this paper proposes a novel nonlinear weak classifier named the Partition Vector weak Classifier (PVC), which is based on the histogram intersection kernel functions of the feature vector with respect to a set of pre-defined Partition Vectors (PVs). A three-step algorithm is derived from the kernel trick for efficient weak learning. The obtained PVCs are further accelerated by building a look-up table. Experimental results on detection tasks for multiple classes of objects show that boosted PVCs significantly improve both the learning and evaluation efficiency of nonlinear SVMs to the level of boosted linear classifiers, without losing any of the high discriminative power.
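
The histogram intersection kernel at the core of the PVC is simple to state; the sketch below shows the kernel itself, under the assumption of non-negative feature vectors, while the partition-vector selection and look-up-table acceleration are omitted.

```python
import numpy as np

def histogram_intersection_kernel(h1, h2):
    """Histogram intersection kernel between two non-negative feature vectors.

    k(h1, h2) = sum_i min(h1_i, h2_i); this is the kernel a PVC-style weak
    classifier evaluates against a set of pre-defined partition vectors.
    """
    return np.minimum(h1, h2).sum()

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.4, 0.1, 0.5])
print(histogram_intersection_kernel(h1, h2))  # 0.2 + 0.1 + 0.3 = 0.6
```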

Xiaopeng Hong, Guoying Zhao, Haoyu Ren, Xilin Chen
Domain Adaptation for Sequential Detection

We propose a domain adaptation method for sequential decision-making processes. While most state-of-the-art approaches focus on SVM detectors, we propose a domain adaptation method for a sequential detector similar to WaldBoost, which is suitable for real-time processing. The work is motivated by applications in surveillance, where detectors must be adapted to new observation conditions. We address the situation where the new observation is related to the previous observation by a parametric transformation. We propose a learning procedure which reveals the hidden transformation between the old and new data. The transformation essentially allows transferring knowledge from the old data to the new. We show that our method can achieve a 60% speedup in training w.r.t. the baseline WaldBoost algorithm while outperforming it in precision.

Šimon Fojtů, Karel Zimmermann, Tomáš Pajdla, Václav Hlaváč
Introducing an Inter-frame Relational Feature Model for Pedestrian Detection

Pedestrian detection in still images has been performed with the help of various local features such as histograms of oriented gradients (HOG), local binary patterns (LBP) and, more recently, histograms of optical flow (HOF). In order to improve the robustness of pedestrian detection, the movement of people can be incorporated into the training process, as has been done in the HOF descriptor. Optical flow is used to model the movement of a person and to detect actions in image sequences. For action recognition it is necessary to incorporate movement into models when using feature descriptors such as the HOF descriptor. In this paper we introduce a novel method to train and detect human movement for pedestrian detection using relational gradient features within multiple consecutive frames. The goal of this descriptor is to detect pedestrians using multiple frames for moving cameras instead of static cameras. The relational features between consecutive frames help to robustly find pedestrians in image sequences due to a flexible detection algorithm. We demonstrate the robustness of the resulting feature model computed for a temporal window of three frames. In our experiments we show the improvement regarding true positives as well as false positives using our inter-frame HOG (ifHOG) model compared to other feature descriptors.

Andreas Zweng, Martin Kampel

Medical and Biomedical Image Analysis

Automated Cell Counting in Bürker Chamber

Estimating the number of blood cells in a sample is an important task in biological research. However, manual counting of cells in microscopy images of counting chambers is very time-consuming. We present an image processing method for detecting the chamber grid and the cells, based on their similarity to an automatically selected sample cell. Thanks to this approach, the method does not depend on a specific cell structure and can be used for blood cells of different species without adjustments. If deemed appropriate, user interaction is allowed to select the sample cell and adjust the parameters manually. We also present an accuracy and speed evaluation of the method.
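
Similarity-based cell detection of the kind described above can be sketched with off-the-shelf normalized cross-correlation; the threshold, synthetic image and lack of peak suppression are illustrative simplifications, not the paper's grid-aware method.

```python
import numpy as np
from skimage.feature import match_template

def detect_cells(image, sample_cell, threshold=0.7):
    """Detect cell-like regions by normalized cross-correlation with a sample cell.

    Returns (row, col) coordinates of pixels whose correlation with the sample
    exceeds the threshold (no non-maximum suppression is applied).
    """
    response = match_template(image, sample_cell, pad_input=True)
    return np.argwhere(response > threshold), response

# Synthetic example: three dark 5x5 spots on a bright, noisy background.
rng = np.random.default_rng(2)
image = np.full((100, 100), 200.0) + rng.normal(0, 5, (100, 100))
for r, c in [(20, 30), (60, 70), (80, 15)]:
    image[r - 2:r + 3, c - 2:c + 3] -= 120.0
sample_cell = image[18:23, 28:33]      # the sample cell (picked by hand here)
hits, _ = detect_cells(image, sample_cell)
print(len(hits))                       # a handful of pixels around each of the 3 spots
```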

Karel Štěpka
Adaptive Spatio-temporal Filtering of 4D CT-Heart

The aim of this project is to keep the x-ray exposure of the patient as low as reasonably achievable while improving the diagnostic image quality for the radiologist. The means to achieve these goals is to develop and evaluate an efficient adaptive filtering (denoising/image enhancement) method that fully explores true 4D image acquisition modes.

The proposed prototype system uses a novel filter set whose directional filter responses are monomials. The monomial filter concept is used both for the estimation of local structure and for the anisotropic adaptive filtering. Initial tests on clinical 4D CT-heart data with ECG-gated exposure have resulted in a significant reduction of the noise level and increased detail compared to 2D and 3D methods. Another promising feature is that the reconstruction-induced streak artifacts which generally occur in low-dose CT are remarkably reduced in 4D.

Mats Andersson, Hans Knutsson
Toward Automated Validation of Sketch-Based 3D Segmentation Editing Tools

Segmentation is one of the main tasks in medical image analysis. Measuring the quality of 3D segmentation algorithms is an essential requirement during development and for evaluation. Various methods exist to measure the quality of a segmentation with respect to a reference segmentation. Validating interactive 3D segmentation approaches or methods for 3D segmentation editing is more complex, however. Using interactive tools, the user plays a central role during the segmentation process as he or she needs to react to intermediate results, making established static validation approaches insufficient. In this paper we present a method to automatically generate plausible user inputs for 3D sketch-based segmentation editing algorithms, to allow an objective and reproducible validation and comparison of such tools. The user inputs are generated iteratively based on the intermediate and the reference segmentation, while static quality measurements are tracked over time. We present first results where we have compared two segmentation editing algorithms using our framework.

Frank Heckel, Momchil I. Ivanov, Jan H. Moltz, Horst K. Hahn
FAST-PVE: Extremely Fast Markov Random Field Based Brain MRI Tissue Classification

We present an extremely fast method named FAST-PVE for tissue classification and partial volume estimation of 3-D brain magnetic resonance images (MRI) using a Markov Random Field (MRF) based spatial prior. The tissue classification problem is central to most brain MRI analysis pipelines, and therefore solving it accurately and quickly is important. The FAST-PVE method is experimentally confirmed to tissue-classify a standard MR image in under 10 seconds with quantitative accuracy similar to other state-of-the-art methods. A key component of the FAST-PVE method is the fast ICM algorithm, which is generally applicable to any MRF-based segmentation method and formally proven to produce the same segmentation result as the standard ICM algorithm.

Jussi Tohka
Automated Tracing of Retinal Blood Vessels Using Graphical Models

As an early indicator of diseases including diabetes, hypertension, and retinopathy of prematurity, the structural study of retinal vessels is becoming increasingly important. These studies have driven the need for accurate and consistent tracing of retinal blood vessel tree structures from fundus images in an automated manner. In this paper we propose a two-step pipeline: First, the retinal vessels are segmented with a preference for preserving the skeleton network, i.e., retinal segmentation with a high recall. Second, a novel tracing algorithm is developed in which the tracing problem is mapped to an inference problem in probabilistic graphical models. This enables the exploitation of the well-developed inference toolkit for graphical models. The competitive performance of our method is verified on publicly available datasets in comparison with the state of the art.

Jaydeep De, Tengfei Ma, Huiqi Li, Manoranjan Dash, Cheng Li
Genus Zero Graph Segmentation: Estimation of Intracranial Volume

The intracranial volume (ICV) in children with premature fusion of one or more sutures in the calvaria is of interest due to the risk of increased intracranial pressure. Challenges for automatic estimation of the ICV include holes in the skull, e.g., the foramen magnum and fontanelles. In this paper, we present a fully automatic 3D graph-based method for segmentation of the ICV in non-contrast CT scans. We reformulate the ICV segmentation problem as an optimal genus-0 segmentation problem in a volumetric graph. The graph is the result of a volumetric spherical subsample of the data connected using Delaunay tetrahedralisation. A Markov Random Field is constructed on the graph with probabilities learned from an Expectation Maximisation algorithm matching a Mixture of Gaussians to the data. Results are compared to manual segmentations performed by an expert. We have achieved very high Dice scores ranging from 98.14% to 99.00%, while the volume deviation from the manual segmentation ranges from 0.7% to 3.7%. The Hausdorff distance, which gives the maximum error from the automatic to the manual segmentation, ranges from 4.73 to 9.81 mm. Since this measure is sensitive to single errors, we have also computed the 95% Hausdorff distance, which ranges from 1.10 to 3.65 mm. The proposed method is expected to perform well for other volumetric segmentations.
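
The reported error measures can be reproduced for arbitrary binary segmentations with the sketch below; it evaluates all mask voxels rather than surface points and works in voxel units, which is a simplification relative to the millimetre figures quoted above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice_score(seg_a, seg_b):
    """Dice overlap between two binary volumes: 2|A n B| / (|A| + |B|)."""
    a, b = seg_a.astype(bool), seg_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff_percentile(seg_a, seg_b, percentile=95):
    """Symmetric (percentile) Hausdorff distance between binary masks in voxels.

    Uses distance transforms of the mask complements; computed over all mask
    voxels (not surfaces) as a simplification.
    """
    dist_to_b = distance_transform_edt(~seg_b.astype(bool))
    dist_to_a = distance_transform_edt(~seg_a.astype(bool))
    d_ab = dist_to_b[seg_a.astype(bool)]
    d_ba = dist_to_a[seg_b.astype(bool)]
    return max(np.percentile(d_ab, percentile), np.percentile(d_ba, percentile))

a = np.zeros((40, 40), dtype=bool); a[10:30, 10:30] = True
b = np.zeros((40, 40), dtype=bool); b[12:30, 10:30] = True
print(dice_score(a, b), hausdorff_percentile(a, b, percentile=100))
```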

Rasmus R. Jensen, Signe S. Thorup, Rasmus R. Paulsen, Tron A. Darvann, Nuno V. Hermann, Per Larsen, Sven Kreiborg, Rasmus Larsen
Vessel Wall Segmentation Using Implicit Models and Total Curvature Penalizers

This paper proposes an automatic segmentation method of vessel walls that combines an implicit 3D model of the vessels and a total curvature penalizer in a level set evolution scheme. First, the lumen is segmented by alternating a model-guided level set evolution and a recalculation of the model itself. Second, the level set of the lumen is evolved with a term that aims at penalizing the total curvature and with a prior that forces the outer layer of the vessel towards the outside of the lumen. The model term is deactivated during this step. Finally, in a third step, the model term is reactivated in order to impose a smooth change of the radius along the vessel. Once the two segmentations have been computed, stenoses are detected and quantified at the thickest locations of the segmented vessel wall. Preliminary results show that the proposed method compares favorably with respect to the state-of-the-art both for synthetic and real CTA datasets.

Rodrigo Moreno, Chunliang Wang, Örjan Smedby

Faces and Gestures

Dynamic 3D Facial Expression Recognition Using Robust Shape Features

In this paper we present a novel approach for dynamic facial expression recognition based on 3D geometric facial features. Geodesic distances between corresponding 3D open curves are computed and used as features to describe the facial changes across sequences of 3D face scans. Hidden Markov Models (HMMs) are exploited to learn the curves' shape variation through 3D frame sequences, and the trained models are used to classify six prototypic facial expressions. Our approach shows high performance, and an overall recognition rate of 94.45% is attained after validation on the BU-4DFE database.

Ahmed Maalej, Hedi Tabia, Halim Benhabiles
Head Pose Estimation Using Multi-scale Gaussian Derivatives

In this paper we approach the problem of head pose estimation by combining Multi-scale Gaussian Derivatives with Support Vector Machines.

We evaluate the approach on the Pointing04 and CMU-PIE data sets to estimate the pan and tilt of the head from facial images. We achieved a mean absolute error of 6.9 degrees for pan and 8.0 degrees for tilt on the Pointing04 data set.

Varun Jain, James L. Crowley
Finger Tracking for Gestural Interaction in Mobile Devices

In this paper we propose a finger tracking system that is suitable for gesture recognition in mobile devices. The initialisation of the system does not require the use of any I/O devices. The user covers the camera lens with his or her hand and then moves it to the operating distance. The statistical models used for hand segmentation are initialised from the first frames after the hand is removed from the lens. The hand segmentation does not need to be perfect because we do not use the hand contour in the recognition. In our method the fingertips are found using template matching. However, the template matching produces false detections. These false detections are pruned by searching for a path from the fingertip to the estimated hand centre and discarding the paths that do not meet predefined criteria. We evaluate the performance of the method against the fingertip detector proposed by Baldauf et al. [2] using seven test subjects who initialise the system and then wave their hand in front of the camera. In testing we use a handheld USB camera that matches the image quality of most recent front cameras in mobile phones.

Matti Matilainen, Jari Hannuksela, Lixin Fan
Extracting Local Binary Patterns from Image Key Points: Application to Automatic Facial Expression Recognition

Facial expression recognition has been widely investigated in the literature. The need for accurate facial alignment has, however, limited the deployment of facial expression systems in real-world applications. In this paper, a novel feature extraction method is proposed. It is based on extracting local binary patterns (LBP) from image key points. The face region is first segmented into six facial components (left eye, right eye, left eyebrow, right eyebrow, nose, and mouth). Then, local binary patterns are extracted only from the edge points of each facial component. Finally, the local binary pattern features are collected into a histogram and fed to an SVM classifier for facial expression recognition. Compared to the traditional LBP methodology of extracting the features from all image pixels, our proposed approach extracts LBP features only from a set of points of the face components, yielding more compact and discriminative representations. Furthermore, our proposed approach does not require face alignment. Extensive experimental analysis on the commonly used JAFFE facial expression benchmark database showed very promising results, outperforming those of the traditional local binary pattern approach.

Xiaoyi Feng, Yangming Lai, Xiaofei Mao, Jinye Peng, Xiaoyue Jiang, Abdenour Hadid
Head Pose Estimation for Sign Language Video

We address the problem of estimating three head pose angles in sign language video using the Pointing04 data set as training data. The proposed model employs facial landmark points and Support Vector Regression learned from the training set to identify yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose to use the detected skin tone areas within the face bounding box as additional features for head pose estimation. The accuracy level of the estimators we obtain compares favorably with published results on the same data, but the smaller number of pose angles in our setup may explain some of the observed advantage.

We evaluated the pose angle estimators also against ground truth values from motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9 whereas the pitch correlation was slightly worse. As a whole, the results are very promising both from the computer vision and linguistic points of view.

Marcos Luzardo, Matti Karppa, Jorma Laaksonen, Tommi Jantunen
Detecting Hand-Head Occlusions in Sign Language Video

A large body of current linguistic research on sign language is based on analyzing large corpora of video recordings. This requires either manual or automatic annotation of the videos. In this paper we introduce methods for automatically detecting and classifying hand-head occlusions in sign language videos. Linguistically, hand-head occlusions are an important and interesting subject of study as the head is a structural place of articulation in many signs. Our method combines easily calculable local video properties with more global hand tracking. The experiments carried out with videos of the Suvi on-line dictionary of Finnish Sign Language show that the sensitivity of the proposed local method in detecting occlusion events is 92.6%. When global hand tracking is combined in the method, the specificity can reach the level of 93.7% while still maintaining the detection sensitivity above 90%.

Ville Viitaniemi, Matti Karppa, Jorma Laaksonen, Tommi Jantunen
Gender Recognition Using Nonsubsampled Contourlet Transform and WLD Descriptor

Gender recognition using facial images plays an important role in biometric technology. Multiscale texture descriptors perform better in gender recognition because they encode the multiscale facial microstructures in a better way. We present a gender recognition system that uses an SVM, two-stage feature selection and a multiscale texture feature based on the Nonsubsampled Contourlet Transform and the Weber law descriptor (NSCT-WLD). The proposed system has a better recognition rate (99.50%) than the state-of-the-art methods on the FERET database. This research also reveals which parts of the NSCT decomposition are essential for face recognition and which are important for other tasks like age detection.

Muhammad Hussain, Sarah Al-Otaibi, Ghulam Muhammad, Hatim Aboalsamh, George Bebis, Anwar M. Mirza

Object and Scene Recognition

Unsupervised Visual Object Categorisation with BoF and Spatial Matching

The ultimate challenge of image categorisation is unsupervised object discovery, where the selection of categories and the assignments of given images to these categories are performed automatically. The unsupervised setting prohibits the use of the best discriminative methods, and in Tuytelaars et al. [30] the standard Bag-of-Features (BoF) approach performed the best. The downside of the BoF is that it omits spatial information of local features. In this work, we propose a novel unsupervised image categorisation method which uses the BoF to find initial matches for each image (pre-filter) and then refines and ranks them using spatial matching of local features. Unsupervised visual object discovery is performed by the normalised cuts algorithm, which produces the clusterings from a similarity matrix representing the spatial match scores. In our experiments, the proposed approach outperforms the best method in Tuytelaars et al. with the Caltech-101, randomised Caltech-101, and Caltech-256 data sets. Especially for a large number of classes, clear and statistically significant improvements are achieved.

Teemu Kinnunen, Jukka Lankinen, Joni-Kristian Kämäräinen, Lasse Lensu, Heikki Kälviäinen
Improved Object Detection and Pose Using Part-Based Models

Automated object detection is perhaps the most central task of computer vision and arguably the most difficult one. This paper extends previous work on part-based models by using accurate geometric models both in the learning phase and at detection. In the learning phase, manual annotations are used to reduce perspective distortion before learning the part-based models. The fact that training is performed on rectified images leads to models which are more specific, reducing the risk of false positives. At the same time, a set of representative object poses is learnt. These are used at detection time to remove perspective distortion. The method is evaluated on the bus category of the Pascal dataset with promising results.

Fangyuan Jiang, Olof Enqvist, Fredrik Kahl, Kalle Åström
Non Maximal Suppression in Cascaded Ranking Models

Ranking models have recently been proposed for cascaded object detection and have been shown to improve over regression or binary classification in this setting [1,2]. Rather than train a classifier in a binary setting and interpret the function post hoc as a ranking objective, these approaches directly optimize regularized risk objectives that seek to score highest the windows that most closely match the ground truth. In this work, we evaluate the effect of non-maximal suppression (NMS) on the cascade architecture, showing that this step is essential for high performance. Furthermore, we demonstrate that non-maximal suppression has a significant effect on the tradeoff between recall at different points on the overlap-recall curve. We further develop additional objectness features with low computational cost that improve performance on the category-independent object detection task introduced by Alexe et al. [3]. We show empirically on the PASCAL VOC dataset that a simple and efficient NMS strategy yields better results in a typical cascaded detection architecture than the previous state of the art [4.1]. This demonstrates that NMS, an often ignored stage in the detection pipeline, can be a dominating factor in the performance of detection systems.
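
For concreteness, a standard greedy NMS step over scored boxes is sketched below; it is not claimed to be the particular NMS strategy evaluated in the paper.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression on axis-aligned boxes.

    `boxes` is (N, 4) as [x1, y1, x2, y2]; higher-scoring boxes suppress
    lower-scoring boxes whose IoU exceeds the threshold.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection-over-union of the kept box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # [0, 2]
```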

Matthew B. Blaschko, Juho Kannala, Esa Rahtu
Exploiting Object Characteristics Using Custom Features for Boosting-Based Classification

Typical feature pools used to train boosted object detectors contain redundant and unspecific information, which often yields less discriminative detectors. In this paper we introduce a feature mining algorithm that takes domain-specific knowledge into account. Our proposed feature pool contains rectangular features generated from an image clustering algorithm applied to the mean image of the object training set. A combination of two such spatially separated rectangular regions yields a set of features which have an evaluation time similar to classical Haar-like features, but are selected in a much smarter (automatic) way and are more discriminative, since image correlations can be exploited more consistently. Overall, training is faster and results in more selective detectors showing improved precision. Several experiments demonstrate the gain when using our proposed feature set in contrast to standard features.

Arne Ehlers, Florian Baumann, Bodo Rosenhahn
Apprenticeship Learning: Transfer of Knowledge via Dataset Augmentation

In visual category recognition there is often a trade-off between fast and powerful classifiers. Complex models often have superior performance to simple ones but are computationally too expensive for many applications. At the same time the performance of simple classifiers is not necessarily limited only by their flexibility but also by the amount of labelled data available for training. We propose a semi-supervised wrapper algorithm named apprenticeship learning, which leverages the strength of slow but powerful classification methods to improve the performance of simpler methods. The powerful classifier parses a large pool of unlabelled data, labelling positive examples to extend the dataset of the simple classifier. We demonstrate apprenticeship learning and its effectiveness by performing experiments on the VOC2007 dataset - one experiment improving detection performance on VOC2007, and one domain adaptation experiment, where the VOC2007 classifier is adapted to a new dataset, collected using a GoPro camera.

Miroslav Kobetski, Josephine Sullivan
Adding Discriminative Power to Hierarchical Compositional Models for Object Class Detection

In recent years, hierarchical compositional models have been shown to possess many appealing properties for object class detection, such as coping with a potentially large number of object categories. The reason is that they encode categories by hierarchical vocabularies of parts which are shared among the categories. On the downside, the sharing and the purely reconstructive nature cause problems when categorizing visually similar categories and separating them from the background. In this paper we propose a novel approach that preserves the appealing properties of generative hierarchical models while at the same time improving their discriminative properties. We achieve this by introducing a network of discriminative nodes on top of the existing generative hierarchy. The discriminative nodes are sparse linear combinations of activated generative parts. We show in the experiments that the discriminative nodes consistently improve a state-of-the-art hierarchical compositional model. Results show that our approach considers only a fraction of all nodes in the vocabulary (less than 10%), which also makes the system computationally efficient.

Matej Kristan, Marko Boben, Domen Tabernik, Ales Leonardis

Matching, Registration and Alignment

Learning Multi-view Correspondences via Subspace-Based Temporal Coincidences

In this work we present an approach to automatically learn pixel correspondences between pairs of cameras. We build on the method of Temporal Coincidence Analysis (TCA) and extend it from the purely temporal (i.e. single-pixel) to the spatiotemporal domain. Our approach is based on learning a statistical model for local spatiotemporal image patches, determining rare and expressive events from this model, and matching these events across multiple views. Accumulating multi-image coincidences of such events over time allows us to learn the desired geometric and photometric relations. The presented method also works for strongly different viewpoints and camera settings, including substantial rotation and translation. The only assumption made is that the relative orientation of a pair of cameras may be arbitrary but fixed, and that the observed scene shows visual activity. We show that the proposed method outperforms the single-pixel approach to TCA both in terms of learning speed and accuracy.

Christian Conrad, Rudolf Mester
Interest Region Description Using Local Binary Pattern of Gradients

A multispectral imaging system maps the contents of a scene to different intensity levels within the spectral images. This imaging process induces spectral variations among the different wavelength band images of the same scene and results in uncorrelated interest region descriptors for cross-spectral image matching. This paper presents the Local Binary Pattern of Gradients (LBPG) to improve the strength of interest region description under such spectral variations. In LBPG, the image gradients are first transformed into binary patterns, and the gradient patterns are then used instead of raw gradients for interest region description. We validate the LBPG approach on spectral images of six different indoor and outdoor scenes. The experimental results confirm better cross-spectral image matching performance compared to SIFT and Center-Symmetric Local Binary Patterns.

Sajid Saleem, Robert Sablatnig
Probabilistic Hough Voting for Attitude Estimation from Aerial Fisheye Images

For navigation of unmanned aerial vehicles (UAVs), attitude estimation is essential. We present a method for attitude estimation (pitch and roll angle) from aerial fisheye images through horizon detection. The method is based on edge detection and a probabilistic Hough voting scheme. In a flight scenario, there is often some prior knowledge of the vehicle altitude and attitude. We exploit this prior to make the attitude estimation more robust by letting the edge pixel votes be weighted based on the probability distributions for the altitude and pitch and roll angles. The method does not require any sky/ground segmentation as most horizon detection methods do. Our method has been evaluated on aerial fisheye images from the internet. The horizon is robustly detected in all tested images. The deviation in the attitude estimate between our automated horizon detection and a manual detection is less than 1°.

Bertil Grelsson, Michael Felsberg
Automatic Optimization of Alignment Parameters for Tomography Datasets

As tomographic imaging is being performed at increasingly smaller scales, the stability of the scanning hardware is of great importance to the quality of the reconstructed image. Instabilities lead to perturbations in the geometrical parameters used in the acquisition of the projections. In particular for electron tomography and high-resolution X-ray tomography, small instabilities in the imaging setup can lead to severe artifacts. We present a novel alignment algorithm for recovering the true geometrical parameters after the object has been scanned, based on measured data. Our algorithm employs an optimization algorithm that combines alignment with reconstruction. We demonstrate that problem-specific design choices made in the implementation are vital to the success of the method. The algorithm is tested in a set of simulation experiments. Our experimental results indicate that the method is capable of aligning tomography datasets with considerably higher accuracy compared to standard cross-correlation methods.

Folkert Bleichrodt, K. Joost Batenburg
Least-Squares Transformations between Point-Sets

This paper derives formulas for least-squares transformations between point-sets in ℝ^d. We consider affine transformations, with optional constraints for linearity, scaling, and orientation. We base the derivations hierarchically on reductions, and use trace manipulation to achieve short derivations. For the unconstrained problems, we provide a new formula which maximizes the orthogonality of the transform matrix.
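
A closed-form instance of this family, the orientation-constrained (orthogonal Procrustes / similarity) case, can be written down directly via the SVD; the fully affine and orthogonality-maximizing variants derived in the paper are not reproduced here.

```python
import numpy as np

def least_squares_similarity(source, target, allow_scaling=True):
    """Least-squares rotation (+ optional scale) and translation between point sets.

    Closed-form solution via the SVD of the cross-covariance matrix (the
    classical orthogonal-Procrustes construction); covers only the
    orientation-constrained case of the transformation family.
    """
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    src, tgt = source - mu_s, target - mu_t
    U, S, Vt = np.linalg.svd(src.T @ tgt)
    d = np.sign(np.linalg.det(U @ Vt))                    # avoid reflections
    D = np.diag([1.0] * (source.shape[1] - 1) + [d])
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (src ** 2).sum() if allow_scaling else 1.0
    t = mu_t - scale * mu_s @ R
    return R, scale, t   # maps x -> scale * x @ R + t

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
theta = 0.4
R_true = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
Y = 1.7 * X @ R_true + np.array([2.0, -1.0])
R, s, t = least_squares_similarity(X, Y)
print(np.allclose(s * X @ R + t, Y))   # True (up to numerical error)
```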

Kalle Rutanen, Germán Gómez-Herrero, Sirkka-Liisa Eriksson, Karen Egiazarian
3D Object Pose Estimation Using Viewpoint Generative Learning

Conventional local features such as SIFT or SURF are robust to scale and rotation changes but sensitive to large perspective change. Because perspective change always occurs when 3D object moves, using these features to estimate the pose of a 3D object is a challenging task. In this paper, we extend one of our previous works on viewpoint generative learning to 3D objects. Given a model of a textured object, we virtually generate several patterns of the model from different viewpoints and select stable keypoints from those patterns. Then our system learns a collection of feature descriptors from the stable keypoints. Finally, we are able to estimate the pose of a 3D object by using these robust features. In our experimental results, we demonstrate that our system is robust against large viewpoint change and even under partial occlusion.

Dissaphong Thachasongtham, Takumi Yoshida, François de Sorbier, Hideo Saito

3D Vision

Structure from Motion Estimation with Positional Cues

We present a system for structure from motion estimation using additional positioning data such as GPS data. The system incorporates the additional data in the outlier detection, the initial estimates and the final bundle adjustment. The initial solution is based on a novel objective function which is solved using convex optimization. This objective function is also used for outlier detection and removal. The initial solution is then refined based on a novel near-L2 minimization of the reprojection error using convex optimization methods. We present results on synthetic and real data that show the robustness, accuracy and speed of the proposed method.

Linus Svärm, Magnus Oskarsson
Probabilistic Range Image Integration for DSM and True-Orthophoto Generation

Typical photogrammetric processing pipelines for digital surface model (DSM) generation perform aerial triangulation, dense image matching and a fusion step to integrate multiple depth estimates into a consistent 2.5D surface model. The integration is strongly influenced by the quality of the individual depth estimates, which need to be handled robustly. We propose a probabilistically motivated 3D filtering scheme for range image integration. Our approach avoids a discrete voxel sampling, is memory efficient and can easily be parallelized. Neighborhood information given by a Delaunay triangulation can be exploited for photometric refinement of the fused DSMs before rendering true-orthophotos from the obtained models. We compare our range image fusion approach quantitatively on ground truth data by a comparison with standard median fusion. We show that our approach can handle a large amount of outliers very robustly and is able to produce improved DSMs and true-orthophotos in a qualitative comparison with current state-of-the-art commercial aerial image processing software.

Markus Rumpler, Andreas Wendel, Horst Bischof
On-line Stereo Self-calibration through Minimization of Matching Costs

This paper presents an approach to the problem of on-line stereo self-calibration. After a short introduction of the general method, we propose a new one, based on the minimization of matching costs. We furthermore show that the number of matched pixels can be used as a quality measure. A Metropolis algorithm based Monte-Carlo scheme is employed to reliably minimize the costs. We present experimental results in the context of automotive stereo with different matching algorithms. These show the effectiveness for the calibration of roll and pitch angle offsets.

Robert Spangenberg, Tobias Langner, Raúl Rojas
Depth Map Inpainting under a Second-Order Smoothness Prior

Many 3D reconstruction methods produce incomplete depth maps. Depth map inpainting can generate visually plausible structures for the missing areas. We present an inpainting method that encourages flat surfaces without favouring fronto-parallel planes. Moreover, it uses a color image to guide the inpainting and align color and depth edges. We implement the algorithm efficiently through graph-cuts. We compare the performance of our method with another inpainting approach used for large datasets and we show the results using several datasets. The depths inpainted with our method are visually plausible and of higher quality.

Daniel Herrera C., Juho Kannala, L’ubor Ladický, Janne Heikkilä
Merging Overlapping Depth Maps into a Nonredundant Point Cloud

Combining long sequences of overlapping depth maps without simplification results in a huge number of redundant points, which slows down further processing. In this paper, a novel method is presented for incrementally creating a nonredundant point cloud with varying levels of detail without limiting the captured volume or requiring any parameters from the user. Overlapping measurements are used to refine point estimates by reducing their directional variance. The algorithm was evaluated with plane and cube fitting residuals, which were improved considerably over redundant point clouds.

Tomi Kyöstilä, Daniel Herrera C., Juho Kannala, Janne Heikkilä
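The idea of refining point estimates by reducing their variance through overlapping measurements can be illustrated with generic inverse-variance weighting; this is a textbook sketch under assumed scalar variances, not the paper's exact update along the viewing direction.

```python
# Inverse-variance fusion of two estimates of the same 3D point.
import numpy as np

def fuse_estimates(p1, var1, p2, var2):
    """p1, p2: 3D positions; var1, var2: scalar (directional) variances."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * np.asarray(p1, float) + w2 * np.asarray(p2, float)) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)          # always <= min(var1, var2)
    return fused, fused_var
```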
Industrial Phase-Shifting Profilometry in Motion

Phase-Shift Profilometry (PSP) provides a means for dense high-quality surface scanning. However, it imposes a staticity constraint: the scene is required to remain still during the acquisition of multiple images, so PSP is not applicable to dynamic scenes. On the other hand, there exist active stereo techniques which overcome this constraint but impose other limitations, for instance on the surface's continuity or texture, or by significantly reducing the reconstruction's resolution.

We present a novel approach that recovers reconstructions as dense and almost as accurate as PSP while allowing for translational object/scene motion during the acquisition of the input frames. We study its performance in simulations and present results on real data.

P. Schroeder, R. Roux, J. -M. Favreau, M. Perriollat, A. Bartoli
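For background, classical N-step phase-shift decoding (the static PSP baseline referred to above) recovers the wrapped phase from images captured under patterns shifted by 2*pi*n/N. The sketch below implements this textbook formula, not the motion-tolerant method of the paper.

```python
# Standard N-step phase-shift decoding: I_n = A + B*cos(phi + 2*pi*n/N).
import numpy as np

def wrapped_phase(images):
    """images: (N, H, W) captures under patterns shifted by 2*pi*n/N, N >= 3."""
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    deltas = 2.0 * np.pi * np.arange(n) / n
    num = -np.tensordot(np.sin(deltas), images, axes=1)   # -sum_n I_n sin(delta_n)
    den = np.tensordot(np.cos(deltas), images, axes=1)    #  sum_n I_n cos(delta_n)
    return np.arctan2(num, den)   # wrapped phase in (-pi, pi]; unwrapping not shown
```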

Color and Multispectral Image Analysis

Asymmetry as a Measure of Visual Saliency

A salient feature is a part of the scene that stands out relative to neighboring items. By that we mean that a human observer would experience a salient feature as being more prominent. It is, however, important to quantify saliency in terms of a mathematical quantity that lends itself to measurements. Different metrics have been shown to correlate with human fixation data. These include contrast, brightness and orientation gradients calculated at different image scales.

In this paper, we show that these metrics can be grouped under transformations pertaining to the dihedral group D4, which is the symmetry group of the square image grid. Our results show that salient features can be defined as the image features that are most asymmetric in their surrounds.

Ali Alsam, Puneet Sharma, Anette Wrålsen
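The dihedral group D4 consists of the four rotations and four reflections of the square grid. A minimal sketch of applying these transforms to a square patch and scoring its asymmetry is given below; the score is one simple choice, not necessarily the exact metric used by the authors.

```python
# The eight D4 transforms of a square patch and a simple asymmetry score.
import numpy as np

def d4_transforms(patch):
    rots = [np.rot90(patch, k) for k in range(4)]
    return rots + [np.fliplr(r) for r in rots]       # 4 rotations + 4 reflections

def asymmetry_score(patch):
    patch = np.asarray(patch, dtype=float)
    assert patch.shape[0] == patch.shape[1], "D4 acts on square patches"
    # Mean absolute deviation of the patch from its transformed copies.
    return float(np.mean([np.abs(patch - t).mean() for t in d4_transforms(patch)]))
```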
Detection of Small Roof Details in Image Sequences

Detecting smaller elevated objects, like chimneys, in high resolution images has several important applications, such as collision warning. Moreover, existing 3D models (which already include the terrain, buildings and vegetation) can be enriched by new instances. There are not many contributions in the literature about extracting fine roof details. We therefore developed a new, modularized algorithm for detecting these details as hot spots in local elevation maps; such a map is typically obtained by a multi-view dense matching method. We use explicit and implicit assumptions on the data in order to tighten the search range for chimneys and reduce the number of false alarms. Finally, the hot spots are filtered by means of color or intensity images. With this approach, good detection rates are achieved for a data set consisting of several high resolution images taken over a residential area.

Dimitri Bulatov, Melanie Pohl
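Detecting hot spots in a local elevation map can be illustrated with a generic local-maximum test combined with a prominence threshold. The neighborhood size and minimum height below are hypothetical parameters, and the sketch does not include the authors' colour/intensity filtering step.

```python
# Hot-spot candidates in a local elevation map (DSM): local maxima that rise
# sufficiently above the surrounding roof level.
import numpy as np
from scipy import ndimage

def elevation_hot_spots(dsm, neighborhood=15, min_height=0.5):
    """dsm: 2D local elevation map; returns a boolean hot-spot mask."""
    local_max = ndimage.maximum_filter(dsm, size=neighborhood) == dsm
    background = ndimage.median_filter(dsm, size=neighborhood)
    prominent = (dsm - background) > min_height      # stands out from the local surface
    return local_max & prominent
```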
Supervised Object Class Colour Normalisation

Colour is an important cue in many applications of computer vision and image processing, but robust usage often requires estimation of the unknown illuminant colour. Usually, colour normalisation is used to obtain images invariant to the illumination conditions under which they were taken. In this work, we develop such a colour normalisation technique, where true colours are not important per se but where examples of the same class have a photometrically consistent appearance. This is achieved by supervised estimation of a class-specific canonical colour space in which the examples have minimal variation in their colours. We demonstrate the effectiveness of our method with qualitative and quantitative examples from the Caltech-101 data set and a real application of 3D pose estimation for robot grasping.

Ekaterina Riabchenko, Jukka Lankinen, Anders Glent Buch, Joni-Kristian Kämäräinen, Norbert Krüger
Statistical Quality Assessment of Pre-fried Carrots Using Multispectral Imaging

Multispectral imaging is increasingly being used for quality assessment of food items due to its non-invasive benefits. In this paper, we investigate the use of multispectral images of pre-fried carrots to detect changes over a period of 14 days. The idea is to distinguish changes in quality from spectral images of visible and NIR bands. High-dimensional feature vectors were formed from all possible ratios of spectral bands at 9 different percentiles per piece of carrot. We propose to use a multiple hypothesis testing technique based on the Benjamini-Hochberg (BH) method to identify significant changes in the features over the inspection days. Discrimination by an SVM classifier supported these results. Additionally, two-sided t-tests on the predictions of the elastic-net regressions were carried out to compare our results with previous studies on fried carrots. The experimental results showed that the most significant changes occurred on day 2 and day 14.

Sara Sharifzadeh, Line H. Clemmensen, Hanne Løje, Bjarne K. Ersbøll
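The Benjamini-Hochberg step-up procedure mentioned in the abstract is a standard multiple-testing method; a minimal generic implementation, given a list of p-values and a false discovery rate level, is sketched below.

```python
# Benjamini-Hochberg step-up procedure for controlling the false discovery rate.
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m      # alpha * i / m for sorted p_(i)
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest i with p_(i) <= alpha*i/m
        rejected[order[:k + 1]] = True                # reject the k+1 smallest p-values
    return rejected
```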

Motion Analysis

A Feature-Based Adaptive Model for Realtime Face Tracking on Smart Phones

We propose a face tracking method that is extremely fast and suitable for implementation on smart phones. An adaptive model is built to make the tracking robust to a variety of situations such as scaling, pose changes, and abrupt movements. To deal with scaling and appearance changes, we incorporate Lucas-Kanade optical flow and CAMShift, in which both color and corner point features are used to achieve high accuracy. Based on the feature tracking, the model adapts to abrupt movements of the tracked face and the mobile camera through a failure detection method. The system then utilizes a particle filter and CAMShift to catch up with the fast motion. Based on the tracked region, we detect the face in order to reinitialize the tracking process. The proposed method performs well compared with recent approaches while meeting the real-time requirement on smart phones.

Quang Nhat Vo, GueeSang Lee
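Both building blocks named in the abstract, pyramidal Lucas-Kanade point tracking and CAMShift on a hue back-projection, are available in OpenCV. The sketch below shows one simplified tracking step; it is not the paper's adaptive model (window handling, failure detection and the particle filter are omitted, and `roi_hist` is assumed to be a precomputed hue histogram of the face region).

```python
# One simplified tracking step: LK point tracking plus CamShift refinement.
import cv2
import numpy as np

def track_step(prev_gray, gray, prev_pts, hsv, roi_hist, window):
    # 1) Track corner points with pyramidal Lucas-Kanade optical flow.
    #    prev_pts must be float32 of shape (N, 1, 2).
    pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None,
                                              winSize=(21, 21), maxLevel=3)
    good = pts[status.ravel() == 1].reshape(-1, 1, 2)

    # 2) Refine the face window with CamShift on the hue back-projection.
    backproj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _, window = cv2.CamShift(backproj, window, term)
    return good, window
```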
Classification of RGB-D and Motion Capture Sequences Using Extreme Learning Machine

In this paper we present a robust motion recognition framework for both motion capture and RGB-D sensor data. We extract four different types of features and apply a temporal difference operation to form the final feature vector for each frame in the motion sequences. The frames are classified with the extreme learning machine, and the final class of an action is obtained by majority voting. We test our framework with both motion capture and Kinect data and compare the results of different features. The experiments show that our approach can accurately classify actions with both sources of data. For 40 actions of motion capture data, we achieve 92.7% classification accuracy with real-time performance.

Xi Chen, Markus Koskela
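An extreme learning machine is a single-hidden-layer network whose hidden weights are random and whose output weights are solved in closed form. A minimal generic classifier along these lines (feature and label shapes are assumptions) is sketched below.

```python
# Minimal extreme learning machine classifier: random hidden layer,
# output weights from a pseudoinverse least-squares fit.
import numpy as np

class ELM:
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)          # random nonlinear projection

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T             # output weights
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```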
Robust Scale-Adaptive Mean-Shift for Tracking

Mean-Shift tracking is a popular algorithm for object tracking since it is easy to implement, fast and robust. In this paper, we address the problem of scale adaptation of the Hellinger distance based Mean-Shift tracker.

We start from a theoretical derivation of scale estimation in the Mean-Shift framework. To make the scale estimation robust and suitable for tracking, we introduce regularization terms that counter two major problems: (i) scale expansion caused by background clutter and (ii) scale implosion on self-similar objects. To further robustify the scale estimate, it is validated by a forward-backward consistency check.

The proposed Mean-Shift tracker with scale selection is compared with recent state-of-the-art algorithms on a dataset of 48 public color sequences and achieves excellent results.

Tomas Vojir, Jana Noskova, Jiri Matas
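A forward-backward consistency check, as used to validate the scale estimate, can be sketched generically: track forward, track backward from the result, and accept only if the round trip returns close to the starting state. The `track` function and the tolerance below are placeholders, not the paper's implementation.

```python
# Generic forward-backward consistency check for a tracking estimate.
import numpy as np

def forward_backward_check(track, frame_a, frame_b, state_a, tol):
    state_b = track(frame_a, frame_b, state_a)        # forward tracking A -> B
    state_a_back = track(frame_b, frame_a, state_b)   # backward tracking B -> A
    error = np.linalg.norm(np.asarray(state_a_back, float) - np.asarray(state_a, float))
    return (state_b if error < tol else None), error  # None signals a rejected estimate
```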

Systems and Applications

Airborne Based High Performance Crowd Monitoring for Security Applications

Crowd monitoring in mass events is a highly important technology for supporting the security of attending persons. Proposed methods based on terrestrial or airborne image/video data often fail to achieve sufficiently accurate results to guarantee a robust service. We present a novel framework for estimating human density and motion from video data based on custom-tailored object detection techniques, a regression-based density estimate and a total-variation-based optical flow extraction. From the gathered features we present a detailed accuracy analysis against ground truth information. In addition, all information is projected into world coordinates to enable direct integration with existing geo-information systems. The resulting human counts show a mean error of 4% to 9% and thus provide an efficient measure that can be robustly applied in security-critical services.

Roland Perko, Thomas Schnabel, Gerald Fritz, Alexander Almer, Lucas Paletta
Characterizing Spatters in Laser Welding of Thick Steel Using Motion Flow Analysis

Laser welding has become a very important method for industrial manufacturing. Despite the inherent accuracy of laser welding, the resulting weld quality may still be affected by many dynamic conditions related to the operating parameters and to the properties of the welded material. Methods for monitoring the laser welding process are therefore needed to guarantee consistent manufacturing quality. In this paper, we present a method for characterizing spatters in laser welding of thick steel. The pre-processing and edge detection steps of the proposed algorithm are performed on-line at very high speed using a dedicated KOVA1 massively parallel image processing chip, and the actual characterization of the spatters is carried out off-line in Matlab. The proposed methods are simple and efficient, thus also facilitating possible integration of the whole algorithm for on-line processing.

Olli Lahdenoja, Tero Säntti, Jonne Poikonen, Mika Laiho, Ari Paasio

Human-Centered Computing

Perceptually-Inspired Artistic Genre Identification System in Digitized Painting Collections

This paper presents an automatic system for the recognition of artistic genre in digital representations of paintings. This system is part of the recent extensive effort to develop image processing solutions that facilitate a better understanding of art. As art addresses human perception, the extracted features are perceptually inspired. While 3D Color Histogram and Gabor Filter Energy have been used for art description, frameworks extracted using anchoring theory are novel in this field. The paper investigates the use of 7 classifiers, and the resulting performance, evaluated on a database containing more than 3400 paintings from 6 different genres, outperforms the reported state of the art.

Razvan George Condorovici, Corneliu Florea, Ruxandra Vrânceanu, Constantin Vertan
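The two classical descriptors named in the abstract, a 3D colour histogram and Gabor filter energy, have standard formulations. The sketch below uses assumed bin counts and filter parameters and omits the anchoring-theory features.

```python
# Standard 3D colour histogram and multi-orientation Gabor energy descriptors.
import cv2
import numpy as np

def color_histogram_3d(image_bgr, bins=8):
    pixels = image_bgr.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return (hist / hist.sum()).ravel()               # normalised 3D histogram, flattened

def gabor_energies(gray, n_orientations=4, ksize=31, sigma=4.0, lambd=10.0, gamma=0.5):
    energies = []
    for theta in np.linspace(0, np.pi, n_orientations, endpoint=False):
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma,
                                    0, cv2.CV_32F)   # even (cosine) Gabor
        response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        energies.append(float(np.mean(response ** 2)))   # mean squared response = energy
    return np.array(energies)
```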

Video and Multimedia Analysis

High Capacity Reversible Watermarking for Images Based on Classified Neural Network

Reversible watermarking is a useful technique for applications requiring high image quality because it can both protect the original images and restore them exactly. In this paper, a high-capacity reversible image watermarking scheme based on a classified neural network is proposed. According to the variance of the surrounding pixel values, all pixel cells are classified as either smooth or rough. Correspondingly, two neural networks are designed for smooth pixel prediction and rough pixel prediction, respectively. The watermark is embedded in the prediction errors. In addition, a retesting strategy utilizing parity detection is presented to increase the capacity of the algorithm. Experimental results show that this algorithm yields smaller prediction errors and obtains both higher capacity and good visual quality.

Rongrong Ni, H. D. Cheng, Yao Zhao, Yu Hou

Other

Saliency Detection Using Joint Temporal and Spatial Decorrelation

This article presents a scene-driven (i.e. bottom-up) visual saliency detection technique for videos. The proposed method utilizes non-negative matrix factorization (NMF) to replicate the neural responses of primary visual cortex neurons in the spatial domain. In the temporal domain, principal component analysis (PCA) is applied to imitate the effect of stimulus change experienced during neural adaptation. We apply the proposed saliency model to the background subtraction problem. The proposed method does not rely on any background model and is purely unsupervised. Experimental results show that the proposed method competes well with some of the state-of-the-art background subtraction techniques, especially in dynamic scenes.

Hamed Rezazadegan Tavakoli, Esa Rahtu, Janne Heikkilä
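The two decompositions named in the abstract, NMF in the spatial domain and PCA in the temporal domain, can be illustrated with scikit-learn. The sketch below uses reconstruction residuals as a simple saliency proxy, which is one plausible reading rather than the paper's exact model; it assumes non-negative patch vectors and a temporal window of at least four frames.

```python
# Spatial NMF residual over image patches and temporal PCA residual over frames.
import numpy as np
from sklearn.decomposition import NMF, PCA

def spatial_residual(patches, n_components=16):
    """patches: (n_patches, patch_dim) non-negative matrix of vectorised patches."""
    nmf = NMF(n_components=n_components, init="nndsvda", max_iter=300)
    W = nmf.fit_transform(patches)
    return np.linalg.norm(patches - W @ nmf.components_, axis=1)   # per-patch residual

def temporal_residual(frame_window, n_components=3):
    """frame_window: (n_frames, n_pixels), oldest first; last row is the current frame."""
    pca = PCA(n_components=n_components).fit(frame_window[:-1])
    current = frame_window[-1:]
    reconstructed = pca.inverse_transform(pca.transform(current))
    return np.abs(current - reconstructed).ravel()                 # per-pixel residual
```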
Density Driven Diffusion

In this work we derive a novel density driven diffusion scheme for image enhancement. Our approach, called D3, is a semi-local method that uses an initial structure-preserving oversegmentation step of the input image. Because of this, each segment approximately conforms to a homogeneous region in the image, allowing us to easily estimate the parameters of the underlying stochastic process and thus achieve adaptive non-linear filtering. Our method produces competitive results when compared to state-of-the-art methods such as non-local means, BM3D and tensor driven diffusion on both color and grayscale images.

Freddie Åström, Vasileios Zografos, Michael Felsberg
Backmatter
Metadata
Title: Image Analysis
Edited by: Joni-Kristian Kämäräinen, Markus Koskela
Copyright year: 2013
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-38886-6
Print ISBN: 978-3-642-38885-9
DOI: https://doi.org/10.1007/978-3-642-38886-6
