
About this book

This volume contains the papers presented at the Scandinavian Conference on Image Analysis, SCIA 2009, which was held at the Radisson SAS Scandinavian Hotel, Oslo, Norway, June 15–18. SCIA 2009 was the 16th in the biennial series of conferences, which has been organized in turn by the Scandinavian countries Sweden, Finland, Denmark and Norway since 1980. The event itself has always attracted participants and author contributions from outside the Scandinavian countries, making it an international conference. The conference included a full day of tutorials and five keynote talks provided by world-renowned experts. The program covered high-quality scientific contributions within image analysis, human and action analysis, pattern and object recognition, color imaging and quality, medical and biomedical applications, face and head analysis, computer vision, and multispectral color analysis. The papers were carefully selected based on at least two reviews. Among 154 submissions, 79 were accepted, leading to an acceptance rate of 51%. Since SCIA was arranged as a single-track event, 30 papers were presented in the oral sessions and 49 papers were presented in the poster sessions. A separate session on multispectral color science was organized in cooperation with the 11th Symposium of Multispectral Color Science (MCS 2009). Since 2009 was proclaimed the "International Year of Astronomy" by the United Nations General Assembly, the conference also contained a session on the topic "Image and Pattern Analysis in Astronomy and Astrophysics." SCIA has a reputation of having a friendly environment, in addition to high-quality scientific contributions. We focused on maintaining this reputation by designing a technical and social program that we hope the participants found interesting and inspiring for new research ideas and network extensions. We thank the authors for submitting their valuable work to SCIA.

Table of Contents


Human Motion and Action Analysis

Instant Action Recognition

In this paper, we present an efficient system for action recognition from very short sequences. Typically, action recognition analyzes the appearance and/or motion information of an action over a large number of frames. This is a limitation if very fast actions (e.g., in sport analysis) have to be analyzed. To overcome this limitation, we propose a method that uses a single-frame representation for actions based on appearance and motion information. In particular, we estimate Histograms of Oriented Gradients (HOGs) for the current frame as well as for the corresponding dense flow field. The thus obtained descriptors are efficiently represented by the coefficients of a Non-negative Matrix Factorization (NMF). Actions are classified using a one-vs-all Support Vector Machine. Since the flow can be estimated from two frames, only two consecutive frames are required for the action analysis in the evaluation stage. Both the optical flow and the HOGs can be computed very efficiently. In the experiments, we compare the proposed approach to state-of-the-art methods and show that it yields competitive results. In addition, we demonstrate action recognition for real-world beach-volleyball sequences.
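To make the single-frame representation concrete, the following sketch computes a simplified HOG descriptor in plain numpy. The fixed cell size and the omission of block normalization are our simplifications; the paper additionally computes HOGs of the dense flow field and compresses both descriptors with NMF before the one-vs-all SVM.

```python
import numpy as np

def hog_descriptor(img, n_bins=9, cell=8):
    """Simplified HOG: unsigned-orientation histograms over non-overlapping
    cells. A sketch of the descriptor idea, not the full block-normalized HOG."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                 # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            b = bins[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist = np.bincount(b, weights=m, minlength=n_bins)
            feats.append(hist / (np.linalg.norm(hist) + 1e-9))  # per-cell L2 norm
    return np.concatenate(feats)
```

The same function could be applied to the channels of a dense flow field to obtain the motion part of the descriptor.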

Thomas Mauthner, Peter M. Roth, Horst Bischof

Using Hierarchical Models for 3D Human Body-Part Tracking

Human body pose estimation and tracking is a challenging task, mainly because of the high dimensionality of the human body model. In this paper we introduce a Hierarchical Annealing Particle Filter (H-APF) algorithm for 3D articulated human body-part tracking. The method exploits a Hierarchical Human Body Model (HHBM) in order to perform accurate body pose estimation, applying nonlinear dimensionality reduction combined with a dynamic motion model and the hierarchical body model. The dynamic motion model allows for better pose prediction, while the hierarchical body model expresses conditional dependencies between the body parts and also allows us to capture properties of separate parts. An improved annealing approach is used for the propagation between different body models and sequential frames. The algorithm was tested on the HumanEvaI and HumanEvaII datasets, as well as on other videos, and was shown to be capable of accurate and robust tracking. A comparison to other methods and error calculations are provided.

Leonid Raskin, Michael Rudzsky, Ehud Rivlin

Analyzing Gait Using a Time-of-Flight Camera

An algorithm is created which performs human gait analysis using spatial data and amplitude images from a time-of-flight camera. For each frame in a sequence the camera supplies Cartesian coordinates in space for every pixel. By using an articulated model, the subject's pose is estimated in the depth map in each frame. The pose estimation is based on likelihood, contrast in the amplitude image, smoothness, and a shape prior used to solve a Markov random field. Based on the pose estimates, and the prior that movement is locally smooth, a sequential model is created, and a gait analysis is done on this model. The output data are: speed, cadence (steps per minute), step length, stride length (a stride being two consecutive steps, also known as a gait cycle), and range of motion (angles of joints). The created system produces good output data for the described output parameters and requires no user interaction.
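Given detected footfall events, the gait parameters listed above reduce to simple arithmetic. The sketch below is an event-based illustration of those definitions, not the paper's model-fitting pipeline; the function name and arguments are hypothetical.

```python
import numpy as np

def gait_parameters(strike_frames, strike_x, fps=30):
    """Derive basic gait parameters from heel-strike events.
    strike_frames: frame indices of successive heel strikes (alternating feet);
    strike_x: forward position (metres) of the striking foot at each event.
    The event-based formulation is our illustration of the output parameters."""
    t = np.asarray(strike_frames, dtype=float) / fps      # event times in seconds
    x = np.asarray(strike_x, dtype=float)
    step_lengths = np.diff(x)                             # one step per interval
    stride_lengths = x[2:] - x[:-2]                       # two steps = one stride
    duration = t[-1] - t[0]
    return dict(
        speed=(x[-1] - x[0]) / duration,                  # metres per second
        cadence=(len(t) - 1) / duration * 60.0,           # steps per minute
        step_length=step_lengths.mean(),
        stride_length=stride_lengths.mean(),
    )
```

For example, strikes every 15 frames at 30 fps correspond to a cadence of 120 steps per minute.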

Rasmus R. Jensen, Rasmus R. Paulsen, Rasmus Larsen

Primitive Based Action Representation and Recognition

There has been recent interest in segmenting action sequences into meaningful parts (action primitives) and in modeling actions on a higher level based on these action primitives. Unlike previous works, where action primitives are defined a priori and searched for later, we present a sequential and statistical learning algorithm for automatic detection of the action primitives and of the action grammar based on these primitives. We model a set of actions using a single HMM whose structure is learned incrementally as we observe new types. Actions are modeled with a sufficient number of Gaussians, which become the states of an HMM for an action. Across different actions, we find the states that are common to the actions; these are then treated as action primitives.

Sanmohan, Volker Krüger

Object and Pattern Recognition

Recognition of Protruding Objects in Highly Structured Surroundings by Structural Inference

Recognition of objects in highly structured surroundings is a challenging task, because the appearance of target objects changes due to fluctuations in their surroundings. This makes the problem highly context dependent. Due to the lack of knowledge about the target class, we also encounter difficulty in delimiting the non-target class. Hence, objects can neither be recognized by their similarity to prototypes of the target class, nor by their similarity to the non-target class. We solve this problem by introducing a transformation that eliminates the objects from the structured surroundings. The dissimilarity between an object and its surroundings (the non-target class) is then inferred from the difference between the local image before and after the transformation. This forms the basis of the detection and classification of polyps in computed tomography colonography: 95% of the polyps are detected at the expense of four false positives per scan.

Vincent F. van Ravesteijn, Frans M. Vos, Lucas J. van Vliet

A Binarization Algorithm Based on Shade-Planes for Road Marking Recognition

A binarization algorithm tolerant to both the gradual changes of intensity caused by shade and the discontinuous changes caused by shadows is described in this paper. The algorithm is based on "shade-planes", in which intensity changes gradually and no edges are included. These shade-planes are produced by selecting a "principal intensity" in each small block with a quasi-optimization algorithm. One shade-plane is then selected as the background in order to eliminate the gradual change in the input image. The image, with its gradual change removed, is subsequently binarized by a conventional global thresholding algorithm. The binarized image is provided to a road marking recognition system, for which the influence of shade and shadows is unavoidable in sunlight.
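The idea of removing a smooth shade component before global thresholding can be sketched as follows. A single least-squares plane stands in for the paper's block-wise shade-plane construction, and the margin parameter is an illustrative choice.

```python
import numpy as np

def fit_shade_plane(img):
    """Least-squares plane a*x + b*y + c as a crude 'shade-plane' estimate of
    the slowly varying background illumination (a simplification of the
    paper's block-wise principal-intensity construction)."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    A = np.stack([xx.ravel(), yy.ravel(), np.ones(h * w)], axis=1)
    coef, *_ = np.linalg.lstsq(A, img.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)

def binarize_shaded(img, margin=0.1):
    """Remove the gradual shade component, then apply a global threshold:
    pixels clearly darker than the background plane become foreground."""
    residual = img - fit_shade_plane(img)
    return residual < -margin            # dark road marking on brighter road
```

With a linear shading gradient this recovers a dark marking that a single global threshold on the raw image would miss.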

Tomohisa Suzuki, Naoaki Kodaira, Hiroyuki Mizutani, Hiroaki Nakai, Yasuo Shinohara

Rotation Invariant Image Description with Local Binary Pattern Histogram Fourier Features

In this paper, we propose Local Binary Pattern Histogram Fourier features (LBP-HF), a novel rotation-invariant image descriptor computed from discrete Fourier transforms of local binary pattern (LBP) histograms. Unlike most other histogram-based invariant texture descriptors, which normalize rotation locally, the proposed invariants are constructed globally for the whole region to be described. In addition to being rotation invariant, the LBP-HF features retain the highly discriminative nature of LBP histograms. In the experiments, it is shown that these features outperform the non-invariant LBP, an earlier rotation-invariant version of LBP, and the MR8 descriptor in texture classification, material categorization and face recognition tests.
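The construction can be sketched in a few lines of numpy for the basic 8-neighbour LBP: the bit-rotations of a uniform pattern form a cyclic orbit, and the DFT magnitude of the histogram along each orbit is invariant to image rotation. This is a simplification of the published LBP-HF, which uses circularly interpolated neighbourhoods and full uniform-pattern indexing.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP codes for the interior pixels (basic variant)."""
    c = img[1:-1, 1:-1]
    nbrs = [img[0:-2, 1:-1], img[0:-2, 2:], img[1:-1, 2:], img[2:, 2:],
            img[2:, 1:-1], img[2:, 0:-2], img[1:-1, 0:-2], img[0:-2, 0:-2]]
    code = np.zeros_like(c, dtype=int)
    for k, n in enumerate(nbrs):
        code |= (n >= c).astype(int) << k
    return code

def lbp_hf(img):
    """Rotation-invariant sketch: for each uniform-pattern group, the 8
    bit-rotations of the canonical code form a cyclic orbit; the DFT
    magnitude of the histogram along that orbit is rotation invariant."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    feats = []
    for n_ones in range(1, 8):
        base = (1 << n_ones) - 1                       # canonical uniform pattern
        orbit = [((base << r) | (base >> (8 - r))) & 0xFF for r in range(8)]
        feats.append(np.abs(np.fft.fft(hist[orbit])))
    return np.concatenate(feats)
```

Rotating the input by 90 degrees cyclically permutes each orbit, so the descriptor is unchanged.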

Timo Ahonen, Jiří Matas, Chu He, Matti Pietikäinen

Weighted DFT Based Blur Invariants for Pattern Recognition

Recognition of patterns in blurred images can be achieved without deblurring by using image features that are invariant to blur. All known blur invariants are based either on image moments or on the Fourier phase. In this paper, we introduce a method that improves the results obtained by existing state-of-the-art blur-invariant Fourier domain features. In this method, the invariants are weighted according to their reliability, which is proportional to their estimated signal-to-noise ratio. Because the invariants are non-linear functions of the image data, we apply a linearization scheme to estimate their noise covariance matrix, which is used to compute the weighted distance between images in classification. We applied a similar weighting scheme to blur and blur-translation invariant features in the Fourier domain. For illustration, we also experimented with other Fourier and spatial domain features, with and without weighting. In the experiments, the classification accuracy of the Fourier domain invariants was increased by up to 20% through the use of weighting.
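The weighted classification distance amounts to a Mahalanobis-style metric under the estimated noise covariance; a generic sketch (the paper's linearization step for obtaining the covariance is omitted):

```python
import numpy as np

def weighted_distance(f1, f2, cov):
    """Mahalanobis-style distance between two invariant feature vectors,
    down-weighting invariants with large estimated noise variance."""
    d = f1 - f2
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))
```

With an identity covariance this reduces to the ordinary Euclidean distance; a large variance on one invariant shrinks its contribution accordingly.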

Ville Ojansivu, Janne Heikkilä

Color Imaging and Quality

The Effect of Motion Blur and Signal Noise on Image Quality in Low Light Imaging

Motion blur and signal noise are probably the two most dominant sources of image quality degradation in digital imaging. In low light conditions, image quality is always a tradeoff between motion blur and noise. A long exposure time is required at low illumination levels in order to obtain an adequate signal-to-noise ratio. On the other hand, the risk of motion blur due to trembling hands or subject motion increases as the exposure time becomes longer. Loss of image brightness caused by a shorter exposure time and the consequent underexposure can be compensated with analogue or digital gains; however, noise will be amplified at the same time. In relation to digital photography, the interesting question is: what tradeoff between motion blur and noise do human observers prefer? In this paper we explore this problem. A motion blur metric is created and analyzed. Similarly, necessary measurement methods for image noise are presented. Based on relatively large test material, we show experimental results on motion blur and noise behavior in different illumination conditions and their effect on perceived image quality.

Eero Kurimo, Leena Lepistö, Jarno Nikkanen, Juuso Grén, Iivari Kunttu, Jorma Laaksonen

A Hybrid Image Quality Measure for Automatic Image Quality Assessment

Automatic image quality assessment has many diverse applications, but existing quality measures do not accurately represent human perception. We present a hybrid image quality (HIQ) measure, a combination of four existing measures using a degree-n polynomial, to accurately model human image perception. First, we undertook time-consuming human experiments to subjectively evaluate a given set of training images, from which a Human Perception Curve (HPC) was formed. Next, we defined an HIQ measure that closely follows the HPC using curve-fitting techniques; the coefficients and the degree of the polynomial are estimated by regression on the training data obtained from the human subjects. The HIQ measure was then validated on a separate set of images by similar subjective experiments and compared to the HPC. Our results show that HIQ gives an RMS error of 5.1, compared to the best RMS error of 5.8 achieved by a second-degree polynomial of an individual measure, the HVS (Human Visual System) absolute norm, amongst the four considered metrics. Our data contain subjective quality assessments (by 100 individuals) of 174 images with various degrees of fast-fading distortion. Each image was evaluated by 50 different human subjects using a double stimulus quality scale, resulting in 8,700 judgements overall.

Atif Bin Mansoor, Maaz Haider, Ajmal S. Mian, Shoab A. Khan

Framework for Applying Full Reference Digital Image Quality Measures to Printed Images

Measuring the visual quality of printed media is important, as printed products play an essential role in everyday life, and for many "vision applications", printed products still dominate the market (e.g., newspapers). Measuring visual quality, especially the quality of images when the original is known (full-reference), has been an active research topic in image processing. Over the course of this work, several good measures have been proposed and shown to correspond with human (subjective) evaluations. Adapting these approaches to measuring the visual quality of printed media has been considered only rarely and is not straightforward. In this work, the aim is to reduce this gap by presenting a complete framework that starts from the original digital image and its hard-copy reproduction and leads to a scanned digital sample, which is compared to the original reference image using existing quality measures. The proposed framework is justified by experiments in which the measures are compared to a subjective evaluation performed using the printed hard copies.

Tuomas Eerola, Joni-Kristian Kämäräinen, Lasse Lensu, Heikki Kälviäinen

Colour Gamut Mapping as a Constrained Variational Problem

We present a novel, computationally efficient, iterative, spatial gamut mapping algorithm. The proposed algorithm offers a compromise between the colorimetrically optimal gamut clipping and the most successful spatial methods, achieved through the iterative nature of the method. At iteration level zero, the result is identical to gamut clipping; the more we iterate, the more we approach an optimal, spatial, gamut mapping result. Optimal is defined as a gamut mapping algorithm that preserves the hue of the image colours as well as the spatial ratios at all scales. Our results show that as few as five iterations are sufficient to produce an output that is as good as or better than that achieved by previous, computationally more expensive, methods. Being able to improve upon previous results using such a low number of iterations allows us to state that the proposed algorithm is of order O(N), N being the number of pixels. Results based on a challenging small destination gamut support our claim that it is indeed efficient.

Ali Alsam, Ivar Farup

Multispectral Color Science

Geometric Multispectral Camera Calibration

A large number of multispectral cameras use optical bandpass filters to divide the electromagnetic spectrum into passbands. If the filters are placed between the sensor and the lens, the different thicknesses, refraction indices and tilt angles of the filters cause image distortions which are different for each spectral passband. The lens itself also causes distortions, which are critical in machine vision tasks. In this paper, we propose a method to calibrate the multispectral camera geometrically in order to remove all such geometric distortions. To this end, the combination of the camera with each of the bandpass filters is considered as a single camera system. The systems are then calibrated by estimation of the intrinsic and extrinsic camera parameters and geometrically merged via a homography. The experimental results show that our algorithm can compensate for the geometric distortions of the lens and the optical bandpass filters simultaneously.

Johannes Brauers, Til Aach

A Color Management Process for Real Time Color Reconstruction of Multispectral Images

We introduce a new, accurate and technology-independent display color characterization model for the color rendering of multispectral images. The establishment of this model is automatic and takes no longer than a coffee break, making it efficient in practical situations. This model is part of the color management workflow of the new tools designed at the C2RMF for multispectral image analysis of paintings acquired with the material developed during the CRISATEL European project. The analysis is based on color reconstruction with virtual illuminants and uses a GPU (graphics processing unit) based processing model in order to interact in real time with virtual lighting.

Philippe Colantoni, Jean-Baptiste Thomas

Precise Analysis of Spectral Reflectance Properties of Cosmetic Foundation

The present paper describes a detailed analysis of the spectral reflection properties of a skin surface with make-up foundation, based on two approaches: a physical approach using the Cook–Torrance model and a statistical approach using PCA. First, we show how the surface-spectral reflectances change with the observation conditions of light incidence and viewing, and also with the material compositions. Second, the Cook–Torrance model is used for describing the complicated reflectance curves by a small number of parameters and for rendering images of 3D object surfaces. Third, the PCA method is applied to the analysis of the observed spectral reflectances. The PCA shows that all skin surfaces have the property of standard dichromatic reflection, so that the observed reflectances are represented by two components: the diffuse reflectance and a constant reflectance. The spectral estimation is then reduced to a simple computation using the diffuse reflectance, some principal components, and the weighting coefficients. Finally, the feasibility of the two methods is examined in experiments. The PCA method performs reliable spectral reflectance estimation for the skin surface from a global point of view, compared with the model-based method.

Yusuke Moriuchi, Shoji Tominaga, Takahiko Horiuchi

Extending Diabetic Retinopathy Imaging from Color to Spectra

In this study, spectral images of 66 human retinas were collected. These spectral images were measured in vivo from 54 voluntary diabetic patients and 12 control subjects using a modified ophthalmic fundus camera system. This system incorporates the optics of a standard fundus microscope, 30 narrow bandpass interference filters ranging from 400 to 700 nanometers at 10 nm intervals, a steady-state broadband light source and a monochrome digital charge-coupled device camera. The introduced spectral fundus image database will be expanded in the future with professional annotations and will be made public.

Pauli Fält, Jouni Hiltunen, Markku Hauta-Kasari, Iiris Sorri, Valentina Kalesnykiene, Hannu Uusitalo

Medical and Biomedical Applications

Fast Prototype Based Noise Reduction

This paper introduces a novel method for noise reduction in medical images based on concepts of the Non-Local Means algorithm. The main objective has been to develop a method that optimizes the processing speed to achieve practical applicability without compromising the quality of the resulting images. A database consisting of prototypes, composed of pixel neighborhoods originating from several images of similar motif, has been created. By using a dedicated data structure, here Locality-Sensitive Hashing (LSH), fast access to appropriate prototypes is provided. Experimental results show that the proposed method can be used to provide noise reduction with high-quality results in a fraction of the time required by the Non-Local Means algorithm.
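The prototype idea can be sketched with random-hyperplane LSH over patch vectors: prototype patches from similar-motif images are bucketed by hash key, and each noisy pixel is replaced by a similarity-weighted average of the centers of the patches in its bucket. All names, the hash family and the parameters are illustrative choices, not the paper's implementation.

```python
import numpy as np

def patches(img, rad=1):
    """Extract (2*rad+1)^2 patch vectors around each interior pixel."""
    h, w = img.shape
    return np.array([img[y - rad:y + rad + 1, x - rad:x + rad + 1].ravel()
                     for y in range(rad, h - rad) for x in range(rad, w - rad)])

def lsh_nlm_denoise(img, proto_imgs, rad=1, n_bits=8, h2=0.05, seed=0):
    """Prototype-based NLM sketch: prototype patches are hashed with
    random-hyperplane LSH; each noisy patch is averaged only with the
    prototypes found in its bucket."""
    rng = np.random.default_rng(seed)
    d = (2 * rad + 1) ** 2
    planes = rng.standard_normal((n_bits, d))
    hkey = lambda v: tuple((planes @ (v - v.mean()) > 0).astype(int))
    buckets = {}                                   # prototype database
    for p in proto_imgs:
        for v in patches(p, rad):
            buckets.setdefault(hkey(v), []).append(v)
    h, w = img.shape
    out = img.astype(float).copy()
    P = iter(patches(img, rad))
    for y in range(rad, h - rad):
        for x in range(rad, w - rad):
            v = next(P)
            cand = buckets.get(hkey(v))
            if cand:                               # weighted average of centers
                C = np.array(cand)
                wts = np.exp(-((C - v) ** 2).sum(1) / h2)
                out[y, x] = (wts @ C[:, d // 2]) / wts.sum()
    return out
```

Pixels whose patch hashes to an empty bucket are left unchanged; the real method tunes the hash so that this rarely happens.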

Kajsa Tibell, Hagen Spies, Magnus Borga

Towards Automated TEM for Virus Diagnostics: Segmentation of Grid Squares and Detection of Regions of Interest

When searching for viruses in an electron microscope, the sample grid constitutes an enormous search area. Here, we present methods for automating the image acquisition process for an automatic virus diagnostic application. The methods constitute a multi-resolution approach: we first identify the grid squares and rate individual grid squares based on content in a grid overview image, and then detect regions of interest in higher-resolution images of good grid squares. Our methods are designed to mimic the actions of a virus TEM expert manually navigating the microscope, and they are also compared to the expert's performance. Integrating the proposed methods with the microscope would reduce the search area by more than 99.99% and would also remove the need for an expert to perform the virus search at the microscope.

Gustaf Kylberg, Ida-Maria Sintorn, Gunilla Borgefors

Unsupervised Assessment of Subcutaneous and Visceral Fat by MRI

This paper presents a method for unsupervised assessment of visceral and subcutaneous adipose tissue in the abdominal region by MRI. The identification of the subcutaneous and visceral regions was achieved by dynamic programming constrained by points acquired from an active shape model. The combination of active shape models and dynamic programming provides a segmentation that is both robust and accurate. The method features a low number of parameters that give good results over a wide range of values. The unsupervised segmentation was compared with a manual procedure, and the correlation between the manual and unsupervised segmentations was considered high.
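The dynamic-programming component can be illustrated with the classic minimal-path search for a one-pixel-per-row boundary; the paper constrains this search with points from an active shape model, which the sketch omits.

```python
import numpy as np

def min_cost_boundary(cost):
    """Dynamic programming: cheapest top-to-bottom path moving at most one
    column per row — the 1-D boundary search underlying many DP-based
    segmentations. Returns one column index per row."""
    h, w = cost.shape
    acc = cost.astype(float).copy()
    for y in range(1, h):
        left = np.r_[np.inf, acc[y - 1, :-1]]
        right = np.r_[acc[y - 1, 1:], np.inf]
        acc[y] += np.minimum(acc[y - 1], np.minimum(left, right))
    path = [int(np.argmin(acc[-1]))]               # backtrack from cheapest end
    for y in range(h - 2, -1, -1):
        x = path[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        path.append(lo + int(np.argmin(acc[y, lo:hi])))
    return path[::-1]
```

In the paper's setting, `cost` would be an edge-strength image of the abdominal cross-section, with the shape model restricting which columns each row may visit.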

Peter S. Jørgensen, Rasmus Larsen, Kristian Wraae

Image and Pattern Analysis in Astrophysics and Astronomy

Decomposition and Classification of Spectral Lines in Astronomical Radio Data Cubes

The natural output of imaging spectroscopy in astronomy is a 3D data cube with two spatial and one frequency axis. The spectrum of each image pixel consists of an emission line which is Doppler-shifted by gas motions along the line of sight. These data are essential to understand the gas distribution and kinematics of the astronomical object. We propose a two-step method to extract coherent kinematic structures from the data cube. First, the spectra are decomposed into a sum of Gaussians using a Bayesian method to obtain an estimation of spectral lines. Second, we aim at tracking the estimated lines to get an estimation of the structures in the cube. The performance of the approach is evaluated on a real radio-astronomical observation.

Vincent Mazet, Christophe Collet, Bernd Vollmer

Segmentation, Tracking and Characterization of Solar Features from EIT Solar Corona Images

With the multiplication of sensors and instruments, the size, amount and quality of solar image data are constantly increasing, and analyzing these data requires defining and implementing accurate and reliable algorithms. In the context of solar feature analysis, it is particularly important to accurately delineate feature edges and track their motion, in order to estimate quantitative indices and analyse their evolution through time. Herein, we introduce an image processing pipeline that segments, tracks and quantifies solar features from a set of multispectral solar corona images taken with the EIT instrument. We demonstrate the method on the automatic tracking of active regions in EIT images, and on the analysis of the spatial distribution of coronal bright points. The method is generic enough to allow the study of any solar feature, provided it can be segmented from EIT images or other sources.

Vincent Barra, Véronique Delouille, Jean-Francois Hochedez

Galaxy Decomposition in Multispectral Images Using Markov Chain Monte Carlo Algorithms

Astronomers still lack a multiwavelength analysis scheme for galaxy classification. In this paper we propose a way of analysing multispectral observations aiming at refining existing classifications with spectral information. We propose a global approach which consists of decomposing the galaxy into a parametric model using physically meaningful structures. Physical interpretation of the results will be straightforward even if the method is limited to regular galaxies. The proposed approach is fully automatic and performed using Markov Chain Monte Carlo (MCMC) algorithms. Evaluation on simulated and real 5-band images shows that this new method is robust and accurate.

Benjamin Perret, Vincent Mazet, Christophe Collet, Éric Slezak

Face Recognition and Tracking

Head Pose Estimation from Passive Stereo Images

We present an algorithm to estimate the 3D pose (location and orientation) of a previously unseen face from low-quality range images. The algorithm generates many pose candidates from a signature to find the nose tip based on local shape, and then evaluates each candidate by computing an error function. Our algorithm incorporates 2D and 3D cues to make the system robust to low-quality range images acquired by passive stereo systems. It handles large pose variations (±90° yaw and ±45° pitch rotation) and facial variations due to expressions or accessories. For a maximally allowed error of 30°, the system achieves an accuracy of 83.6%.

M. D. Breitenstein, J. Jensen, C. Høilund, T. B. Moeslund, L. Van Gool

Multi-band Gradient Component Pattern (MGCP): A New Statistical Feature for Face Recognition

A feature extraction method using multiple frequency bands is proposed for face recognition, named the Multi-band Gradient Component Pattern (MGCP). The MGCP captures discriminative information from Gabor filter responses by means of an orthogonal gradient component analysis method, which is especially designed to encode energy variations of the Gabor magnitude. Unlike some well-known Gabor-based feature extraction methods, MGCP extracts geometric features from Gabor magnitudes in the orthogonal gradient space in a novel way. It is shown that such features encapsulate more discriminative information. The proposed method is evaluated in face recognition experiments on the FERET and FRGC ver 2.0 databases and compared with several state-of-the-art approaches. Experimental results demonstrate that MGCP achieves the highest recognition rate among all the compared methods, including some well-known Gabor-based methods.

Yimo Guo, Jie Chen, Guoying Zhao, Matti Pietikäinen, Zhengguang Xu

Weight-Based Facial Expression Recognition from Near-Infrared Video Sequences

This paper presents a novel weight-based approach to recognizing facial expressions from near-infrared (NIR) video sequences. Facial expressions can be thought of as specific dynamic textures where local appearance and motion information need to be considered. The face image is divided into several regions, from which local binary patterns from three orthogonal planes (LBP-TOP) features are extracted to be used as a facial feature descriptor. The use of LBP-TOP features enables us to set different weights for each of the three planes (appearance, horizontal motion and vertical motion) inside the block volume. The performance of the proposed method is tested on a novel NIR facial expression database. Assigning different weights to the planes according to their contribution improves the performance. NIR images are shown to cope better with illumination variations than visible light images.

Matti Taini, Guoying Zhao, Matti Pietikäinen

Stereo Tracking of Faces for Driver Observation

This report contributes a coherent framework for the robust tracking of facial structures. The framework comprises aspects of structure-and-motion problems: feature extraction, spatial and temporal matching, re-calibration, tracking, and reconstruction. The scene is acquired through a calibrated stereo sensor. A cue processor extracts invariant features in both views, which are spatially matched by geometric relations. The temporal matching takes place via prediction from the tracking module and a similarity transformation of the features' 2D locations between both views. The head is reconstructed and tracked in 3D. The re-projection of the predicted structure limits the search space of both the cue processor and the reconstruction procedure. Due to the focused application, the instability of the calibration of the stereo sensor is limited to the relative extrinsic parameters, which are re-calibrated during the reconstruction process. The framework has been applied and validated in practice. First experimental results are discussed, and further steps of development within the project are presented.

Markus Steffens, Stephan Kieneke, Dominik Aufderheide, Werner Krybus, Christine Kohring, Danny Morton

Computer Vision

Camera Resectioning from a Box

In this paper we describe how we can do camera resectioning from a box with unknown dimensions, i.e. determine the camera model, assuming that image pixels are square. This assumption is equivalent to assuming that the camera has an aspect ratio of one and zero skew, and this holds for most — if not all — digital cameras. Our proposed method works by first deriving 9 linear constraints on the projective camera matrix from the box, leaving a 3-dimensional subspace in which the projective camera matrix can lie. A single solution in this 3D subspace is then found via a method by Triggs in 1999, which uses the square pixel assumption to set up a 4th degree polynomial to which the solution is the desired model. This approach is, however, numerically challenging, and we use several means to tackle this issue. Lastly the solution is refined in an iterative manner, i.e. using bundle adjustment.

Henrik Aanæs, Klas Josephson, François Anton, Jakob Andreas Bærentzen, Fredrik Kahl

Appearance Based Extraction of Planar Structure in Monocular SLAM

This paper concerns the building of enhanced scene maps during real-time monocular SLAM. Specifically, we present a novel algorithm for detecting and estimating planar structure in a scene based on both geometric and appearance information. We adopt a hypothesis testing framework, in which the validity of planar patches within a triangulation of the point-based scene map is assessed against an appearance metric. A key contribution is that the metric incorporates the uncertainties available within the SLAM filter through the use of a test statistic assessing error distribution against predicted covariances, hence maintaining a coherent probabilistic formulation. Experimental results indicate that the approach is effective, having good detection and discrimination properties, and leading to convincing planar feature representations.

José Martínez-Carranza, Andrew Calway

A New Triangulation-Based Method for Disparity Estimation in Image Sequences

We give a simple and efficient algorithm for the approximate computation of disparities in a pair of rectified frames of an image sequence. The algorithm consists of rendering a sparse set of correspondences, which are triangulated, expanded, and corrected in the areas of occlusions and homogeneous texture by a color distribution algorithm. The obtained approximations of the disparity maps are refined by a semi-global algorithm. The algorithm was tested on three data sets of rather different data quality. Results on the performance of our method are presented, and areas of application and future research are outlined.

Dimitri Bulatov, Peter Wernerus, Stefan Lang

Sputnik Tracker: Having a Companion Improves Robustness of the Tracker

Tracked objects rarely move alone. They are often temporarily accompanied by other objects undergoing similar motion. We propose a novel tracking algorithm called Sputnik Tracker. It is capable of identifying which image regions move coherently with the tracked object. This information is used to stabilize tracking in the presence of occlusions or fluctuations in the appearance of the tracked object, without the need to model its dynamics. In addition, Sputnik Tracker is based on a novel template tracker integrating foreground and background appearance cues. The time varying shape of the target is also estimated in each video frame, together with the target position. The time varying shape is used as another cue when estimating the target position in the next frame.

Lukáš Cerman, Jiří Matas, Václav Hlaváč

Poster Session 1

A Convex Approach to Low Rank Matrix Approximation with Missing Data

Many computer vision problems can be formulated as low-rank bilinear minimization problems. One reason for the success of these formulations is that they can be solved efficiently using singular value decomposition. However, this approach fails if the measurement matrix contains missing data.

In this paper we propose a new method for estimating missing data. Our approach is similar to that of ℓ1-approximation schemes that have been successfully used for recovering sparse solutions of under-determined linear systems. We use the nuclear norm to formulate a convex approximation of the missing data problem. The method has been tested on real and synthetic images with promising results.
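The nuclear-norm relaxation can be illustrated with singular value soft-thresholding, the proximal operator of the nuclear norm, applied to a synthetic rank-1 matrix with missing entries. The solver and parameters below are a generic sketch, not necessarily those used in the paper:

```python
import numpy as np

def complete(M, mask, tau=0.2, n_iter=500):
    """Fill missing entries of M (mask True where observed) by iterating
    singular value soft-thresholding, the proximal step of the nuclear norm."""
    X = np.where(mask, M, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        X = np.where(mask, M, X)                  # re-impose observed entries
    return X

rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(20), rng.standard_normal(15))  # exactly rank 1
mask = rng.random(M.shape) > 0.3                                # ~70% observed
X = complete(M, mask)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

For a truly low-rank matrix with enough observed entries, the missing entries are recovered up to the small bias introduced by the shrinkage.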

Carl Olsson, Magnus Oskarsson

Multi-frequency Phase Unwrapping from Noisy Data: Adaptive Local Maximum Likelihood Approach

The paper introduces a new approach to absolute phase estimation from frequency-diverse wrapped observations. We adopt a discontinuity-preserving nonparametric regression technique, where the phase is reconstructed based on a local maximum likelihood criterion. It is shown that this criterion, applied to the multifrequency data, besides filtering the noise, yields a 2πQ-periodic solution, where Q > 1 is an integer. The filtering algorithm is based on local polynomial approximation (LPA) for the design of nonlinear filters (estimators) and the adaptation of these filters to the unknown spatial smoothness of the absolute phase. Depending on the value of Q and of the original phase range, we may obtain complete or partial phase unwrapping. In the latter case, we apply the recently introduced robust (in the sense of discontinuity preserving) PUMA unwrapping algorithm [1]. Simulations give evidence that the proposed method yields state-of-the-art performance, enabling phase unwrapping in extraordinarily difficult situations where all other algorithms fail.

José Bioucas-Dias, Vladimir Katkovnik, Jaakko Astola, Karen Egiazarian

A New Hybrid DCT and Contourlet Transform Based JPEG Image Steganalysis Technique

In this paper, a universal steganalysis scheme for JPEG images based upon hybrid transform features is presented. We first analyze two different transform domains (Discrete Cosine Transform and Discrete Contourlet Transform) separately to extract features for steganalysis. Then a combination of these two feature sets is constructed and employed for steganalysis. A Fisher Linear Discriminant classifier is trained on features from both clean and steganographic images using all three feature sets and subsequently used for classification. Experiments performed on images embedded with two variants of F5 and Model-based steganographic techniques reveal the effectiveness of the proposed steganalysis approach by demonstrating improved detection for hybrid features.

Zohaib Khan, Atif Bin Mansoor

Improved Statistical Techniques for Multi-part Face Detection and Recognition

In this paper we propose an integrated system for face detection and face recognition based on improved versions of state-of-the-art statistical learning techniques such as Boosting and LDA. Both the detection and the recognition processes are performed on facial features (e.g., the eyes, the nose, the mouth, etc) in order to improve the recognition accuracy and to exploit their statistical independence in the training phase. Experimental results on real images show the superiority of our proposed techniques with respect to the existing ones in both the detection and the recognition phase.

Christian Micheloni, Enver Sangineto, Luigi Cinque, Gian Luca Foresti

Face Recognition under Variant Illumination Using PCA and Wavelets

In this paper, an efficient wavelet subband representation method is proposed for face identification under varying illumination. In our method, prior to the traditional principal component analysis (PCA), we use the wavelet transform to decompose the image into different frequency subbands, and a low-frequency subband together with three secondary high-frequency subbands is used for the PCA representations. Our aim is to compensate for the shortcoming of traditional wavelet-based methods, which select only the most discriminating subband and neglect the scattered character of the discriminating features. The proposed algorithm has been evaluated on the Yale Face Database B. Significant performance gains are attained.
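The subband split that precedes PCA in such pipelines can be sketched with a one-level Haar decomposition. Haar is used here only as an illustrative assumption, since the abstract does not name the wavelet family:

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar transform: returns LL, LH, HL, HH subbands."""
    a = (img[0::2] + img[1::2]) / 2          # row low-pass
    d = (img[0::2] - img[1::2]) / 2          # row high-pass
    LL = (a[:, 0::2] + a[:, 1::2]) / 2       # low-low: coarse approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2       # high-high: diagonal detail
    return LL, LH, HL, HH

rng = np.random.default_rng(4)
img = rng.random((8, 8))
LL, LH, HL, HH = haar2d(img)
```

Each subband is a quarter-size image; flattening the selected subbands and stacking them over training faces gives the matrix on which PCA would then be run.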

Mong-Shu Lee, Mu-Yen Chen, Fu-Sen Lin

On the Spatial Distribution of Local Non-parametric Facial Shape Descriptors

In this paper we present a method to form pattern specific facial shape descriptors called basis-images for non-parametric LBPs (Local Binary Patterns) and some other similar face descriptors such as Modified Census Transform (MCT) and LGBP (Local Gabor Binary Pattern). We examine the distribution of different local descriptors among the facial area from which some useful observations can be made. In addition, we test the discriminative power of the basis-images in a face detection framework for the basic LBPs. The detector is fast to train and uses only a set of strictly frontal faces as inputs, operating without non-faces and bootstrapping. The face detector performance is tested with the full CMU+MIT database.

Olli Lahdenoja, Mika Laiho, Ari Paasio

Informative Laplacian Projection

A new approach to constructing the similarity matrix for eigendecomposition on graph Laplacians is proposed. We first connect the Locality Preserving Projection method to probability density derivatives, which are then replaced by informative score vectors. This change yields a normalization factor and increases the contribution of data pairs in low-density regions. The proposed method can be applied to both unsupervised and supervised learning. An empirical study on facial images is provided. The experimental results demonstrate that our method is advantageous for discovering statistical patterns in sparse data areas.

Zhirong Yang, Jorma Laaksonen

Segmentation of Highly Lignified Zones in Wood Fiber Cross-Sections

Lignification of wood fibers has important consequences for paper production, but its exact effects are not well understood. To correlate exact levels of lignin in wood fibers to their mechanical properties, lignin autofluorescence is imaged in wood fiber cross-sections. Highly lignified areas can be detected and related to the area of the whole cell wall. Presently these measurements are performed manually, which is tedious and expensive. In this paper a method is proposed to estimate the degree of lignification automatically. A multi-stage snake-based segmentation is applied to each cell separately. For a preliminary evaluation we used an image containing 17 complete cell cross-sections. This image was segmented both automatically and manually by an expert. There was a highly significant correlation between the two methods, although a systematic difference indicates a disagreement in the definition of the edges between the expert and the algorithm.

Bettina Selig, Cris L. Luengo Hendriks, Stig Bardage, Gunilla Borgefors

Dense and Deformable Motion Segmentation for Wide Baseline Images

In this paper we describe a dense motion segmentation method for wide baseline image pairs. Unlike many previous methods our approach is able to deal with deforming motions and large illumination changes by using a bottom-up segmentation strategy. The method starts from a sparse set of seed matches between the two images and then proceeds to quasi-dense matching which expands the initial seed regions by using local propagation. Then, the quasi-dense matches are grouped into coherently moving segments by using local bending energy as the grouping criterion. The resulting segments are used to initialize the motion layers for the final dense segmentation stage, where the geometric and photometric transformations of the layers are iteratively refined together with the segmentation, which is based on graph cuts. Our approach provides a wider range of applicability than the previous approaches which typically require a rigid planar motion model or motion with small disparity. In addition, we model the photometric transformations in a spatially varying manner. Our experiments demonstrate the performance of the method with real images involving deforming motion and large changes in viewpoint, scale and illumination.

Juho Kannala, Esa Rahtu, Sami S. Brandt, Janne Heikkilä

A Two-Phase Segmentation of Cell Nuclei Using Fast Level Set-Like Algorithms

An accurate localization of the cell nucleus boundary is essential for any further quantitative analysis of various subnuclear structures within the cell nucleus. In this paper, we present a novel approach to cell nucleus segmentation in fluorescence microscope images exploiting the level set framework. The proposed method works in two phases. In the first phase, the image foreground is separated from the background using a fast level set-like algorithm by Nilsson and Heyden [1]. A binary mask of isolated cell nuclei as well as their clusters is obtained as a result of the first phase. A fast topology-preserving level set-like algorithm by Maška and Matula [2] is applied in the second phase to delineate individual cell nuclei within the clusters. The potential of the new method is demonstrated on images of DAPI-stained nuclei of a lung cancer cell line A549 and promyelocytic leukemia cell line HL60.

Martin Maška, Ondřej Daněk, Carlos Ortiz-de-Solórzano, Arrate Muñoz-Barrutia, Michal Kozubek, Ignacio Fernández García

A Fast Optimization Method for Level Set Segmentation

Level set methods are a popular way to solve the image segmentation problem in computer image analysis. A contour is implicitly represented by the zero level of a signed distance function, and evolved according to a motion equation in order to minimize a cost function. This function defines the objective of the segmentation problem and also includes regularization constraints. Gradient descent search is the de facto method used to solve this optimization problem. Basic gradient descent methods, however, are sensitive to local optima and often display slow convergence. Traditionally, the cost functions have been modified to avoid these problems. In this work, we instead propose using a modified gradient descent search based on resilient propagation (Rprop), a method commonly used in the machine learning community. Our results show faster convergence and less sensitivity to local optima, compared to traditional gradient descent.
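The Rprop rule the authors borrow adapts a per-parameter step size from the sign of successive gradients, ignoring the gradient magnitude. A minimal sketch on a toy quadratic cost, not the authors' level-set energy:

```python
import numpy as np

def rprop_minimize(grad, x0, n_iter=100, eta_plus=1.2, eta_minus=0.5,
                   step0=0.1, step_min=1e-6, step_max=1.0):
    """Resilient propagation: grow the step while the gradient sign is
    stable, shrink it when the sign flips (a sign of overshooting)."""
    x = np.asarray(x0, float).copy()
    step = np.full_like(x, step0)
    g_prev = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad(x)
        same = g * g_prev                 # >0: same sign, <0: sign flip
        step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
        x -= np.sign(g) * step            # move by the adapted step only
        g_prev = g
    return x

# toy cost f(x) = ||x - target||^2, gradient 2(x - target)
target = np.array([3.0, -2.0])
x_opt = rprop_minimize(lambda x: 2 * (x - target), np.zeros(2))
```

Because the update uses only the gradient sign, flat plateaus and steep ravines are traversed at the same adaptive rate, which is the property the paper exploits for level-set evolution.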

Thord Andersson, Gunnar Läthén, Reiner Lenz, Magnus Borga

Segmentation of Touching Cell Nuclei Using a Two-Stage Graph Cut Model

Methods based on combinatorial graph cut algorithms have received a lot of attention in recent years for their robustness as well as their reasonable computational demands. These methods are built upon an underlying Maximum a Posteriori estimation of Markov Random Fields and are suitable for solving many different problems in image analysis accurately, including image segmentation. In this paper we present a two-stage graph cut based model for segmentation of touching cell nuclei in fluorescence microscopy images. In the first stage, voxels with very high probability of being foreground or background are found and separated by a boundary with minimal geodesic length. In the second stage, the obtained clusters are split into isolated cells by combining image gradient information and a priori knowledge about the shape of the nuclei. Moreover, these two qualities can be easily balanced using a single user parameter. Preliminary tests on real data show promising results of the method.

Ondřej Daněk, Pavel Matula, Carlos Ortiz-de-Solórzano, Arrate Muñoz-Barrutia, Martin Maška, Michal Kozubek

Parallel Volume Image Segmentation with Watershed Transformation

We present a novel approach to parallel segmentation of volume images on shared-memory computer systems using watershed transformation by immersion. We use the domain decomposition method to split the sequential algorithm into multiple threads for parallel computation. The use of a chromatic ordering allows us to obtain a correct segmentation without an examination of adjacent domains or a final relabeling. We briefly discuss our approach and present results and speedup measurements of our implementation.

Björn Wagner, Andreas Dinges, Paul Müller, Gundolf Haase

Fast-Robust PCA

Principal Component Analysis (PCA) is a powerful and widely used tool in Computer Vision and is applied, e.g., for dimensionality reduction. But as a drawback, it is not robust to outliers. Hence, if the input data is corrupted, an arbitrarily wrong representation is obtained. To overcome this problem, various methods have been proposed to robustly estimate the PCA coefficients, but these methods are computationally too expensive for practical applications. Thus, in this paper we propose a novel fast and robust PCA (FR-PCA), which drastically reduces the computational effort. Moreover, more accurate representations are obtained. In particular, we propose a two-stage outlier detection procedure, where in the first stage outliers are detected by analyzing a large number of smaller subspaces. In the second stage, remaining outliers are detected by a robust least-square fitting. To show these benefits, in the experiments we evaluate the FR-PCA method for the task of robust image reconstruction on the publicly available ALOI database. The results clearly show that our approach outperforms existing methods in terms of accuracy and speed when processing corrupted data.

Markus Storer, Peter M. Roth, Martin Urschler, Horst Bischof

Efficient K-Means VLSI Architecture for Vector Quantization

A novel hardware architecture for k-means clustering is presented in this paper. Our architecture is fully pipelined for both the partitioning and centroid computation operations, so that multiple training vectors can be processed concurrently. The proposed architecture is used as a hardware accelerator for a soft-core NIOS CPU implemented on an FPGA device for physical performance measurement. Numerical results reveal that our design is an effective solution with low area cost and high computation performance for k-means design.
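The two pipelined operations correspond to the classic Lloyd iteration, sketched here in software as a reference model rather than the hardware design itself:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means: alternate the two operations the hardware pipelines,
    partitioning (nearest-centroid assignment) and centroid computation."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)                          # partition step
        centroids = np.array([X[labels == j].mean(axis=0)  # centroid step
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

# two well-separated blobs: k-means should recover the split
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
labels, centroids = kmeans(X, 2)
```

In the hardware version each training vector streams through the distance/argmin pipeline while partial centroid sums accumulate, so both steps overlap instead of alternating.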

Hui-Ya Li, Wen-Jyi Hwang, Chih-Chieh Hsu, Chia-Lung Hung

Joint Random Sample Consensus and Multiple Motion Models for Robust Video Tracking

We present a novel method for tracking multiple objects in video captured by a non-stationary camera. For low-quality video, RANSAC estimation fails when the number of good matches shrinks below the minimum required to estimate the motion model. This paper extends RANSAC in the following ways: (a) allowing multiple models of different complexity to be chosen at random; (b) introducing a conditional probability to measure the suitability of each transformation candidate, given the object locations in previous frames; (c) determining the best suitable transformation by the number of consensus points, the probability, and the model complexity. Our experimental results have shown that the proposed estimation method better handles video of low quality and that it is able to track deformable objects with pose changes, occlusions, motion blur and overlap. We also show that using multiple models of increasing complexity is more effective than using RANSAC with the complex model only.
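The consensus-counting principle that the paper extends can be sketched with a minimal RANSAC loop for line fitting (a single fixed model here, unlike the multi-model extension proposed):

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=0.1, seed=0):
    """Fit y = a*x + b by RANSAC: sample minimal subsets, count consensus."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, 0
    for _ in range(n_iter):
        i, j = rng.choice(len(points), 2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                                  # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        resid = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int(np.sum(resid < tol))            # consensus set size
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers

x = np.linspace(0, 1, 100)
y = 2 * x + 1
rng = np.random.default_rng(2)
y[::5] += rng.uniform(2, 5, 20)                       # 20% gross outliers
(a, b), n_in = ransac_line(np.c_[x, y])
```

The paper's extension would additionally draw the model class at random and weight the consensus count by a motion-prior probability and a complexity penalty.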

Petter Strandmark, Irene Y. H. Gu

Extending GKLT Tracking—Feature Tracking for Controlled Environments with Integrated Uncertainty Estimation

Guided Kanade-Lucas-Tomasi (GKLT) feature tracking offers a way to perform KLT tracking for rigid scenes using known camera parameters as prior knowledge, but requires manual control of uncertainty. The uncertainty of prior knowledge is unknown in general. We present an extended modeling of GKLT that overcomes the need for manual adjustment of the uncertainty parameter. We establish an extended optimization error function for GKLT feature tracking, from which we derive extended parameter update rules and a new optimization algorithm in the context of KLT tracking. By this means we give a new formulation of KLT tracking using known camera parameters originating, for instance, from a controlled environment. We compare the extended GKLT tracking method with the original GKLT and the standard KLT tracking using real data. The experiments show that the extended GKLT tracking performs better than the standard KLT and reaches an accuracy up to several times better than the original GKLT with an improperly chosen value of the uncertainty parameter.

Michael Trummer, Christoph Munkelt, Joachim Denzler

Image Based Quantitative Mosaic Evaluation with Artificial Video

Interest in image mosaicing has existed since the dawn of photography. Many automatic digital mosaicing methods have been developed, but unfortunately their evaluation has been only qualitative. The lack of generally approved measures and standard test data sets impedes comparison of the works of different research groups. For scientific evaluation, mosaic quality should be quantitatively measured and standard protocols established. In this paper the authors propose a method for creating artificial video images with virtual camera parameters and properties for testing mosaicing performance. Important evaluation issues are addressed, especially mosaic coverage. The authors present a measuring method for evaluating the mosaicing performance of different algorithms, and showcase it with the root-mean-squared error. Three artificial test videos are presented, run through a real-time mosaicing method as an example, and published on the Web to facilitate future performance comparisons.

Pekka Paalanen, Joni-Kristian Kämäräinen, Heikki Kälviäinen

Improving Automatic Video Retrieval with Semantic Concept Detection

We study the usefulness of intermediate semantic concepts in bridging the semantic gap in automatic video retrieval. The results of a series of large-scale retrieval experiments, which combine text-based search, content-based retrieval, and concept-based retrieval, are presented. The experiments use the common video data and sets of queries from three successive TRECVID evaluations. By including concept detectors, we observe a consistent improvement in search performance, despite the fact that the performance of the individual detectors is still often quite modest.

Markus Koskela, Mats Sjöberg, Jorma Laaksonen

Content-Aware Video Editing in the Temporal Domain

An extension of 2D Seam Carving [Avidan and Shamir, 2007] is presented, which allows for automatically resizing the duration of video from stationary cameras without interfering with the velocities of the objects in the scene. We are not interested in cutting out entire frames, but instead in removing spatial information across different frames. Thus we identify a set of pixels across different video frames to be either removed or duplicated in a seamless manner by analyzing 3D space-time sheets in the videos. Results are presented on several challenging video sequences.

Kristine Slot, René Truelsen, Jon Sporring

High Definition Wearable Video Communication

High definition (HD) video can provide video communication which is as crisp and sharp as face-to-face communication. Wearable video equipment also provides the user with mobility: the freedom to move. HD video requires high bandwidth and yields high encoding and decoding complexity when encoding based on DCT and motion estimation is used. We propose a solution that can drastically lower the bandwidth and complexity for video transmission. Asymmetrical principal component analysis can initially encode HD video into bitrates which are low considering the type of video (< 300 kbps), and after a startup phase the bitrate can be reduced to less than 5 kbps. The complexity of encoding and decoding this video is very low, something that will save battery power for mobile devices. All of this is done only at the cost of lower quality in frame areas which are not considered semantically important.

Ulrik Söderström, Haibo Li

Regularisation of 3D Signed Distance Fields

Signed 3D distance fields are used in a variety of domains, from shape modelling to surface registration. They are typically computed based on sampled point sets. If the input point set contains holes, the behaviour of the zero-level surface of the distance field is not well defined. In this paper, a novel regularisation approach is described. It is based on an energy formulation, where both local smoothness and data fidelity are included. The minimisation of the global energy is shown to be the solution of a large set of linear equations. The solution to the linear system is found by sparse Cholesky factorisation. It is demonstrated that the zero-level surface will act as a membrane after the proposed regularisation. This effectively closes holes in a predictable way. Finally, the performance of the method is tested with a set of synthetic point clouds of increasing complexity.
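The energy formulation can be illustrated in 1D: a data-fidelity term on observed samples plus a first-order smoothness term gives a linear system whose solution fills holes like a membrane. This is a toy analogue of the paper's 3D formulation, solved densely here for brevity (the paper uses sparse Cholesky):

```python
import numpy as np

def regularise(d, observed, lam=100.0):
    """Minimise  lam * sum_i m_i (x_i - d_i)^2  +  sum_i (x_{i+1} - x_i)^2.
    The normal equations (lam*M + D^T D) x = lam*M d are linear; holes
    (m_i = 0) are filled smoothly, like the membrane in the abstract."""
    n = len(d)
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx], D[idx, idx + 1] = -1.0, 1.0       # forward differences
    M = np.diag(observed.astype(float))
    A = lam * M + D.T @ D
    return np.linalg.solve(A, lam * M @ d)

# a linear ramp with a hole in the middle: the membrane interpolates it
x = np.linspace(0, 1, 21)
d = x.copy()
obs = np.ones(21, bool)
obs[8:13] = False                                   # missing samples
d[8:13] = 0.0                                       # garbage inside the hole
x_reg = regularise(d, obs)
```

The garbage values in the hole are ignored because their fidelity weight is zero; the smoothness term alone determines them, reproducing the ramp.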

Rasmus R. Paulsen, Jakob Andreas Bærentzen, Rasmus Larsen

An Evolutionary Approach for Object-Based Image Reconstruction Using Learnt Priors

In this paper we present a novel algorithm for reconstructing binary images containing objects which can be described by some parameters. In particular, we investigate the problem of reconstructing binary images representing disks from four projections. We develop a genetic algorithm for this and similar problems. We also discuss how prior information on the number of disks can be incorporated into the reconstruction in order to obtain more accurate images. In addition, we present a method to extract this kind of knowledge from the projections themselves. Experiments on artificial data are also conducted.

Péter Balázs, Mihály Gara

Disambiguation of Fingerprint Ridge Flow Direction—Two Approaches

One of the challenges to be overcome in automated fingerprint matching is the construction of a ridge pattern representation that encodes all the relevant information while discarding unwanted detail. Research published recently has shown how this might be achieved by representing the ridges and valleys as a periodic wave. However, deriving such a representation requires assigning a consistent unambiguous direction field to the ridge flow, a task complicated by the presence of singular points in the flow pattern. This disambiguation problem appears to have received very little attention.

We discuss two approaches to this problem — one involving construction of branch cuts, the other using a divide-and-conquer approach, and show how either of these techniques can be used to obtain a consistent flow direction map, which then enables the construction of a phase based representation of the ridge pattern.

Robert O. Hastings

Similarity Matches of Gene Expression Data Based on Wavelet Transform

This study presents a similarity-determining method for measuring regulatory relationships between pairs of genes from microarray time series data. The proposed similarity metrics are based on a new method for measuring structural similarity to compare the quality of images. We make use of the Dual-Tree Wavelet Transform (DTWT) since it provides approximate shift invariance and maintains the structures between pairs of regulation-related time series expression data. Despite the simplicity of the presented method, experimental results demonstrate that it enhances the similarity index when tested on known transcriptional regulatory genes.

Mong-Shu Lee, Mu-Yen Chen, Li-Yu Liu

Poster Session 2

Simple Comparison of Spectral Color Reproduction Workflows

In this article we compare two workflows for spectral color reproduction: colorant separation (CS) followed by halftoning by scalar error diffusion (SED) of the resulting multi-colorant channel image, and a second workflow using spectral vector error diffusion (sVED). Identical filters are used in both SED and sVED to diffuse the error. Gamut mapping is performed as pre-processing, and the reproductions are compared to the gamut-mapped spectral data. The inverse spectral Yule-Nielsen modified Neugebauer (YNSN) model is used for the colorant separation. To bring the improvement of the YNSN model over the regular Neugebauer model into the sVED halftoning, the n factor is introduced in the sVED halftoning. The performance of both workflows is evaluated in terms of spectral and color differences, but also visually using the dot distributions obtained by the two halftoning techniques. Experimental results have shown close performance for the compared workflows in terms of color and spectral differences, but visually cleaner and more stable dot distributions obtained by sVED.

Jérémie Gerhardt, Jon Yngve Hardeberg

Kernel Based Subspace Projection of Near Infrared Hyperspectral Images of Maize Kernels

In this paper we present an exploratory analysis of hyperspectral 900-1700 nm images of maize kernels. The imaging device is a line-scanning hyperspectral camera using broadband NIR illumination. In order to explore the hyperspectral data we compare a series of subspace projection methods including principal component analysis and maximum autocorrelation factor analysis. The latter utilizes the fact that interesting phenomena in images exhibit spatial autocorrelation. However, linear projections often fail to grasp the underlying variability in the data. Therefore we propose to use kernel versions of the two aforementioned methods. The kernel methods implicitly transform the data to a higher dimensional space using non-linear transformations while retaining the computational complexity. Analysis on our data example illustrates that the proposed kernel maximum autocorrelation factor transform outperforms the linear methods as well as kernel principal components in producing interesting projections of the data.

Rasmus Larsen, Morten Arngren, Per Waaben Hansen, Allan Aasbjerg Nielsen

The Number of Linearly Independent Vectors in Spectral Databases

Linear dependence among spectra in spectral databases affects the eigenvectors obtained from principal component analysis. This affects the values of usual spectral and colorimetric metrics. The effective dependence is proposed as a tool to quantify the maximum number of linearly independent vectors in the database. The results of the proposed algorithm do not depend on the selection of the first seed vector and are consistent with the results based on reduction of the bivariate coefficient of determination.

Carlos Sáenz, Begoña Hernández, Coro Alberdi, Santiago Alfonso, José Manuel Diñeiro

A Clustering Based Method for Edge Detection in Hyperspectral Images

Edge detection in hyperspectral images is an intrinsically difficult problem, as the gray value intensity images related to single spectral bands may show different edges. The few existing approaches are either based on a straightforward combination of these individual edge images, or on finding the outliers of a region segmentation. As an alternative, we propose a clustering of all image pixels in a feature space constructed by the spatial gradients in the spectral bands. An initial comparative study shows the differences and properties of these approaches and makes clear that the proposal has interesting properties that should be studied further.

V. C. Dinh, Raimund Leitner, Pavel Paclik, Robert P. W. Duin

Contrast Enhancing Colour to Grey

A spatial algorithm to convert colour images to greyscale is presented. The method is very fast and results in increased local and global contrast. At each image pixel, three weights are calculated. These are defined as the difference between the blurred luminance image and the colour channels: red, green and blue. The higher the difference, the more weight is given to that channel in the conversion. The method is multi-resolution and allows the user to enhance contrast at different scales. Results based on three colour images show that the method results in higher contrast than luminance and two spatial methods: Socolinsky and Wolff [1,2] and Alsam and Drew [3].
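A hedged sketch of the idea: the per-channel weights below follow the abstract's description (difference between blurred luminance and each channel), but the blur, weighting and normalisation details are illustrative assumptions, not the author's exact algorithm:

```python
import numpy as np

def box_blur(img, k=2):
    """Simple (2k+1)x(2k+1) box blur with edge padding."""
    pad = np.pad(img, k, mode='edge')
    out = np.zeros(img.shape, float)
    n = 2 * k + 1
    for dy in range(n):
        for dx in range(n):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / n**2

def colour_to_grey(rgb, k=2):
    """Per-pixel channel weights from |blurred luminance - channel|:
    channels that deviate most from local luminance get the most weight."""
    lum = rgb @ np.array([0.299, 0.587, 0.114])
    blurred = box_blur(lum, k)
    w = np.abs(rgb - blurred[..., None]) + 1e-8    # hypothetical weighting
    w /= w.sum(axis=2, keepdims=True)              # normalise per pixel
    return (w * rgb).sum(axis=2)                   # convex channel mix

rng = np.random.default_rng(3)
img = rng.random((32, 32, 3))
grey = colour_to_grey(img)
```

Because each output pixel is a convex combination of its three channel values, the result stays in the input range; varying k gives the multi-resolution behaviour mentioned in the abstract.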

Ali Alsam

On the Use of Gaze Information and Saliency Maps for Measuring Perceptual Contrast

In this paper, we propose and discuss a novel approach for measuring perceived contrast. The proposed method comes from the modification of previous algorithms with a different local measure of contrast and with a parameterized way to recombine local contrast maps and color channels. We propose the idea of recombining the local contrast maps using gaze information, saliency maps and a gaze-attentive fixation finding engine as weighting parameters, giving attention to the regions that observers stare at and find important. Our experimental results show that contrast measures cannot be improved using different weighting maps, as contrast is an intrinsic factor and is judged by the global impression of the image.

Gabriele Simone, Marius Pedersen, Jon Yngve Hardeberg, Ivar Farup

A Method to Analyze Preferred MTF for Printing Medium Including Paper

A method is proposed to analyze the preferred Modulation Transfer Function (MTF) of a printing medium such as paper for printed image quality. First, the spectral intensity distribution of the printed image is simulated while varying the MTF of the medium. Next, the simulated image is displayed on a high-precision LCD to reproduce the appearance of the printed image. An observer rating experiment is carried out on the displayed image to determine the preferred MTF. The appearance simulation of the printed image was conducted under particular printing conditions: several contents, ink colors, a halftoning method and a print resolution (dpi). Experiments on different printing conditions can be conducted, since our simulation method is flexible with respect to changing conditions.

Masayuki Ukishima, Martti Mäkinen, Toshiya Nakaguchi, Norimichi Tsumura, Jussi Parkkinen, Yoichi Miyake

Efficient Denoising of Images with Smooth Geometry

In this paper a method for denoising images with smooth geometry is presented. It is based on smooth second order wedgelets proposed in this paper. Smooth wedgelets (and second order wedgelets) are defined as wedgelets with smooth edges. Additionally, smooth borders of the quadtree partition have been introduced. The first kind of smoothness is defined adaptively, whereas the second one is fixed once for the whole estimation process. The proposed kind of wedgelets has been applied to image denoising. As experiments performed on benchmark images show, this method gives far better denoising results for images with smooth geometry than other state-of-the-art methods.

Agnieszka Lisowska

Kernel Entropy Component Analysis Pre-images for Pattern Denoising

The recently proposed kernel entropy component analysis (kernel ECA) technique may produce strikingly different spectral data sets than kernel PCA for a wide range of kernel sizes. In this paper, we investigate the use of kernel ECA as a component in a denoising technique previously developed for kernel PCA. The method is based on mapping noisy data to a kernel feature space and then denoising by projecting onto a kernel ECA subspace. The denoised data in the input space is obtained by computing pre-images of kernel ECA denoised patterns. The denoising results are in several cases improved.

Robert Jenssen, Ola Storås

Combining Local Feature Histograms of Different Granularities

Histograms of local features have proven to be powerful representations in image category detection. Histograms with different numbers of bins encode the visual information at different granularities. In this paper we experimentally compare techniques for combining different granularities so that the resulting descriptors can be used as feature vectors in conventional vector space learning algorithms. In particular, we consider two main approaches: fusing the granularities at the SVM kernel level, and moving from binary or hard histograms to soft histograms. We find soft histograms to be the more effective approach, resulting in substantial performance improvement over single-granularity histograms.
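The hard-versus-soft distinction can be sketched in a few lines. The Gaussian soft-assignment weighting and the random codebook below are illustrative assumptions, not the descriptors or assignment scheme used in the paper:

```python
import numpy as np

def hard_histogram(features, codebook):
    """Each descriptor votes for its single nearest codebook bin."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    h = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return h / h.sum()

def soft_histogram(features, codebook, sigma=1.0):
    """Each descriptor spreads a Gaussian-weighted vote over all bins."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    w = np.exp(-d**2 / (2.0 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)        # each descriptor votes mass 1
    return w.sum(axis=0) / len(features)

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 8))            # toy local descriptors
cb = rng.normal(size=(16, 8))                # toy 16-word codebook
hh = hard_histogram(feats, cb)
sh = soft_histogram(feats, cb)
```

Both histograms are normalized, but the soft variant gives every bin a nonzero count, which smooths quantization noise across granularities.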

Ville Viitaniemi, Jorma Laaksonen

Extraction of Windows in Facade Using Kernel on Graph of Contours

In the past few years, street-level geoviewers have become very popular web applications. In this paper, we focus on a first urban concept identified as useful for indexing and then retrieving a building or a location in a city: windows. The work is divided into three successive processes: object detection, object characterization, and similarity function (kernel) design. Contours seem intuitively relevant for capturing architectural information from building facades. We first provide a robust window detector for our unconstrained data, present some results, and compare our method with the reference one. We then represent objects by fragments of contours and a relational graph on these contour fragments. We design a kernel similarity function for structured sets of contours that takes into account variations of contour orientation inside the structured set as well as spatial proximity. One difficulty in evaluating the relevance of our approach is that no reference database is available; we therefore built our own dataset. The results are quite encouraging with respect to what was expected and what methods in the literature provide.

Jean-Emmanuel Haugeard, Sylvie Philipp-Foliguet, Frédéric Precioso, Justine Lebrun

Multi-view and Multi-scale Recognition of Symmetric Patterns

This paper suggests the use of symmetric patterns and their corresponding symmetry filters for pattern recognition in computer vision tasks involving multiple views and scales. Symmetry filters enable efficient computation of certain structure features as represented by the generalized structure tensor (GST). The behavior of the complex moments under changes in scale and multiple views, including in-depth rotation of the patterns and the presence of noise, is investigated. Images of symmetric patterns captured using a low-resolution, low-cost CMOS camera, such as a phone camera or a web-cam, from as far as three meters are precisely localized, and their spatial orientation is determined from the argument of the second-order complex moment without further computation.
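As an illustration of how orientation follows from the argument of the second-order complex moment, the sketch below estimates the orientation of a sinusoidal test pattern from I20 = sum over the patch of (fx + i fy)^2, where (fx, fy) is the image gradient. The test pattern and patch size are our own choices:

```python
import numpy as np

def orientation_from_i20(patch):
    """Dominant orientation of a linearly symmetric patch: half the
    argument of the second-order complex moment I20 = sum((fx + i*fy)^2)."""
    fy, fx = np.gradient(patch)          # derivatives along rows, columns
    i20 = np.sum((fx + 1j * fy) ** 2)
    return 0.5 * np.angle(i20)

# Striped pattern oriented at 30 degrees.
x = np.linspace(0, 4 * np.pi, 64)
yy, xx = np.meshgrid(x, x, indexing='ij')
theta = np.pi / 6
patch = np.cos(np.cos(theta) * xx + np.sin(theta) * yy)
est = orientation_from_i20(patch)        # should be close to theta
```

Because every pixel's squared complex gradient shares the same argument 2*theta, the sum needs no further processing, matching the "without further computation" remark in the abstract.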

Dereje Teferi, Josef Bigun

Automatic Quantification of Fluorescence from Clustered Targets in Microscope Images

A cluster of fluorescent targets appears as overlapping spots in microscope images. By quantifying the spot intensities and locations, the properties of the fluorescent targets can be determined. Commonly this is done by reducing noise with a low-pass filter and separating the spots by fitting a Gaussian mixture model with a local optimization algorithm. However, filtering smears the overlapping spots together and lowers quantification accuracy, and the local optimization algorithms are unable to find the model parameters reliably. In this study we developed a method to quantify the overlapping spots accurately, directly from the raw images, with a stochastic global optimization algorithm. To evaluate the method, we created simulated noisy images with overlapping spots. The simulation results showed that the developed method produces more accurate spot intensity and location estimates than the compared methods. Microscopy data of a cell membrane with caveolae spots was also successfully quantified with the developed method.
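The fitting problem can be caricatured in 1D: two overlapping Gaussian spots whose centers are recovered directly from the raw (here noise-free, synthetic) signal by a stochastic global search. The plain random search below is only a crude stand-in for the paper's global optimizer, and all widths, amplitudes, and search ranges are invented for the example:

```python
import numpy as np

def spots_model(x, amps, centers, width=1.5):
    """Sum of two overlapping Gaussian spots with a fixed, known width."""
    return sum(a * np.exp(-(x - m)**2 / (2.0 * width**2))
               for a, m in zip(amps, centers))

def fit_centers(x, y, amps, n_samples=20000, seed=0):
    """Stochastic global search over the two spot centers: keep the sample
    with the smallest sum of squared residuals against the raw signal."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_samples):
        m = rng.uniform(0.0, 20.0, size=2)
        err = np.sum((spots_model(x, amps, m) - y)**2)
        if err < best_err:
            best, best_err = m, err
    return np.sort(best)

x = np.arange(0.0, 20.0, 0.5)
y = spots_model(x, (1.0, 1.2), (7.0, 10.0))   # spots only two widths apart
centers = fit_centers(x, y, (1.0, 1.2))
```

Because the search samples the whole parameter range, it cannot get trapped in the poor local minima that defeat purely local optimizers on overlapping spots.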

Harri Pölönen, Jussi Tohka, Ulla Ruotsalainen

Bayesian Classification of Image Structures

In this paper, we describe work on Bayesian classifiers for distinguishing between homogeneous structures, textures, edges and junctions. We build semi-local classifiers from hand-labeled images to distinguish between these four different kinds of structures based on the concept of intrinsic dimensionality. The built classifier is tested on standard and non-standard images.

D. Goswami, S. Kalkan, N. Krüger

Globally Optimal Least Squares Solutions for Quasiconvex 1D Vision Problems

Solutions to non-linear least squares problems play an essential role in structure and motion problems in computer vision. The predominant approach for solving these problems is a Newton-like scheme which uses the Hessian of the function to iteratively find a local solution. Although fast, this strategy inevitably leads to issues with poor local minima and missed global minima.

In this paper, rather than trying to develop an algorithm that is guaranteed to always work, we show that it is often possible to verify that a local solution is in fact also global. We present a simple test that verifies the optimality of a solution using only a few linear programs. We show, on both synthetic and real data, that for the vast majority of cases we are able to verify optimality. Furthermore, we show that even if the above test fails, it is still often possible to verify that the local solution is global with high probability.

Carl Olsson, Martin Byröd, Fredrik Kahl

Spatio-temporal Super-Resolution Using Depth Map

This paper describes a spatio-temporal super-resolution method using depth maps for static scenes. In the proposed method, the depth maps are used as the parameters to determine the corresponding pixels in multiple input images by assuming that intrinsic and extrinsic camera parameters are known. Because the proposed method can determine the corresponding pixels in multiple images by a one-dimensional search for the depth values without the planar assumption that is often used in the literature, spatial resolution can be increased even for complex scenes. In addition, since we can use multiple frames, temporal resolution can be increased even when large parts of the image are occluded in the adjacent frame. In experiments, the validity of the proposed method is demonstrated by generating spatio-temporal super-resolution images for both synthetic and real movies.
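The one-dimensional depth search at the heart of the correspondence step can be sketched for two calibrated views. The camera intrinsics, pose, and depth grid below are invented for illustration, and the photometric matching cost of a real implementation is replaced by a purely geometric one (distance to a known corresponding pixel):

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a 3D point X with a 3x4 matrix P."""
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

def backproject(K, u, depth):
    """Point at a given depth along the ray of pixel u in the reference
    camera (reference camera at the origin, P1 = K [I | 0])."""
    return depth * np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])

def search_depth(K, P2, u1, u2, depths):
    """1D search over depth hypotheses: pick the depth whose reprojection
    into the second view lands closest to the observed pixel u2."""
    errs = [np.linalg.norm(project(P2, backproject(K, u1, d)) - u2)
            for d in depths]
    return depths[int(np.argmin(errs))]

K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
t = np.array([0.2, 0.0, 0.0])                   # second camera shifted in x
P2 = K @ np.hstack([np.eye(3), t[:, None]])
X = np.array([0.3, -0.1, 4.0])                  # true depth is 4.0
u1, u2 = project(P1, X), project(P2, X)
depths = np.linspace(1.0, 8.0, 141)             # hypotheses, step 0.05
d_hat = search_depth(K, P2, u1, u2, depths)
```

No planarity assumption is needed: each pixel's depth is searched independently, which is why the method handles complex scenes.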

Yusaku Awatsu, Norihiko Kawai, Tomokazu Sato, Naokazu Yokoya

A Comparison of Iterative 2D-3D Pose Estimation Methods for Real-Time Applications

This work compares iterative 2D-3D pose estimation methods for use in real-time applications. The compared methods are publicly available as C++ code. One method is part of the OpenCV library, namely POSIT. Because POSIT is not applicable to planar 3D-point configurations, we also include the planar POSIT version. The second method optimizes the pose parameters directly by solving a non-linear least squares problem that minimizes the reprojection error. For reference, the Direct Linear Transform (DLT) for estimating the projection matrix is included as well.
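The DLT baseline solves a homogeneous linear system for the 12 entries of the projection matrix. The sketch below, with a synthetic camera and noise-free correspondences of our own making, recovers P up to scale:

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a 3D point X with a 3x4 matrix P."""
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

def dlt(X3d, x2d):
    """Direct Linear Transform: stack two equations per 3D-2D
    correspondence into A p = 0 and take the null vector via SVD."""
    rows = []
    for Xi, (u, v) in zip(X3d, x2d):
        Xh = np.append(Xi, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)          # right singular vector, min sigma

rng = np.random.default_rng(2)
K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.], [0.], [5.]])])
X3d = rng.normal(size=(8, 3))            # points in front of the camera
x2d = np.array([project(P_true, Xi) for Xi in X3d])
P_est = dlt(X3d, x2d)
```

With exact correspondences the recovered matrix reprojects all points onto their observations; with noise, the result is typically used only as an initialization for the iterative methods compared in the paper.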

Daniel Grest, Thomas Petersen, Volker Krüger

A Comparison of Feature Detectors with Passive and Task-Based Visual Saliency

This paper investigates the coincidence between six interest point detection methods (SIFT, MSER, Harris-Laplace, SURF, FAST and Kadir-Brady Saliency), two robust “bottom-up” models of visual saliency (Itti and Harel), and “task” salient surfaces derived from observer eye-tracking data. Comprehensive statistics for all detectors vs. saliency models are presented in the presence and absence of a visual search task. It is found that SURF interest points generate the highest coincidence with saliency, with an overlap 15% higher than that of the other detectors. The overlap of image features with task saliency is also found to be distributed towards the salient regions. However, the introduction of a specific search task creates high ambiguity in knowing how attention is shifted. The Kadir-Brady interest point is found to be more resilient to this shift but is the least coincident overall.

Patrick Harding, Neil M. Robertson

Grouping of Semantically Similar Image Positions

Features from the Scale Invariant Feature Transform (SIFT) are widely used for matching between spatially or temporally displaced images. Recently, a topology on the SIFT features of a single image was introduced in which semantically similar features are close. We continue this work and present a technique to automatically detect groups of SIFT positions in a single image such that all points of one group share a similar semantics. The proposed method borrows ideas and techniques from the Color-Structure-Code segmentation method and does not require any user intervention.

Lutz Priese, Frank Schmitt, Nils Hering

Recovering Affine Deformations of Fuzzy Shapes

Fuzzy sets and fuzzy techniques are attracting increasing attention in the field of image processing and analysis. It has been shown that the information preserved by a fuzzy representation based on area coverage can be successfully utilized to improve the precision and accuracy of several shape descriptors, geometric moments of a shape among them. We propose to extend an existing binary shape matching method to take advantage of fuzzy object representation. The results of a synthetic test show that the fuzzy representation yields smaller registration errors on average. A segmentation method is also presented to generate fuzzy segmentations of real images. The applicability of the proposed methods is demonstrated on real X-ray images of hip replacement implants.
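The benefit of an area-coverage (fuzzy) representation for geometric moments can be illustrated on a disk. The supersampled rasterization below is our own stand-in for a real fuzzy segmentation, and the shape parameters are arbitrary:

```python
import numpy as np

def coverage_disk(n, cx, cy, r, ss=8):
    """Fuzzy rasterization of a disk: each pixel stores the fraction of
    its area covered, estimated by ss x ss supersampling."""
    g = (np.arange(n * ss) + 0.5) / ss            # subpixel centers
    yy, xx = np.meshgrid(g, g, indexing='ij')
    inside = ((xx - cx)**2 + (yy - cy)**2 <= r**2).astype(float)
    return inside.reshape(n, ss, n, ss).mean(axis=(1, 3))

def area_and_centroid(img):
    """Zeroth and first geometric moments of a (possibly fuzzy) shape,
    with pixel centers at integer + 0.5 coordinates."""
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float) + 0.5
    m00 = img.sum()
    return m00, (xx * img).sum() / m00, (yy * img).sum() / m00

fuzzy = coverage_disk(32, 16.3, 15.7, 9.4)
area, cx, cy = area_and_centroid(fuzzy)
```

The fractional boundary values let the moments track the continuous shape far more closely than a crisp 0/1 mask at the same resolution, which is the property the matching method exploits.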

Attila Tanács, Csaba Domokos, Nataša Sladoje, Joakim Lindblad, Zoltan Kato

Shape and Texture Based Classification of Fish Species

In this paper we conduct a case study of fish species classification based on shape and texture. We consider three fish species: cod, haddock, and whiting. We derive shape and texture features from an appearance model of a set of training data. The fish in the training images were manually outlined, and a few features, including the eye and the backbone contour, were also annotated. From these annotations an optimal MDL curve correspondence and a subsequent image registration were derived. We have analyzed a series of shape, texture, and combined shape-and-texture modes of variation for their ability to discriminate between the fish species, and conducted a preliminary classification. In a linear discriminant analysis based on the two best combined modes of variation we obtain a resubstitution rate of 76%.

Rasmus Larsen, Hildur Olafsdottir, Bjarne Kjær Ersbøll

Improved Quantification of Bone Remodelling by Utilizing Fuzzy Based Segmentation

We present a novel fuzzy-theory-based method for the segmentation of images required in histomorphometrical investigations of bone implant integration. The suggested method combines discriminant analysis classification, controlled by an introduced uncertainty measure, with a fuzzy connectedness segmentation method, so that the former is used for automatic seeding of the latter. A thorough evaluation of the proposed segmentation method is performed. Comparison with previously published automatically obtained measurements, as well as with manually obtained ones, is presented. The proposed method improves the segmentation and, consequently, the accuracy of the automatic measurements, while retaining its advantages over manual measurement: it is fast, repeatable, and objective.

Joakim Lindblad, Nataša Sladoje, Vladimir Ćurić, Hamid Sarve, Carina B. Johansson, Gunilla Borgefors

Fusion of Multiple Expert Annotations and Overall Score Selection for Medical Image Diagnosis

Two problems especially important for supervised learning and classification in medical image processing are addressed in this study: i) how to fuse medical annotations collected from several medical experts and ii) how to form an image-wise overall score for accurate and reliable automatic diagnosis. Both problems are addressed within the same receiver operating characteristic (ROC) framework, which is made to correspond to medical practice. The first problem arises from the typical need to collect the medical ground truth from several experts to understand the underlying phenomenon and to increase robustness. However, it is currently unclear how these expert opinions (annotations) should be combined for classification methods. The second problem is due to the ultimate goal of any automatic diagnosis, a patient-based (image-wise) diagnosis, which consequently must be the ultimate evaluation criterion before transferring any method into practice. Various image processing methods provide several, e.g., spatially distinct, results, which should be combined into a single image-wise score value. We discuss and investigate these two problems in detail, propose good strategies, and report experimental results on a diabetic retinopathy database verifying our findings.
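The second problem, forming an image-wise score from spatially distinct results and evaluating it with ROC analysis, can be sketched as follows. The max-fusion rule, the synthetic scores, and the rank-sum AUC are our own illustrative choices, not the strategies proposed in the paper:

```python
import numpy as np

def image_score(region_scores, rule="max"):
    """Fuse spatially distinct detector outputs into one image-wise score;
    max-fusion flags an image if any single region looks abnormal."""
    return np.max(region_scores) if rule == "max" else np.mean(region_scores)

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

rng = np.random.default_rng(3)
# 30 synthetic images, 10 region scores each; diseased images carry
# one elevated region, mimicking a localized lesion.
healthy = rng.normal(0, 1, size=(15, 10))
diseased = rng.normal(0, 1, size=(15, 10))
diseased[:, 0] += 3.0
scores = np.array([image_score(r) for r in np.vstack([healthy, diseased])])
labels = np.array([0] * 15 + [1] * 15)
auc = roc_auc(labels, scores)
```

Evaluating the fused image-wise score with ROC analysis mirrors the patient-based diagnosis criterion the abstract argues for.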

Tomi Kauppi, Joni-Kristian Kamarainen, Lasse Lensu, Valentina Kalesnykiene, Iiris Sorri, Heikki Kälviäinen, Hannu Uusitalo, Juhani Pietilä

Quantification of Bone Remodeling in SRμCT Images of Implants

For quantification of bone remodeling around implants, we combine information obtained by two modalities: 2D histological sections imaged in a light microscope and 3D synchrotron radiation-based computed microtomography, SRμCT. In this paper, we present a method for segmenting SRμCT volumes. The impact of the shading artifact at the implant interface is reduced by modeling the artifact. The segmentation is followed by quantitative analysis. To facilitate comparison with existing results, the quantification is performed on a registered 2D slice from the volume, which corresponds to a histological section from the same sample. The quantification involves measurements of bone area and bone-implant contact percentages.

We compare the results obtained by the proposed method on the SRμCT data with manual measurements on the histological sections and discuss the advantages of including SRμCT data in the analysis.

Hamid Sarve, Joakim Lindblad, Carina B. Johansson

