
2009 | Book

Computer Analysis of Images and Patterns

13th International Conference, CAIP 2009, Münster, Germany, September 2-4, 2009. Proceedings

Edited by: Xiaoyi Jiang, Nicolai Petkov

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

It was an honor and a pleasure to organize the 13th International Conference on Computer Analysis of Images and Patterns (CAIP 2009) in Münster, Germany. CAIP has been held biennially since 1985: Berlin (1985), Wismar (1987), Leipzig (1989), Dresden (1991), Budapest (1993), Prague (1995), Kiel (1997), Ljubljana (1999), Warsaw (2001), Groningen (2003), Paris (2005), and Vienna (2007). Initially, this conference series served as a forum for getting together scientists from East and West Europe. Nowadays, CAIP enjoys a high international visibility and attracts participants from all over the world. For CAIP 2009 we received a record number of 405 submissions. All papers were reviewed by two, and in most cases, three reviewers. Finally, 148 papers were selected for presentation at the conference, resulting in an acceptance rate of 36%. All Program Committee members and additional reviewers listed here deserve great thanks for their timely and competent reviews. The accepted papers were presented either as oral presentations or posters in a single-track program. In addition, we were very happy to have Aljoscha Smolic and David G. Stork as our invited speakers to present their work in two fascinating areas. With this scientific program we hope to continue the tradition of CAIP in providing a forum for scientific exchange at a high quality level. A successful conference like CAIP 2009 would not be possible without the support of many institutions and people. First of all, we would like to thank all the authors of submitted papers and the invited speakers for their contributions. The Steering Committee members were always there when advice was needed.

Table of Contents

Frontmatter

Invited Talks

An Overview of 3D Video and Free Viewpoint Video

An overview of 3D video and free viewpoint video is given in this paper. Free viewpoint video allows the user to freely navigate within real world visual scenes, as known from virtual worlds in computer graphics. 3D video provides the user with a 3D depth impression of the observed scene, which is also known as stereo video. In that sense as functionalities, 3D video and free viewpoint video are not mutually exclusive but can very well be combined in a single system. Research in this area combines computer graphics, computer vision and visual communications. It spans the whole media processing chain from capture to display and the design of systems has to take all parts into account. The conclusion is that the necessary technology including standard media formats for 3D video and free viewpoint video is available or will be available in the future, and that there is a clear demand from industry and user side for such new types of visual media.

Aljoscha Smolic
Computer Vision and Computer Graphics Analysis of Paintings and Drawings: An Introduction to the Literature

In the past few years, a number of scholars trained in computer vision, pattern recognition, image processing, computer graphics, and art history have developed rigorous computer methods for addressing an increasing number of problems in the history of art. In some cases, these computer methods are more accurate than even highly trained connoisseurs, art historians and artists. Computer graphics models of artists’ studios and subjects allow scholars to explore ‘‘what if’’ scenarios and determine artists’ studio praxis. Rigorous computer ray-tracing software sheds light on claims that some artists employed optical tools. Computer methods will not replace tradition art historical methods of connoisseurship but enhance and extend them. As such, for these computer methods to be useful to the art community, they must continue to be refined through application to a variety of significant art historical problems.

David G. Stork

Biometrics

Head Pose Estimation by a Stepwise Nonlinear Regression

Head pose estimation is a crucial step for numerous face applications such as gaze tracking and face recognition. In this paper, we introduce a new method to learn the mapping between a set of features and the corresponding head pose. It combines a filter based feature selection and a Generalized Regression Neural Network where inputs are sequentially selected through a boosting process. We propose the Fuzzy Functional Criterion, a new filter used to select relevant features. At each step, features are evaluated using weights on examples computed using the error produced by the neural network at the previous step. This boosting strategy helps to focus on hard examples and selects a set of complementary features. Results are compared with three state-of-the-art methods on the Pointing 04 database.

Kevin Bailly, Maurice Milgram, Philippe Phothisane
Model-Based Illumination Correction for Face Images in Uncontrolled Scenarios

Face Recognition under uncontrolled illumination conditions is partly an unsolved problem. Several illumination correction methods have been proposed, but these are usually tested on illumination conditions created in a laboratory. Our focus is more on uncontrolled conditions. We use the Phong model which allows us to model ambient light in shadow areas. By estimating the face surface and illumination conditions, we are able to reconstruct a face image containing frontal illumination. The reconstructed face images give a large improvement in performance of face recognition in uncontrolled conditions.
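For readers unfamiliar with the reflection model the abstract builds on, the following is a minimal sketch of the Phong intensity equation, whose ambient term is what allows modelling light in shadow areas; the normals, light directions and coefficients below are illustrative placeholders, not the paper's fitted values.

```python
import numpy as np

def phong_intensity(normal, light_dir, view_dir, ka, kd, ks, shininess,
                    ambient, light):
    """Phong reflection: ambient + diffuse + specular contributions."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = max(0.0, float(n @ l))
    r = 2.0 * diffuse * n - l                    # reflection of l about n
    specular = max(0.0, float(r @ v)) ** shininess
    return ka * ambient + kd * diffuse * light + ks * specular * light

# Example: a surface patch facing the camera, lit from the upper left.
print(phong_intensity(np.array([0, 0, 1.0]), np.array([-1, 1, 1.0]),
                      np.array([0, 0, 1.0]), ka=0.2, kd=0.7, ks=0.1,
                      shininess=10, ambient=1.0, light=1.0))
```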

Bas Boom, Luuk Spreeuwers, Raymond Veldhuis
A New Gabor Phase Difference Pattern for Face and Ear Recognition

A new local feature based image representation method is proposed. It is derived from the local Gabor phase difference pattern (LGPDP). This method represents images by exploiting relationships of the Gabor phase between a pixel and its neighbors. There are two main contributions: 1) a novel phase difference measure is defined; 2) new encoding rules to mirror Gabor phase difference information are designed. Because of these, this method describes the Gabor phase difference more precisely than the conventional LGPDP. Moreover, it discards the useless information and redundancy produced near quadrant boundaries, which commonly exist in the LGPDP. It is shown that the proposed method brings higher discriminative ability to Gabor phase based patterns. Experiments are conducted on the FRGC version 2.0 and USTB Ear Database to evaluate its validity and generalizability. The proposed method is also compared with several state-of-the-art approaches. It is observed that our method achieves the highest recognition rates among them.

Yimo Guo, Guoying Zhao, Jie Chen, Matti Pietikäinen, Zhengguang Xu
Is White Light the Best Illumination for Palmprint Recognition?

Palmprint as a new biometric has received great research attention in the past decades. It has many merits, such as robustness, low cost, user friendliness, and high accuracy. Most current palmprint recognition systems use an active light to acquire clear palmprint images. Thus, the light source is a key component in the system to capture enough discriminant information for palmprint recognition. To the best of our knowledge, white light is the most widely used light source. However, little work has been done on investigating whether white light is the best illumination for palmprint recognition. In this study, we empirically compared palmprint recognition accuracy using white light and six other colors of light. The experiments on a large database show that white light is not the optimal illumination for palmprint recognition. This finding will be useful for future palmprint recognition system design.

Zhenhua Guo, David Zhang, Lei Zhang
Second-Level Partition for Estimating FAR Confidence Intervals in Biometric Systems

Most biometric authentication algorithms make use of a similarity score that defines how similar two templates are according to a threshold, and the accuracy of the results is expressed in terms of a False Reject Rate (FRR) or False Accept Rate (FAR) that is estimated using the training data set. A confidence interval is assigned to any claim of accuracy, with 90% being commonly assumed for biometric-based authentication systems. However, these confidence intervals may not be as accurate as is presumed. In this paper, we report the results of experiments measuring the performance of the widely used subset bootstrap approach to estimating the confidence interval of the FAR. We find that the coverage of the FAR confidence intervals estimated by the subset bootstrap approach is reduced by the dependence between two similarities when they come from two individual pairs sharing a common individual. This is because the subset bootstrap requires the independence of different subsets. To deal with this, we present a second-level partition of the similarity score set between different individuals, producing what we call a subset false accept rate (SFAR) bootstrap estimation. The experimental results show that the proposed procedures greatly increase the coverage of the FAR confidence intervals.
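To make the subset bootstrap concrete, here is a hedged sketch of a percentile confidence interval for the FAR obtained by resampling whole score subsets; the paper's second-level (SFAR) partition is not reproduced, and the grouping of impostor scores into subsets is assumed.

```python
import numpy as np

def far(scores, threshold):
    """False accept rate: fraction of impostor scores at or above threshold."""
    return float(np.mean(np.asarray(scores) >= threshold))

def subset_bootstrap_far_ci(impostor_subsets, threshold, n_boot=2000,
                            level=0.90, rng=None):
    """Percentile CI for the FAR by resampling whole subsets with
    replacement (one subset per group of impostor comparisons)."""
    rng = np.random.default_rng(rng)
    n = len(impostor_subsets)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample subsets
        pooled = np.concatenate([impostor_subsets[i] for i in idx])
        stats.append(far(pooled, threshold))
    lo, hi = np.percentile(stats, [(1 - level) / 2 * 100,
                                   (1 + level) / 2 * 100])
    return lo, hi

# Toy data: 50 individuals, each contributing a subset of impostor scores.
rng = np.random.default_rng(0)
subsets = [rng.normal(0.3, 0.1, size=40) for _ in range(50)]
print(subset_bootstrap_far_ci(subsets, threshold=0.5, rng=1))
```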

Rongfeng Li, Darun Tang, Wenxin Li, David Zhang
Smooth Multi-Manifold Embedding for Robust Identity-Independent Head Pose Estimation

In this paper, we propose a supervised Smooth Multi-Manifold Embedding (SMME) method for robust identity-independent head pose estimation. In order to handle the appearance variations caused by identity, we consider the pose data space as multiple manifolds, in which each manifold characterizes the underlying subspace of subjects with similar appearance. We then propose a novel embedding criterion to learn each manifold from the exemplar-centered local structure of subjects. The experimental results on standard databases demonstrate that the SMME is robust to variations of identity and achieves high pose estimation accuracy.

Xiangyang Liu, Hongtao Lu, Heng Luo
Retracted: Human Age Estimation by Metric Learning for Regression Problems

The estimation of human age from face images is an interesting problem in computer vision. We propose a general distance metric learning scheme for regression problems, which utilizes not only the data themselves, but also their corresponding labels, to strengthen the credibility of distances. This metric can be learned by solving an optimization problem. Furthermore, the test data can be projected into this metric by a simple linear transformation, and the scheme can be combined with manifold learning algorithms to improve their performance. Experiments are conducted on the public FG-NET database by Gaussian process regression in the learned metric to validate our framework, which shows that the performance is improved over traditional methods.

Yangjing Long
Face Detection Using GPU-Based Convolutional Neural Networks

In this paper, we consider the problem of face detection under pose variations. Unlike other contributions, a focus of this work resides in an efficient implementation utilizing the computational power of modern graphics cards. The proposed system consists of a parallelized implementation of convolutional neural networks (CNNs) with a special emphasis on also parallelizing the detection process. Experimental validation in a smart conference room with 4 active ceiling-mounted cameras shows a dramatic speed gain under real-life conditions.

Fabian Nasse, Christian Thurau, Gernot A. Fink
Gaussian Weak Classifiers Based on Haar-Like Features with Four Rectangles for Real-time Face Detection

This paper proposes Gaussian weak classifiers (GWCs) for use in real-time face detection systems. GWCs are based on Haar-like features (HFs) with four rectangles (HF4s), which constitute the majority of the HFs used to train a face detector. To label an image as face or clutter (non-face), a GWC uses the responses of the two HF2s in an HF4 to compute a Mahalanobis distance, which is then compared to a threshold to make a decision. For a fixed accuracy on the face class, GWCs can classify clutter images with more accuracy than the existing weak classifier types. Our experiments compare the accuracy and speed of face detectors built with four different weak classifier types: GWCs, Viola & Jones's, Rasolzadeh et al.'s and Mita et al.'s. On the standard MIT+CMU image database, the GWC-based face detector produced 40% fewer false positives and required 32% less time for the scanning process when compared to the detector that used Viola & Jones's weak classifiers. When compared to detectors that used Rasolzadeh et al.'s and Mita et al.'s weak classifiers, the GWC-based detector produced 11% and 9% fewer false positives, respectively. Simultaneously, it required 37% and 42% less time for the scanning process.
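A minimal sketch of the Gaussian weak classifier idea, assuming the two HF2 responses inside an HF4 are available as a 2-D vector: a Gaussian is fit to face examples and the Mahalanobis distance to it is thresholded. All data below are synthetic placeholders.

```python
import numpy as np

class GaussianWeakClassifier:
    """Labels a sample as 'face' when the Mahalanobis distance of its
    2-D Haar-feature response vector to the face-class Gaussian is small."""
    def fit(self, responses_face):
        X = np.asarray(responses_face, dtype=float)      # shape (n, 2)
        self.mean = X.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
        return self

    def mahalanobis(self, x):
        d = np.asarray(x, dtype=float) - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

    def predict(self, x, threshold):
        return self.mahalanobis(x) <= threshold          # True -> face

rng = np.random.default_rng(0)
faces = rng.multivariate_normal([1.0, -0.5], [[0.2, 0.05], [0.05, 0.1]], 500)
clf = GaussianWeakClassifier().fit(faces)
print(clf.predict([1.1, -0.4], threshold=2.0),           # near the class
      clf.predict([4.0, 3.0], threshold=2.0))            # far away: clutter
```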

Sri-Kaushik Pavani, David Delgado Gomez, Alejandro F. Frangi
Model Based Analysis of Face Images for Facial Feature Extraction

This paper describes a comprehensive approach to extract a common feature set from image sequences. We use simple features which are easily extracted from a 3D wireframe model and efficiently used for different applications on a benchmark database. The versatility of the features is evaluated on facial expression recognition, face recognition and gender classification. We experiment with different combinations of the features and find reasonable results with a combined-features approach containing structural, textural and temporal variations. The idea is to fit a model to human face images and extract shape and texture information. We parametrize this extracted information from the image sequences using the active appearance model (AAM) approach. We further compute temporal parameters using optical flow to consider local feature variations. Finally, we combine these parameters to form a feature vector for all the images in our database. These features are then used with a binary decision tree (BDT) and a Bayesian network (BN) for classification. We evaluated our results on image sequences of the Cohn-Kanade Facial Expression Database (CKFED). The proposed system produced very promising recognition rates for our applications with the same set of features and classifiers. The system is also real-time capable and automatic.

Zahid Riaz, Christoph Mayer, Michael Beetz, Bernd Radig
Dynamics Analysis of Facial Expressions for Person Identification

We propose a new method for analyzing the dynamics of facial expressions to identify persons, using Active Appearance Models and accurate facial feature point tracking. Several methods have been proposed to identify persons using facial images; in most of them, variations in facial expressions are a confounding factor. However, the dynamics of facial expressions are themselves a measure of personal characteristics. In the proposed method, facial feature points are automatically extracted using Active Appearance Models in the first frame of each video. They are then tracked using the Lucas-Kanade based feature point tracking method. Next, a temporal interval is extracted from the beginning to the end of the facial expression changes. Finally, a feature vector is obtained. In the identification phase, an input feature vector is classified by calculating the distance between the input vector and the training vectors using dynamic programming matching. We show the effectiveness of the proposed method using smile videos from the MMI Facial Expression Database.

Hidenori Tanaka, Hideo Saito
Regression Based Non-frontal Face Synthesis for Improved Identity Verification

We propose a low-complexity face synthesis technique which transforms a 2D frontal view image into views at specific poses, without recourse to computationally expensive 3D analysis or iterative fitting techniques that may fail to converge. The method first divides a given image into multiple overlapping blocks, followed by synthesising a non-frontal representation through applying a multivariate linear regression model on a low-dimensional representation of each block. To demonstrate one application of the proposed technique, we augment a frontal face verification system by incorporating multi-view reference (gallery) images synthesised from the frontal view. Experiments on the pose subset of the FERET database show considerable reductions in error rates, especially for large deviations from the frontal view.

Yongkang Wong, Conrad Sanderson, Brian C. Lovell
Differential Feature Analysis for Palmprint Authentication

Palmprint authentication is becoming one of the most important biometric techniques because of its high accuracy and ease of use. The features on the palm, including the palm lines, ridges and textures, etc., result from the gray-scale variance of the palmprint images. This paper characterizes this variance using differential operations of different orders. To avoid the effect of illumination variance, only the signs of the pixel values of the differential images are used to encode the palmprint, forming the palmprint differential code (PDC). In the matching stage, the normalized Hamming distance is employed to measure the similarity between different PDCs. The experimental results demonstrate that the proposed approach outperforms the existing palmprint authentication algorithms in terms of accuracy, speed and storage requirements, and that the differential operations may be considered as one of the standard methods for palmprint feature extraction.
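The matching step reduces to a normalized Hamming distance between binary codes; a small sketch follows, in which the sign-based code construction is only a stand-in for the paper's PDC.

```python
import numpy as np

def normalized_hamming(code_a, code_b):
    """Normalized Hamming distance between two binary codes:
    0 for identical codes, 1 for complementary codes."""
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    return float(np.count_nonzero(a ^ b)) / a.size

# Toy codes built from the signs of (simulated) differential responses.
resp_a = np.random.default_rng(0).normal(size=(32, 32))
resp_b = resp_a + np.random.default_rng(1).normal(scale=0.3, size=(32, 32))
pdc_a, pdc_b = resp_a >= 0, resp_b >= 0          # sign-based binary codes
print(normalized_hamming(pdc_a, pdc_b))          # small for similar palms
```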

Xiangqian Wu, Kuanquan Wang, Yong Xu, David Zhang
Combining Facial Appearance and Dynamics for Face Recognition

In this paper, we present a novel hybrid feature for face recognition. This hybrid feature is created by combining the traditional holistic facial appearance feature with a recently proposed facial dynamics feature. We measure and compare the inherent discriminating power of this hybrid feature and the holistic facial appearance feature by the statistical separability between genuine feature distance and impostor feature distance. Our measurement indicates that the hybrid feature is more discriminative than the appearance feature.

Ning Ye, Terence Sim
Finger-Knuckle-Print Verification Based on Band-Limited Phase-Only Correlation

This paper investigates a new automated personal authentication technique using finger-knuckle-print (FKP) imaging. First, a specific data acquisition device is developed to capture the FKP images. The local convex direction map of the FKP image is then extracted, based on which a coordinate system is defined to align the images and a region of interest (ROI) is cropped for feature extraction and matching. To match two FKPs, we present a Band-Limited Phase-Only Correlation (BLPOC) based method to register the images and further to evaluate their similarity. An FKP database is established to examine the performance of the proposed method, and the promising experimental results demonstrate its advantage over the existing finger-back surface based biometric systems.
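A rough sketch of Band-Limited Phase-Only Correlation, assuming equal-size ROIs; the band fraction and normalization details here are illustrative choices, not the paper's.

```python
import numpy as np

def blpoc(img1, img2, band=0.5):
    """Band-Limited Phase-Only Correlation of two equally sized images;
    the height of the peak of the returned surface measures similarity."""
    F1, F2 = np.fft.fft2(img1), np.fft.fft2(img2)
    cross = F1 * np.conj(F2)
    phase = cross / (np.abs(cross) + 1e-12)          # keep phase only
    phase = np.fft.fftshift(phase)
    h, w = phase.shape
    kh, kw = int(h * band / 2), int(w * band / 2)    # central low-freq band
    limited = phase[h//2 - kh:h//2 + kh, w//2 - kw:w//2 + kw]
    return np.real(np.fft.ifft2(np.fft.ifftshift(limited)))

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.roll(a, (3, -2), axis=(0, 1))                 # shifted copy of a
print(blpoc(a, b).max() > blpoc(a, rng.random((64, 64))).max())  # True
```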

Lin Zhang, Lei Zhang, David Zhang

Calibration

Calibration of Radially Symmetric Distortion by Fitting Principal Component

To calibrate radially symmetric distortion of omnidirectional cameras such as fish-eye lenses, calibration parameters are usually estimated so that lines, which are supposed to be straight in the 3D real scene, are mapped to straight lines in the calibrated image. In this paper, this problem is treated as a fitting problem of the principal component in uncalibrated images, and an estimation procedure of calibration parameters is proposed based on the principal component analysis. Experimental results for synthetic data and real images are presented to demonstrate the performance of our calibration method.

Hideitsu Hino, Yumi Usami, Jun Fujiki, Shotaro Akaho, Noboru Murata
Calibration of Rotating Sensors

This paper reports on a method for calibrating rotating sensors, namely rotating sensor-line cameras and laser range-finders. Both together are used to accurately reconstruct 3D environments, such as large buildings. One of the important steps in the 3D reconstruction pipeline is the fusion of data. This requires an understanding of the spatial relationships among the acquired data. Sensor calibration is the key to accurate 3D models.

Karsten Scheibe, Fay Huang, Reinhard Klette

Document Analysis

A New Approach for Segmentation and Recognition of Arabic Handwritten Touching Numeral Pairs

In this paper, we propose a new approach to the segmentation and recognition of off-line unconstrained Arabic handwritten numerals that fail to be segmented by connected component analysis. In our approach, the touching numerals are automatically segmented once a set of parameters is chosen. Models with different sets of parameters for each numeral pair are designed for recognition. Each image in each model is recognized as an isolated numeral. After normalizing and binarizing the images, gradient features are extracted and recognized using SVMs. Finally, a post-processing step is proposed based on the optimal combinations of the recognition probabilities for each model. Experiments were conducted on the CENPARMI Arabic, Dari, and Urdu touching numeral pair databases [1,12].

Huda Alamri, Chun Lei He, Ching Y. Suen
Ridges Based Curled Textline Region Detection from Grayscale Camera-Captured Document Images

Compared to scanners, cameras offer fast, flexible and non-contact document imaging, but with distortions like uneven shading and warped shape. Therefore, camera-captured document images need preprocessing steps like binarization and textline detection for dewarping, so that traditional document image processing steps can be applied to them. Previous approaches to binarization and curled textline detection are sensitive to distortions and lose some crucial image information during each step, which badly affects dewarping and further processing. Here we introduce a novel algorithm for curled textline region detection directly from a grayscale camera-captured document image, in which a matched filter bank approach is used for enhancing the textline structure and ridge detection is then applied for finding the central line of curled textlines. The resulting ridges can potentially be used for binarization, dewarping or designing new techniques for camera-captured document image processing. Our approach is robust against bad shading and high degrees of curl. We have achieved around 91% detection accuracy on the dataset of the CBDAR 2007 document image dewarping contest.

Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel
Kernel PCA for HMM-Based Cursive Handwriting Recognition

In this paper, we propose Kernel Principal Component Analysis as a feature selection method for offline cursive handwriting recognition based on Hidden Markov Models. In contrast to formerly used feature selection methods, namely standard Principal Component Analysis and Independent Component Analysis, nonlinearity is achieved by making use of a radial basis function kernel. In an experimental study we demonstrate that the proposed nonlinear method has a great potential to improve cursive handwriting recognition systems and is able to significantly outperform linear feature selection methods. We consider two diverse datasets of isolated handwritten words for the experimental evaluation, the first consisting of modern English words, and the second consisting of medieval Middle High German words.
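A minimal illustration of the contrast between linear PCA and RBF-kernel Kernel PCA on frame features, using scikit-learn; the 9-dimensional random frame features below are placeholders for the actual sliding-window features fed to the HMMs.

```python
import numpy as np
from sklearn.decomposition import KernelPCA, PCA

# Toy stand-in for sliding-window features extracted from word images.
rng = np.random.default_rng(0)
frames = rng.random((500, 9))        # 500 frames, 9 features per frame

kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.5).fit(frames)
pca = PCA(n_components=5).fit(frames)

nonlinear_feats = kpca.transform(frames)   # nonlinear features for the HMMs
linear_feats = pca.transform(frames)       # linear baseline
print(nonlinear_feats.shape, linear_feats.shape)
```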

Andreas Fischer, Horst Bunke
Improved Handwriting Recognition by Combining Two Forms of Hidden Markov Models and a Recurrent Neural Network

Handwritten word recognition has received a substantial amount of attention in the past. Neural Networks as well as discriminatively trained Maximum Margin Hidden Markov Models have emerged as cutting-edge alternatives to the commonly used Hidden Markov Models. In this paper, we analyze the combination of these classifiers with respect to their potential for improving recognition performance. It is shown that a significant improvement can in fact be achieved, although the individual recognizers are highly optimized state-of-the-art systems. Also, it is demonstrated that the diversity of the recognizers has a profound impact on the improvement that can be achieved by the combination.

Volkmar Frinken, Tim Peter, Andreas Fischer, Horst Bunke, Trinh-Minh-Tri Do, Thierry Artieres
Embedded Bernoulli Mixture HMMs for Continuous Handwritten Text Recognition

Hidden Markov Models (HMMs) are now widely used in off-line handwritten text recognition. As in speech recognition, they are usually built from shared, embedded HMMs at the symbol level, in which state-conditional probability density functions are modelled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of real-valued features should be used and, indeed, very different feature sets are in use today. In this paper, we propose to bypass feature extraction and directly feed columns of raw, binary image pixels into embedded Bernoulli mixture HMMs, that is, embedded HMMs in which the emission probabilities are modelled with Bernoulli mixtures. The idea is to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. Good empirical results are reported on the well-known IAM database.
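The emission model can be illustrated directly: the log-likelihood of one binary pixel column under a Bernoulli mixture. The mixture weights and prototypes below are random placeholders, not trained values.

```python
import numpy as np

def bernoulli_mixture_loglik(x, weights, protos, eps=1e-6):
    """log sum_k w_k * prod_d p_kd^x_d (1 - p_kd)^(1 - x_d)
    for a binary column x, evaluated stably via log-sum-exp."""
    x = np.asarray(x, dtype=float)                   # shape (D,)
    p = np.clip(np.asarray(protos), eps, 1 - eps)    # shape (K, D)
    log_comp = x @ np.log(p).T + (1 - x) @ np.log(1 - p).T   # shape (K,)
    m = log_comp.max()
    return m + np.log(np.sum(np.asarray(weights) * np.exp(log_comp - m)))

# A 30-pixel-high column scored against a 2-component mixture.
rng = np.random.default_rng(0)
protos = rng.random((2, 30))
col = (rng.random(30) > 0.5).astype(float)
print(bernoulli_mixture_loglik(col, [0.6, 0.4], protos))
```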

Adrià Giménez, Alfons Juan
Recognition-Based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram

Segmentation of Nom characters from body text regions of stele images is a challenging problem due to the confusing spatial distribution of the connected components composing these characters. In this paper, for each vertical text line, area Voronoi diagram is employed to represent the neighborhood of the connected components and Voronoi edges are used as nonlinear segmentation hypotheses. Characters are then segmented by selecting appropriate adjacent Voronoi regions. For this purpose, we utilize the information about the horizontal overlap of connected components and the recognition distances of candidate characters provided by an OCR engine. Experimental results show that the proposed method is highly accurate and robust to various types of stele.

Thai V. Hoang, Salvatore Tabbone, Ngoc-Yen Pham
A Novel Approach for Word Spotting Using Merge-Split Edit Distance

Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by the segmentation inconsistencies of characters (broken or merged) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split edit distance which overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system is based on the extraction of words and characters in the text, attributing each character with a set of features. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing the strings of characters using the proposed Merge-Split edit distance algorithm. Evaluation of the method on 19th-century historical document images exhibits extremely promising results.
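A simplified sketch of an edit distance extended with a merge operation, in the spirit of the proposed Merge-Split distance; the cost functions are toy stand-ins for the paper's DTW-based character costs, and the split operation is omitted.

```python
def merge_edit_distance(src, tgt, sub_cost, merge_cost):
    """Edit distance with an extra 'merge' operation: two consecutive
    source primitives (e.g. halves of a broken character) may jointly
    match one target primitive."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:                      # deletion
                d[i+1][j] = min(d[i+1][j], d[i][j] + 1.0)
            if j < m:                      # insertion
                d[i][j+1] = min(d[i][j+1], d[i][j] + 1.0)
            if i < n and j < m:            # substitution / match
                d[i+1][j+1] = min(d[i+1][j+1],
                                  d[i][j] + sub_cost(src[i], tgt[j]))
            if i + 1 < n and j < m:        # merge src[i], src[i+1] -> tgt[j]
                d[i+2][j+1] = min(d[i+2][j+1],
                                  d[i][j] + merge_cost(src[i], src[i+1], tgt[j]))
    return d[n][m]

# Toy costs: exact-match substitution; merging broken pieces is cheap.
sub = lambda a, b: 0.0 if a == b else 1.0
merge = lambda a, b, c: 0.25 if a + b == c else 2.0
print(merge_edit_distance(["w", "o", "r", "d"], ["w", "o", "rd"],
                          sub, merge))     # -> 0.25
```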

Khurram Khurshid, Claudie Faure, Nicole Vincent
Hierarchical Decomposition of Handwritten Manuscripts Layouts

In this paper we propose a new approach to improve electronic editions of literary corpora by providing an efficient estimation of the structure of manuscript pages. In any handwritten document analysis process, structure recognition is an important issue. The presence of variable inter-line spaces, inconstant baseline skews, overlappings and occlusions in unconstrained 19th-century handwritten documents complicates the structure recognition task. Text line and fragment extraction is based on the connectivity labelling of the adjacency graph at different resolution levels, for the extraction of borders, lines and fragments.

Vincent Malleron, Véronique Eglin, Hubert Emptoz, Stéphanie Dord-Crouslé, Philippe Régnier
Camera-Based Online Signature Verification with Sequential Marginal Likelihood Change Detector

Several online signature verification systems that use cameras have been proposed. These systems obtain online signature data from video images by tracking the pen tip. Such systems are very useful because special devices such as pen-operated digital tablets are not necessary. One drawback, however, is that if the captured images are blurred, pen tip tracking may fail, which causes performance degradation. To solve this problem, here we propose a scheme to detect such images and re-estimate the pen tip position associated with the blurred images. Our pen tracking algorithm is implemented by using the sequential Monte Carlo method, and a sequential marginal likelihood is used for blurred image detection. Preliminary experiments were performed using private data consisting of 390 genuine signatures and 1560 forged signatures. The experimental results show that the proposed algorithm improved performance in terms of verification accuracy.

Daigo Muramatsu, Kumiko Yasuda, Satoshi Shirato, Takashi Matsumoto
Separation of Overlapping and Touching Lines within Handwritten Arabic Documents

In this paper, we propose an approach for the separation of overlapping and touching lines within handwritten Arabic documents. Our approach is based on the morphological analysis of the terminal letters of Arabic words. Starting from 4 categories of possible endings, we use the angular variance to follow the connection and separate the endings. The proposed separation scheme has been evaluated on 100 documents containing 640 overlapping and touching occurrences, reaching an accuracy of about 96.88%.

Nazih Ouwayed, Abdel Belaïd
Combining Contour Based Orientation and Curvature Features for Writer Recognition

This paper presents an effective method for writer recognition in handwritten documents. We have introduced a set of features that are extracted from two different representations of the contours of handwritten images. These features mainly capture the orientation and curvature information at different levels of observation, first from the chain code sequence of the contours and then from a set of polygons approximating these contours. Two writings are then compared by computing the distances between their respective features. The system trained and tested on a data set of 650 writers exhibited promising results on writer identification and verification.

Imran Siddiqi, Nicole Vincent

Features

A Novel Approach to Estimate Fractal Dimension from Closed Curves

An important point in pattern recognition and image analysis is the study of properties of the shapes used to represent an object in an image. In particular, an interesting measure of a shape is its level of complexity, a value that can be obtained from its fractal dimension. Many methods have been developed for estimating the fractal dimension of shapes, but none of these is efficient for every situation. This work proposes a novel approach to estimate the fractal dimension from a shape contour by using Curvature Scale Space (CSS). The efficiency of the technique is compared with that of the well-known Bouligand-Minkowski method. Results show that the use of CSS yields fractal dimension values robust to several shape transformations (such as rotation, scaling and the presence of noise), thus providing interesting results for a process of shape classification based on this measure.
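The CSS-based estimator itself is not reproduced here; as a point of reference for what "estimating a fractal dimension" means, the classical box-counting estimator can be sketched in a few lines (box sizes chosen arbitrarily, and distinct from both the CSS and Bouligand-Minkowski approaches the paper discusses).

```python
import numpy as np

def box_counting_dimension(points, sizes=(1, 2, 4, 8, 16, 32)):
    """Classical box-counting estimate of the fractal dimension of a set
    of 2-D contour points: slope of log N(s) versus log(1/s)."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.min(axis=0)              # shift into positive quadrant
    counts = []
    for s in sizes:
        boxes = {tuple(b) for b in np.floor(pts / s).astype(int)}
        counts.append(len(boxes))            # number of occupied boxes
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes, float)),
                          np.log(counts), 1)
    return slope

# A straight segment should come out with dimension close to 1.
t = np.linspace(0, 1000, 5000)
print(box_counting_dimension(np.column_stack([t, 0.5 * t])))
```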

André R. Backes, João B. Florindo, Odemir M. Bruno
Saliency Based on Decorrelation and Distinctiveness of Local Responses

In this paper we validate a new model of bottom-up saliency based on the decorrelation and the distinctiveness of local responses. The model is simple and light, and is based on biologically plausible mechanisms. Decorrelation is achieved by applying principal component analysis over a set of multiscale low-level features. Distinctiveness is measured using Hotelling's $T^2$ statistic. The presented approach provides a suitable framework for the incorporation of top-down processes like contextual priors, but also learning and recognition. We show its capability of reproducing human fixations on an open access image dataset and we compare it with other recently proposed models of the state of the art.

Antón Garcia-Diaz, Xosé R. Fdez-Vidal, Xosé M. Pardo, Raquel Dosil
Incorporating Shape Features in an Appearance-Based Object Detection System

Most object detection techniques discussed in the literature are based solely on texture-based features that capture the global or local appearance of an object. While results indicate their ability to effectively represent an object class, these features can be detected repeatably only in the object interior, and so cannot effectively exploit the powerful recognition cue of contour. Since generic object classes can be characterized by shape and appearance, this paper formulates a method to combine these attributes to enhance the object model. We present an approach for incorporating the recently introduced shape-based features called k-Adjacent-Segments (kAS) in our appearance-based framework based on dense SIFT features. Class-specific kAS features are detected in an arbitrary image to form a shape map that is then employed in two novel ways to augment the appearance-based technique. This is shown to improve the detection performance for all classes in the challenging 3D dataset by 3-18% and on the PASCAL VOC 2006 dataset by 5%.

Gurman Gill, Martin Levine
Symmetry Detection for Multi-object Using Local Polar Coordinate

In this paper, a novel method is presented which detects symmetry axes for multiple objects. It uses a vertical line detection method in local polar coordinates. The approach requires little computation and obtains efficient results, which means that it can be used in database applications.

Yuanhao Gong, Qicong Wang, Chenhui Yang, Yahui Gao, Cuihua Li
Detection of Non-convex Objects by Dynamic Programming

In this work we present the Rack algorithm for the detection of optimal non-convex contours in an image. It represents a combination of a user-driven image transformation and dynamic programming. The goal is to detect a closed contour in a scene based on the image's edge strength. For this, we introduce a graph construction technique based on a "rack" and derive the image as a directed acyclic graph (DAG). In this graph, the shortest path with respect to an adequate cost function can be calculated efficiently via dynamic programming. Results demonstrate that this approach works well for a certain range of images and has great potential for most other images.

Andree Große, Kai Rothaus, Xiaoyi Jiang
Finding Intrinsic and Extrinsic Viewing Parameters from a Single Realist Painting

In this paper we studied the geometry of a three-dimensional tableau from a single realist painting, Scott Fraser's Three way vanitas (2006). The tableau contains a carefully chosen complex arrangement of objects including a moth, an egg, a cup, a strand of string, a glass of water, a bone, and a hand mirror. Each of the three plane mirrors presents a different view of the tableau from a virtual camera behind each mirror and symmetric to the artist's viewing point. Our new contribution was to incorporate single-view geometric information extracted from the direct image of the wooden mirror frames in order to obtain the camera models of both the real camera and the three virtual cameras. Both the intrinsic and extrinsic parameters are estimated for the direct image and the images in the three plane mirrors depicted within the painting.

Tadeusz Jordan, David G. Stork, Wai L. Khoo, Zhigang Zhu
A Model for Saliency Detection Using NMFsc Algorithm

Saliency mechanism has been considered crucial in the human visual system and helpful to object detection and recognition. This paper addresses an information theoretic model for visual saliency detection. It consists of two steps: first, using the Non-negative Matrix Factorization with sparseness constraints (NMFsc) algorithm to learn the basis functions from a set of randomly sampled natural image patches; and then, applying information theoretic principle to generate the saliency map by the Salient Information (SI) which is calculated from the coefficients represented by basis functions. We compare our model with the previous methods on natural images. Experimental results show that our model performs better than existing approaches.

Jian Liu, Yuncai Liu
Directional Force Field-Based Maps: Implementation and Application

A directional relationship (e.g., right, above) to a reference object can be modeled by a directional map – an image where the value of each point represents how well the relationship holds between the point and the object. As we showed in previous work, such a map can be derived from a force field created by the object (which is seen as a physical entity). This force field-based model, defined by equations in the continuous domain, shows unique characteristics. However, the approximation algorithms that were proposed in the case of 2-D raster data lack efficiency and accuracy. We introduce here new algorithms that correct this flaw, and we illustrate the potential of the force field-based approach through an application to scene matching.

JingBo Ni, Melanie Veltman, Pascal Matsakis
A Spatio-Temporal Isotropic Operator for the Attention-Point Extraction

It is proposed to extract multi-location image features at maxima points of a spatio-temporal attention operator, which indicates locations with high intensity contrast, region homogeneity, shape saliency and temporal change. The scale-adaptive estimation of local change (motion) and its aggregation with the region shape saliency contribute to robust detection of moving objects. Experiments on the accuracy of interest-point detection have proved the operator consistency and its high potential for object detection in image sequences.

Roman M. Palenichka, Marek B. Zaremba
Homological Tree-Based Strategies for Image Analysis

Homological characteristics of digital objects can be obtained in a straightforward manner by computing an algebraic map φ over a finite cell complex K (with coefficients in the finite field $\textbf{F}_2=\{0,1\}$) which represents the digital object [9]. Computable homological information includes the Euler characteristic, homology generators and representative cycles, higher (co)homology operations, etc. This algebraic map φ is described in combinatorial terms using a mixed three-level forest. Different strategies changing only two parameters of this algorithm for computing φ are presented. Each one of those strategies gives rise to different maps, although all of them provide the same homological information for K. For example, tree-based structures useful in image analysis like topological skeletons and pyramids can be obtained as subgraphs of this forest.

P. Real, H. Molina-Abril, Walter Kropatsch
Affine Moment Invariants of Color Images

A new type of affine moment invariants for color images is proposed in this paper. The traditional affine moment invariants can be computed on each color channel separately, yet when the channels are transformed together, by the same affine transform, additional invariants can be computed. They have low order and therefore high robustness to noise. The new invariants are compared with another set of invariants for color images using second powers of the image function. The basic properties of the new features are tested on real images in a numerical experiment.

Tomáš Suk, Jan Flusser

Graph Representations

Graph-Based k-Means Clustering: A Comparison of the Set Median versus the Generalized Median Graph

In this paper we propose the application of the generalized median graph in a graph-based k-means clustering algorithm. In the graph-based k-means algorithm, the centers of the clusters have been traditionally represented using the set median graph. We propose an approximate method for the generalized median graph computation that allows us to use it to represent the centers of the clusters. Experiments on three databases show that using the generalized median graph as the cluster representative yields better results than the set median graph.
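The distinction driving the paper fits in one function: the set median minimizes the summed distance over members of the set only, whereas the generalized median searches the whole graph domain and therefore requires an approximation such as the one proposed. A toy sketch, with integers standing in for graphs and |a - b| for the graph edit distance:

```python
def set_median(graphs, dist):
    """Set median: the cluster member minimizing the sum of distances
    to all other members. (The generalized median drops the restriction
    to members of the set.)"""
    sums = [sum(dist(g, h) for h in graphs) for g in graphs]
    return graphs[sums.index(min(sums))]

print(set_median([1, 2, 7, 8, 9], dist=lambda a, b: abs(a - b)))  # -> 7
```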

M. Ferrer, E. Valveny, F. Serratosa, I. Bardají, H. Bunke
Algorithms for the Sample Mean of Graphs

Measures of central tendency for graphs are important for prototype construction, frequent substructure mining, and multiple alignment of protein structures. This contribution proposes subgradient-based methods for determining a sample mean of graphs. We assess the performance of the proposed algorithms in a comparative empirical study.

Brijnesh J. Jain, Klaus Obermayer
A Hypergraph-Based Model for Graph Clustering: Application to Image Indexing

In this paper, we introduce a prototype-based clustering algorithm dealing with graphs. We propose a hypergraph-based model for graph data sets that allows clusters to overlap; more precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm automatically detects the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we define a retrieval technique indexing the database with hyperedge centroids. This model is convenient for traversing the data set and efficient for clustering and retrieving graphs.

Salim Jouili, Salvatore Tabbone
Hypergraphs, Characteristic Polynomials and the Ihara Zeta Function

In this paper we make a characteristic polynomial analysis on hypergraphs for the purpose of clustering. Our starting point is the Ihara zeta function [8] which captures the cycle structure for hypergraphs. The Ihara zeta function for a hypergraph can be expressed in a determinant form as the reciprocal of the characteristic polynomial of the adjacency matrix for a transformed graph representation. Our hypergraph characterization is based on the coefficients of the characteristic polynomial, and can be used to construct feature vectors for hypergraphs. In the experimental evaluation, we demonstrate the effectiveness of the proposed characterization for clustering hypergraphs.

Peng Ren, Tatjana Aleksić, Richard C. Wilson, Edwin R. Hancock
Feature Ranking Algorithms for Improving Classification of Vector Space Embedded Graphs

Graphs provide us with a powerful and flexible representation formalism for pattern recognition. Yet, the vast majority of pattern recognition algorithms rely on vectorial data descriptions and cannot directly be applied to graphs. In order to overcome this severe limitation, an embedding of the underlying graphs in a vector space ℝⁿ is employed. The basic idea is to regard the dissimilarities of a graph g to a number of prototype graphs as numerical features of g. In previous works, the prototypes are selected beforehand with selection strategies based on some heuristics. In the present paper we take a more fundamental approach and regard the problem of prototype selection as a feature selection problem, for which many methods are available. With several experimental results we show the feasibility of graph embedding based on prototypes obtained from feature selection algorithms.

Kaspar Riesen, Horst Bunke
Graph-Based Object Class Discovery

We are interested in the problem of discovering the set of object classes present in a database of images using a weakly supervised graph-based framework. Rather than making use of the "Bag-of-Features (BoF)" approach widely used in current work on object recognition, we represent each image by a graph using a group of selected local invariant features. Using local feature matching and iterative Procrustes alignment, we perform graph matching and compute a similarity measure. Borrowing the idea of query expansion, we develop a similarity propagation based graph clustering (SPGC) method. Using this method, class-specific clusters of the graphs can be obtained. Such a cluster can be generally represented by a higher-level graph model whose vertices are the clustered graphs, and whose edge weights are determined by the pairwise similarity measure. Experiments are performed on a dataset in which the number of images increases from 1 to 50K and the number of objects increases from 1 to over 500. Some objects have been discovered with total recall and a precision of 1 in a single cluster.

Shengping Xia, Edwin R. Hancock

Image Processing

The Clifford-Hodge Flow: An Extension of the Beltrami Flow

In this paper, we make use of the theory of Clifford algebras for anisotropic smoothing of vector-valued data. It provides a common framework to smooth functions, tangent vector fields and mappings taking values in $\mathfrak{so}(m)$, the Lie algebra of SO(m), defined on surfaces and more generally on Riemannian manifolds. The smoothing process arises from a convolution with a kernel associated to a second-order differential operator: the Hodge Laplacian. It generalizes the Beltrami flow in the sense that the Laplace-Beltrami operator is the restriction to functions of minus the Hodge operator. We obtain a common framework for anisotropic smoothing of images, vector fields and oriented orthonormal frame fields defined on the charts.

Thomas Batard, Michel Berthier
Speedup of Color Palette Indexing in Self–Organization of Kohonen Feature Map

Based on the self-organization of Kohonen feature map (SOFM), Pei et al. recently presented an efficient color palette indexing method to construct a color table for compression. Taking their palette indexing method as a representative, this paper presents two new strategies, the pruning-based search strategy and the lookup table (LUT)-based update strategy, to speed up the learning process in the SOFM. On four typical testing images, experimental results demonstrate that our two proposed strategies achieve a 35% execution-time improvement ratio on average. In fact, our two strategies could also be used to speed up other SOFM-based learning processes in different applications.

Kuo-Liang Chung, Jyun-Pin Wang, Ming-Shao Cheng, Yong-Huai Huang
Probabilistic Satellite Image Fusion

Remote sensing satellite images play an important role in many applications such as environment and agriculture lands monitoring. In such images the scene is usually observed with different modalities, e.g. wavelengths. Image Fusion is an important analysis tool that summarizes the available information in a unique composite image. This paper proposes a new transform domain image fusion (IF) algorithm based on a hierarchical vector hidden Markov model (HHMM) and the mixture of probabilistic principal component analysers. Results on real Landsat images, quantified subjectively and using objective measures, are very satisfactory.

Farid Flitti, Mohammed Bennamoun, Du Huynh, Amine Bermak, Christophe Collet
A Riemannian Scalar Measure for Diffusion Tensor Images

We study a well-known scalar quantity in differential geometry, the Ricci scalar, in the context of Diffusion Tensor Imaging (DTI). We explore the relation between the Ricci scalar and the two most popular scalar measures in DTI: Mean Diffusivity and Fractional Anisotropy. We discuss results of computing the Ricci scalar on synthetic as well as real DTI data.

Andrea Fuster, Laura Astola, Luc Florack
Structure-Preserving Smoothing of Biomedical Images

Smoothing of biomedical images should preserve gray-level transitions between adjacent tissues, while restoring contours consistent with anatomical structures. Anisotropic diffusion operators are based on image appearance discontinuities (either local or contextual) and might fail at weak inter-tissue transitions. Meanwhile, the output of block-wise and morphological operations is prone to present a block structure due to the shape and size of the considered pixel neighborhood.

In this contribution, we use differential geometry concepts to define a diffusion operator that restricts to image-consistent level sets. In this manner, the final state is a non-uniform intensity image presenting homogeneous inter-tissue transitions along anatomical structures, while smoothing intra-structure texture. Experiments on different types of medical images (magnetic resonance, computerized tomography) illustrate its benefit for further processing of the images (such as segmentation).

Debora Gil, Aura Hernàndez-Sabaté, Mireia Burnat, Steven Jansen, Jordi Martínez-Villalta
Fast Block Clustering Based Optimized Adaptive Mediod Shift

We present an optimal approach to unsupervised color image clustering, suited for high-resolution images, based on mode seeking by mediod shifts. It is shown that the automatic detection of the total number of clusters depends upon the overall image statistics as well as the bandwidth of the underlying probability density function. An optimized adaptive mode seeking algorithm based on reverse parallel tree traversal is proposed. This work contributes in three aspects. 1) An adaptive bandwidth for the kernel function is proposed based on the overall image statistics; 2) a novel reverse parallel tree traversing approach for mode seeking is presented which drastically reduces the number of computational steps compared to traditional tree traversing; 3) for high-resolution images, block clustering based optimized Adaptive Mediod Shift (AMS) is proposed, where mode seeking is done in blocks and then the local modes are merged globally. The proposed method has made it possible to perform clustering on a variety of high-resolution images. Experimental results have shown our algorithm to be time-efficient and robust.

Zulqarnain Gilani, Naveed Iqbal Rao
Color Me Right–Seamless Image Compositing

This paper introduces an approach for creating an image composite by seamlessly blending a region of interest from one image onto another while faithfully preserving the color of the regions specified by user markup. With different regions marked for color preservation, our approach provides users the flexibility to create different composites. The experimental results demonstrate the effectiveness of the proposed approach in creating seamless image composites with color preserved.

Dong Guo, Terence Sim
Transform Invariant Video Fingerprinting by NMF

Video fingerprinting is introduced as an effective tool for identification and recognition of video content even after putative modifications. In this paper, we present a video fingerprinting scheme based on non-negative matrix factorization (NMF). NMF is shown to be capable of generating discriminative, parts-based representations while reducing the dimensionality of the data. NMF’s representation capacity can be fortified by incorporating geometric transformational duplicates of the base vectors into the factorization. Factorized base vectors are used as content based, representative features that uniquely describe the video content. Obtaining such base vectors by transformational NMF (T-NMF) is furthermore versatile in recognizing the attacked contents as copies of the original instead of considering them as a new content. Thus a novel approach for fingerprinting of video content based on T-NMF is introduced in this work and experimental results obtained on TRECVID data set are presented to demonstrate the robustness to geometric attacks and the improvement in the representation.

Ozan Gursoy, Bilge Gunsel, Neslihan Sengor
A Model Based Method for Overall Well Focused Catadioptric Image Acquisition with Multi-focal Images

Based on an analysis of the spatial distribution property of virtual features, we propose that the shapes of the best-focused image regions in multi-focal catadioptric images can be modeled by a series of neighboring concentric annuluses. Based on this model, an overall well-focused image can be obtained by combining the best-focused regions from a set of multi-focal images in a fast and reliable manner. A robust algorithm for estimating the model parameters is presented. Experiments with real catadioptric images under a variety of scenes verify the validity of the model and the robust performance of the algorithm.

Weiming Li, Youfu Li, Yihong Wu
Colorization Using Segmentation with Random Walk

Traditional monochrome image colorization techniques require considerable user interaction and a lot of time. Segment-based colorization works fast but at the expense of detail loss because of the coarse segmentation, while optimization-based methods look much more continuous but take longer. This paper proposes a novel approach: segmentation-based colorization using random walks, a fast segmentation technique that can naturally handle multi-label segmentation problems. It maintains smoothness almost everywhere except for sharp discontinuities at boundaries in the images. First, from a few seed pixels manually scribbled by the user, a global energy is set up according to the spatial information and statistical grayscale information. Then, with random walks, the globally optimal segmentation is obtained quickly and efficiently. Finally, a banded graph cut based refinement procedure is applied to deal with ambiguous regions of the previous segmentation. Several results are shown to demonstrate the effectiveness of the proposed method.
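A hedged sketch of the core random-walk segmentation step using scikit-image's random_walker, followed by a naive per-segment tinting; the paper's global energy formulation and banded graph cut refinement are not reproduced, and the image, scribbles and palette are synthetic.

```python
import numpy as np
from skimage.segmentation import random_walker

# Synthetic grayscale image: two regions of different intensity plus noise.
img = np.zeros((80, 80))
img[:, 40:] = 1.0
img += np.random.default_rng(0).normal(scale=0.1, size=img.shape)

# User scribbles: label 1 = "make reddish", label 2 = "make bluish";
# zero entries are unlabeled and will be assigned by the random walk.
markers = np.zeros(img.shape, dtype=int)
markers[40, 10] = 1
markers[40, 70] = 2

labels = random_walker(img, markers, beta=130, mode="bf")

# Naive colorization: keep the grey value as luminance, tint per segment.
palette = {1: np.array([1.0, 0.2, 0.2]), 2: np.array([0.2, 0.2, 1.0])}
color = np.zeros(img.shape + (3,))
for k, tint in palette.items():
    mask = labels == k
    color[mask] = img[mask, None] * tint
print(color.shape)   # (80, 80, 3)
```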

Xiaoming Liu, Jun Liu, Zhilin Feng
Edge-Based Image Compression with Homogeneous Diffusion

It is well-known that edges contain semantically important image information. In this paper we present a lossy compression method for cartoon-like images that exploits information at image edges. These edges are extracted with the Marr–Hildreth operator followed by hysteresis thresholding. Their locations are stored in a lossless way using JBIG. Moreover, we encode the grey or colour values at both sides of each edge by applying quantisation, subsampling and PAQ coding. In the decoding step, information outside these encoded data is recovered by solving the Laplace equation, i.e. we inpaint with the steady state of a homogeneous diffusion process. Our experiments show that the suggested method outperforms the widely-used JPEG standard and can even beat the advanced JPEG2000 standard for cartoon-like images.
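The decoding step amounts to solving the Laplace equation with the stored edge values as Dirichlet data. A minimal Jacobi-iteration sketch follows, assuming a plain grid and ignoring the codec details (JBIG, quantisation, PAQ) entirely.

```python
import numpy as np

def laplace_inpaint(values, known_mask, n_iter=2000):
    """Recover missing grey values as the steady state of homogeneous
    diffusion: Jacobi sweeps of the discrete Laplace equation with the
    stored (edge) values held fixed as Dirichlet data."""
    u = np.where(known_mask, values, values[known_mask].mean()).astype(float)
    for _ in range(n_iter):
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(known_mask, values, avg)   # keep stored data fixed
    return u

# Toy example: only two vertical stripes of grey values are stored.
img = np.tile(np.linspace(0, 1, 64), (64, 1))
mask = np.zeros_like(img, dtype=bool)
mask[:, 5], mask[:, 58] = True, True
restored = laplace_inpaint(img, mask)
print(np.abs(restored[:, 5:59] - img[:, 5:59]).max() < 0.05)   # True
```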

Markus Mainberger, Joachim Weickert
Color Quantization Based on PCA and Kohonen SOFM

A method for optimally initializing Kohonen's Self-Organizing Feature Maps (SOFM) of a fixed zero neighborhood radius for use in color quantization is presented. A standard SOFM is applied to the projection of the input image pixels onto the plane spanned by the two largest principal components, and to pixels of the original image selected via a thresholding procedure on the smallest principal component. The emerging neuron values initialize the final SOFM of a fixed zero neighborhood radius that performs the color quantization of the original image. Experimental results show that the proposed method is able to produce smaller quantization errors than the standard SOFM and other existing color quantization methods.

D. Mavridis, N. Papamarkos
On Adapting the Tensor Voting Framework to Robust Color Image Denoising

This paper presents an adaptation of the tensor voting framework for color image denoising while preserving edges. Tensors are used to encode the CIELAB color channels, the uniformity and the edginess of image pixels. A specific voting process is proposed in order to propagate color from a pixel to its neighbors by considering the distance between pixels, the perceptual color difference (using an optimized version of CIEDE2000), a uniformity measurement and the likelihood of the pixels being impulse noise. The original colors are corrected with those encoded by the tensors obtained after the voting process. Peak signal-to-noise ratios and visual inspection show that the proposed methodology performs better than state-of-the-art techniques.

Rodrigo Moreno, Miguel Angel Garcia, Domenec Puig, Carme Julià
Two-Dimensional Windowing in the Structural Similarity Index for the Colour Image Quality Assessment

This paper presents the analysis of the usage of the Structural Similarity (SSIM) index for the quality assessment of the colour images with variable size of the sliding window. The experiments have been performed using the LIVE Image Quality Assessment Database in order to compare the linear correlation of achieved results with the Differential Mean Opinion Score (DMOS) values. The calculations have been done using the value (brightness) channel from the HSV (HSB) colour space as well as commonly used YUV/YIQ luminance channel and the average of the RGB channels. The analysis of the image resolution’s influence on the correlation between the SSIM and DMOS values for varying size of the sliding window is also presented as well as some results obtained using the nonlinear mapping based on the logistic function.
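The experiment's central variable is easy to reproduce in outline: scikit-image exposes the SSIM sliding-window size directly, so the score can be computed for several window sizes. The grayscale arrays below merely stand in for the luminance/value channels studied in the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
ref = rng.random((128, 128))                       # stand-in luminance channel
dist = np.clip(ref + rng.normal(scale=0.1, size=ref.shape), 0, 1)

# SSIM of the same image pair for several sliding-window sizes
# (win_size must be odd).
for win in (3, 7, 11, 15):
    score = structural_similarity(ref, dist, win_size=win, data_range=1.0)
    print(win, round(score, 4))
```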

Krzysztof Okarma
Reduced Inverse Distance Weighting Interpolation for Painterly Rendering

The interpolation problem of irregularly distributed data in a multidimensional domain is considered. A modification of the inverse distance weighting interpolation formula is proposed, making computation time independent of the number of data points. Only the first K neighbors of a given point are considered, instead of the entire dataset. Additional factors are introduced, preventing discontinuities at points where the set of local neighbors changes. Theoretical analysis provides conditions which guarantee continuity. The proposed approach is efficient and free from magic numbers: unlike many existing algorithms based on the k-nearest neighbors, the number of neighbors is derived from theoretical principles. The method has been applied to the problem of vector field generation in the context of artistic imaging. Experimental results show its ability to produce brush strokes oriented along object contours and to effectively render meaningful texture details.
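The following sketch conveys the general idea under our own simplifications (NumPy; the paper's exact continuity factors and the theoretical derivation of K are not reproduced): inverse distance weights are computed over the K nearest points only, with a fade factor that sends a weight to zero as its point leaves the neighborhood.

```python
import numpy as np

def reduced_idw(points, values, query, K=8, p=2.0):
    """Inverse distance weighting restricted to the K nearest data points."""
    d = np.linalg.norm(points - query, axis=1)
    idx = np.argsort(d)[:K + 1]          # the K nearest plus one more point
    dk, dmax = d[idx[:K]], d[idx[K]]     # dmax: distance to the (K+1)-th point
    if dk[0] == 0.0:                     # query coincides with a data point
        return values[idx[0]]
    fade = 1.0 - dk / dmax               # vanishes as a point leaves the set
    w = fade / dk ** p
    return np.sum(w * values[idx[:K]]) / np.sum(w)
```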

Giuseppe Papari, Nicolai Petkov
Nonlinear Diffusion Filters without Parameters for Image Segmentation

Nonlinear diffusion filtering seeks to improve images qualitatively by removing noise while preserving details and even enhancing edges. However, well-known implementations are sensitive to parameters which are necessarily tuned to sharpen a narrow range of edge slopes. In this work, we select a nonlinear diffusion filter without control parameters, guided by the search for an optimum balance between time performance and a resulting quality suitable for automatic segmentation tasks. Using a semi-implicit numerical scheme, we determine the relationship between the slope range to sharpen and the diffusion time, and select the diffusivity with optimum performance. Several diffusion filters were applied to noisy computed tomography images and evaluated for their suitability for medical image segmentation. Experimental results show that the proposed filter performs well in comparison to the others.

Carlos Platero, Javier Sanguino, Olga Velasco
Color Quantization by Multiresolution Analysis

A color quantization method is presented, based on the analysis of the histogram at different resolutions, computed on a Gaussian pyramid of the input image. Criteria based on the persistence and dominance of peaks and pits of the histograms are introduced to detect the modes in the histogram of the input image and to define the reduced colormap. Important features of the method are its limited computational cost and the possibility to obtain quantized images with a variable number of colors, depending on the user's needs, without the number of colors in the resulting image having to be fixed a priori.

Giuliana Ramella, Gabriella Sanniti di Baja
Total Variation Processing of Images with Poisson Statistics

This paper deals with denoising of density images with bad Poisson statistics (low count rates), where the reconstruction of the major structures seems the only reasonable task. Obtaining the structures with sharp edges can also be a prerequisite for further processing, e.g. segmentation of objects.

A variety of approaches exists in the case of Gaussian noise, but only a few in the Poisson case. We propose some total variation (TV) based regularization techniques adapted to the case of Poisson data, which we derive from approximations of logarithmic a-posteriori probabilities. In order to guarantee sharp edges we avoid the smoothing of the total variation and use a dual approach for the numerical solution. We illustrate and test the feasibility of our approaches for data in positron emission tomography, namely reconstructions of cardiac structures with ¹⁸F-FDG and H₂¹⁵O tracers, respectively.
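As a rough stand-in for the idea (an explicit scheme on a smoothed TV term, not the authors' dual method, which avoids this smoothing), the following NumPy sketch descends the energy sum(u - f log u) + lam * TV(u), whose data term is the Poisson negative log-likelihood:

```python
import numpy as np

def tv_poisson_denoise(f, lam=0.2, eps=1e-3, tau=0.05, n_iter=500):
    """Gradient descent on sum(u - f*log u) + lam*TV_eps(u) for Poisson data."""
    u = np.maximum(f.astype(float), 1.0)
    for _ in range(n_iter):
        ux = np.diff(u, axis=1, append=u[:, -1:])    # forward differences
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)       # smoothed gradient norm
        px, py = ux / mag, uy / mag
        div = np.diff(px, axis=1, prepend=px[:, :1]) + \
              np.diff(py, axis=0, prepend=py[:1, :])
        u -= tau * (1.0 - f / u - lam * div)         # 1 - f/u: Poisson term
        u = np.maximum(u, 1e-6)                      # keep intensities positive
    return u
```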

Alex Sawatzky, Christoph Brune, Jahn Müller, Martin Burger
Fast Trilateral Filtering

This paper compares the original implementation of the trilateral filter with two proposed speed improvements. The first uses simple look-up tables (LUTs) and leads to exactly the same results as the original filter. The second uses a novel way of truncating the look-up table to a user-specified accuracy; here, results differ from those of the original filter, but only to a very minor extent. The paper shows that the measured speed improvements of this second technique are of several orders of magnitude compared to the original or LUT trilateral filter.
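A minimal sketch of the truncated look-up-table idea (NumPy; the names and the exact truncation rule are our own assumptions): a Gaussian kernel is tabulated and cut off where its entries drop below the user-specified accuracy.

```python
import numpy as np

def build_kernel_lut(sigma, accuracy=1e-4, step=0.01):
    """Tabulate exp(-x^2/(2 sigma^2)); drop entries below `accuracy`."""
    x_max = sigma * np.sqrt(-2.0 * np.log(accuracy))   # last relevant argument
    xs = np.arange(0.0, x_max + step, step)
    return xs, np.exp(-xs ** 2 / (2.0 * sigma ** 2))

def lut_kernel(xs, lut, x):
    """Nearest-entry lookup; the truncated tail contributes zero."""
    x = np.abs(np.asarray(x, dtype=float))
    idx = np.minimum((x / (xs[1] - xs[0])).astype(int), len(lut) - 1)
    return np.where(x > xs[-1], 0.0, lut[idx])
```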

Tobi Vaudrey, Reinhard Klette

Image Registration

Joint Affine and Radiometric Registration Using Kernel Operators

A new global method for image registration in the presence of affine and radiometric deformations is introduced. The proposed method utilizes kernel operators to find corresponding regions without using local features. Application of polynomial-type kernel functions results in a low-complexity algorithm, allowing estimation of the radiometric deformation regardless of the affine geometric transformation. Preliminary experimentation shows high registration accuracy for the joint task on real images with varying illumination.

Boaz Vigdor, Joseph M. Francos
MCMC-Based Algorithm to Adjust Scale Bias in Large Series of Electron Microscopical Ultrathin Sections

When using a non-rigid registration scheme, bias can be introduced during the registration of consecutive sections. This bias can accumulate when large series of sections are registered, causing substantial distortions of the scale space of individual sections and thus significant measurement bias. This paper presents an automated scheme based on Markov Chain Monte Carlo (MCMC) techniques to estimate and eliminate registration bias. For this purpose, a hierarchical model is used, based on the assumptions that (a) each section has the same, independent probability of being deformed by the sectioning and therefore by the subsequent registration process, and (b) the varying bias introduced by the registration process has to be balanced such that the average section area is preserved, forcing the average scale parameters to have a mean value of 1.0.

Huaizhong Zhang, E. Patricia Rodriguez, Philip Morrow, Sally McClean, Kurt Saetzler

Image and Video Retrieval

Accelerating Image Retrieval Using Factorial Correspondence Analysis on GPU

We are interested in the intensive use of Factorial Correspondence Analysis (FCA) for large-scale content-based image retrieval. FCA is a useful method for analyzing textual data, and we adapt it to images using SIFT local descriptors. FCA is used to reduce dimensionality and to limit the number of images to be considered during the search. Graphics Processing Units (GPUs) are fast emerging as inexpensive parallel processors due to their high computation power and low price; the G80 family of Nvidia GPUs provides the CUDA programming model, which treats the GPU as a SIMD processor array. We present two very fast GPU algorithms for image retrieval using FCA: the first is a parallel incremental algorithm for FCA, and the second is an extension of the filtering algorithm from our previous work for the filtering step.

Our implementation scales up the FCA computation by a factor of 30 compared to the CPU version. For retrieval tasks, the parallel GPU version performs 10 times faster than the CPU one; retrieving images in a database of 1 million images takes about 8 milliseconds.
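For orientation, the CPU core of correspondence analysis can be sketched as an SVD of the standardised residuals of a contingency table (NumPy; the paper's incremental and GPU-parallel versions are omitted):

```python
import numpy as np

def correspondence_analysis(N, n_dims=2):
    """FCA of a contingency table N (e.g. images x quantised SIFT words)."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row / column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:n_dims].T * sv[:n_dims]) / np.sqrt(c)[:, None]
    return row_coords, col_coords
```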

Nguyen-Khang Pham, Annie Morin, Patrick Gros
Color Based Bags-of-Emotions

In this paper we describe how to include high-level semantic information, such as aesthetics and emotions, into Content Based Image Retrieval. We present a color-based, emotion-related image descriptor that can be used for describing the emotional content of images. The color emotion metric used is derived from psychophysical experiments and based on three variables: activity, weight and heat. It was originally designed for single colors, but recent research has shown that the same emotion estimates can be applied in the retrieval of multi-colored images. Here we describe a new approach, based on the assumption that perceived color emotions in images are mainly affected by homogeneous regions, defined by the emotion metric, and by transitions between regions. RGB coordinates are converted to emotion coordinates, and for each emotion channel, statistical measurements of gradient magnitudes within a stack of low-pass filtered images are used for finding interest points corresponding to homogeneous regions and transitions between regions. Emotion characteristics are derived for patches surrounding each interest point and saved in a bag-of-emotions that can, for instance, be used for retrieving images based on emotional content.

Martin Solli, Reiner Lenz
Measuring the Influence of Concept Detection on Video Retrieval

There is an increasing emphasis on including semantic concept detection as part of video retrieval. This represents a modality for retrieval quite different from metadata-based and keyframe similarity-based approaches. One of the premises on which the success of this approach is based is that good-quality detection is available in order to guarantee retrieval quality. But how good does the concept detection actually need to be? Is it possible to achieve good retrieval quality even with poor-quality concept detection, and if so, what is the "tipping point" below which detection accuracy proves not to be beneficial? In this paper we explore this question using a collection of rushes video in which we artificially vary the quality of detection of semantic features and study the impact on the resulting retrieval. Our results show that improving or degrading the performance of concept detectors is not directly reflected in retrieval performance, which raises interesting questions about how accurate concept detection really needs to be.

Pablo Toharia, Oscar D. Robles, Alan F. Smeaton, Ángel Rodríguez

Medical Imaging

SEM Image Analysis for Quality Control of Nanoparticles

In nano-medicine, mesoporous silicon particles provide efficient vehicles for the dissemination and delivery of key proteins at the micron scale. We propose a new quality-control method for the nanopore structure of these particles, based on image analysis software developed to automatically inspect scanning electronic microscopy (SEM) images of nanoparticles in a fully automated fashion. Our algorithm first identifies the precise position and shape of each nanopore, then generates a graphic display of these nanopores and of their boundaries. This is essentially a texture segmentation task, and a key quality-control requirement is fast computing speed. Our software then computes key shape characteristics of individual nanopores, such as area, outer diameter, eccentricity, etc., and then generates means, standard deviations, and histograms of each pore-shape feature. Thus, the image analysis algorithms automatically produce a vector from each image which contains relevant nanoparticle quality control characteristics, either for comparison to pre-established acceptability thresholds, or for the analysis of homogeneity and the detection of outliers among families of nanoparticles.

S. K. Alexander, R. Azencott, B. G. Bodmann, A. Bouamrani, C. Chiappini, M. Ferrari, X. Liu, E. Tasciotti
Extraction of Cardiac Motion Using Scale-Space Features Points and Gauged Reconstruction

Motion estimation is an important topic in medical image analysis. The investigation and quantification of, e.g., cardiac movement is important for the assessment of cardiac abnormalities and for getting an indication of the response to therapy. In this paper we present a new aperture-problem-free method to track cardiac motion from 2-dimensional MR tagged images and corresponding sine-phase images. Tracking is achieved by following the movement of scale-space critical points such as maxima, minima and saddles. Reconstruction of a dense velocity field is carried out by minimizing an energy functional whose regularization term is influenced by covariant derivatives gauged by a prior assumption.

Since MR tags deform along with the tissue, a combination of MR tagged images and sine-phase images was employed to produce a regular grid from which the scale-space critical points were retrieved. Experiments were carried out on real image data and on artificial phantom data for which the ground truth is known. A comparison between our new method and a similar technique based on homogeneous diffusion regularization and standard derivatives shows an increase in performance. Qualitative and quantitative evaluation emphasizes the reliability of the dense motion field, allowing further analysis of deformation and torsion of the cardiac wall.

Alessandro Becciu, Bart J. Janssen, Hans van Assen, Luc Florack, Vivian Roode, Bart M. ter Haar Romeny
A Non-Local Fuzzy Segmentation Method: Application to Brain MRI

The Fuzzy C-Means algorithm is a widely used and flexible approach for brain tissue segmentation from 3D MRI. Despite its recent enrichment by the addition of a spatial dependency to its formulation, it remains quite sensitive to noise. In order to improve its reliability in noisy contexts, we propose a way to select the most suitable example regions for regularisation. This approach, inspired by the Non-Local Means strategy used in image restoration, is based on the computation of weights modelling the grey-level similarity between the neighbourhoods being compared. Experiments were performed on MRI data and the results illustrate the usefulness of the approach in the context of brain tissue classification.

Benoît Caldairou, François Rousseau, Nicolas Passat, Piotr Habas, Colin Studholme, Christian Heinrich
Development of a High Resolution 3D Infant Stomach Model for Surgical Planning

Medical surgical procedures have not changed much during the past century, due in part to the lack of an accurate, low-cost workbench for testing improvements. Increasingly cheap and powerful computer technology has made computer-based surgery planning and training feasible. In our work, we have developed an accurate 3D stomach model that aims to improve the surgical procedure treating pediatric and neonatal gastro-esophageal reflux disease (GERD). We generate the 3D infant stomach model from in vivo computed tomography (CT) scans of an infant. CT is a widely used clinical imaging modality that is cheap but has low spatial resolution; to improve the model accuracy, we also use the high-resolution Visible Human Project (VHP) data in model building. Next, we add soft muscle material properties to make the 3D model deformable, and use virtual reality techniques such as haptic devices to make the stomach model deform in response to touching force. This accurate 3D stomach model provides a workbench for testing new GERD treatment procedures. It has the potential to reduce or eliminate the extensive cost associated with animal testing when improving surgical procedures and, ultimately, to reduce the risk associated with infant GERD surgery.

Qaiser Chaudry, S. Hussain Raza, Jeonggyu Lee, Yan Xu, Mark Wulkan, May D. Wang
Improved Arterial Inner Wall Detection Using Generalized Median Computation

In this paper, we propose a novel method for automatic detection of the lumen diameter and intima-media thickness (IMT) from dynamic B-mode sonographic image sequences, with and without plaques. The algorithm has two phases. In the first phase, dual dynamic programming (DDP) is applied to detect the far-wall and near-wall IMT, and the generalized median curves are calculated. In the second phase, DDP is applied again using the median curves as prior knowledge, to obtain a more informed search and potentially correct errors from the first phase. All results are visually checked by professional physicians. Based on our experiments, this system can replace the experts' manual work, which is time-consuming and not repeatable.
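A minimal sketch of the median-curve step, under our own simplifying assumption that each detected wall curve stores one y-position per image column (for such aligned curves, the pointwise median is the L1 generalized median), together with an illustrative cost term for the second DDP phase:

```python
import numpy as np

def generalized_median_curve(curves):
    """Pointwise median of aligned boundary curves (one y per column)."""
    return np.median(np.asarray(curves, dtype=float), axis=0)

def knowledge_cost(candidate_y, median_y, weight=1.0):
    """Illustrative extra cost for the second DDP phase: penalise wall
    positions that stray from the generalized median curve."""
    return weight * np.abs(candidate_y - median_y)
```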

Da-Chuan Cheng, Arno Schmidt-Trucksäss, Shing-Hong Liu, Xiaoyi Jiang
Parcellation of the Auditory Cortex into Landmark-Related Regions of Interest

We propose a method for the automated delineation of cortical regions of interest as a basis for the anatomo-functional parcellation of the human auditory cortex using neuroimaging. Our algorithm uses the properties of the cortical surface, and employs a recent hierarchical part-based pattern recognition strategy for a semantically correct labelling of the temporal lobe. The anatomical landmarks are finally combined to obtain an accurate separation and parametrisation of two auditory cortical regions. Experimental results show the good performance of the approach, which was automated using simplified atlas information.

Karin Engel, Klaus Tönnies, André Brechmann
Automatic Fontanel Extraction from Newborns’ CT Images Using Variational Level Set

A realistic head model is needed for the source localization methods used for the study of epilepsy in neonates based on Electroencephalographic (EEG) measurements from the scalp. The earliest models consider the head as a series of concentric spheres, each layer corresponding to a different tissue whose conductivity is assumed to be homogeneous. The results of the source reconstruction depend highly on the electric conductivities of the tissues forming the head. The most used model consists of three layers (scalp, skull, and intracranial). Most of the major bones of the neonate's skull are ossified at birth but can move slightly relative to each other, due to the sutures, fibrous membranes that at this stage of development connect the already ossified flat bones of the neurocranium. These weak parts of the neurocranium are called fontanels. It is therefore important to enter the exact geometry of the fontanels and flat bones into a source reconstruction, because they show pronounced differences in conductivity. Computed Tomography (CT) imaging provides an excellent tool for non-invasive investigation of the skull, which appears in high contrast to all other tissues, while the fontanels can only be identified as an absence of bone, i.e. gaps in the skull between the flat bones. The aim of this paper is therefore to extract the fontanels from CT images using a variational level set method. We applied the proposed method to CT images of five different subjects; the automatically extracted fontanels show good agreement with the manually extracted ones.

Kamran Kazemi, Sona Ghadimi, Alireza Lyaghat, Alla Tarighati, Narjes Golshaeyan, Hamid Abrishami-Moghaddam, Reinhard Grebe, Catherine Gondary-Jouet, Fabrice Wallois
Modeling and Measurement of 3D Deformation of Scoliotic Spine Using 2D X-ray Images

Scoliosis causes deformations such as twisting and lateral bending of the spine. To correct scoliotic deformation, the extents of 3D spinal deformation need to be measured. This paper studies the modeling and measurement of scoliotic spine based on 3D curve model. Through modeling the spine as a 3D Cosserat rod, the 3D structure of a scoliotic spine can be recovered by obtaining the minimum potential energy registration of the rod to the scoliotic spine in the x-ray image. Test results show that it is possible to obtain accurate 3D reconstruction using only the landmarks in a single view, provided that appropriate boundary conditions and elastic properties are included as constraints.

Hao Li, Wee Kheng Leow, Chao-Hui Huang, Tet Sen Howe
A Comparative Study on Feature Selection for Retinal Vessel Segmentation Using FABC

This paper presents a comparative study of five feature selection heuristics applied to the DRIVE retinal image database. Features are chosen from a feature vector (encoding local information as well as information about structures and shapes available in the image) constructed for each pixel in the field of view (FOV) of the image. After selecting the most discriminatory features, an AdaBoost classifier is trained. The classification results are used to compare the effectiveness of the five feature selection methods.

Carmen Alina Lupaşcu, Domenico Tegolo, Emanuele Trucco
Directional Multi-scale Modeling of High-Resolution Computed Tomography (HRCT) Lung Images for Diffuse Lung Disease Classification

A directional multi-scale modeling scheme based on wavelet and contourlet transforms is employed to describe HRCT lung image textures for classifying four diffuse lung disease patterns: normal, emphysema, ground glass opacity (GGO) and honeycombing. Generalized Gaussian density parameters are used to represent the detail sub-band features obtained by the wavelet and contourlet transforms. Support vector machines (SVMs), with excellent performance in a variety of pattern classification problems, are used as the classifier. The method is tested on a collection of 89 slices from 38 patients, each slice of size 512×512, 16 bits/pixel, in DICOM format; the dataset contains 70,000 ROIs of those slices marked by experienced radiologists. We apply the technique at different wavelet and contourlet transform scales for diffuse lung disease classification. The presented technique achieves a best overall sensitivity of 93.40% and specificity of 98.40%.
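Generalized Gaussian density parameters for a detail sub-band are commonly estimated by moment matching; the sketch below (NumPy/SciPy; not necessarily the authors' estimator) inverts the moment-ratio function for the shape parameter:

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd(coeffs):
    """Moment-matching fit of GGD scale alpha and shape beta to sub-band
    coefficients, using r = (E|x|)^2 / E[x^2] = G(2/b)^2 / (G(1/b) G(3/b))."""
    x = np.asarray(coeffs, dtype=float)
    m1, m2 = np.abs(x).mean(), (x ** 2).mean()
    r = m1 ** 2 / m2            # attainable range is roughly (0, 0.75)
    rho = lambda b: gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b)) - r
    beta = brentq(rho, 0.05, 5.0)                  # invert the ratio function
    alpha = m1 * gamma(1.0 / beta) / gamma(2.0 / beta)
    return alpha, beta
```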

Kiet T. Vo, Arcot Sowmya
Statistical Deformable Model-Based Reconstruction of a Patient-Specific Surface Model from Single Standard X-ray Radiograph

In this paper, we present a hybrid 2D-3D deformable registration strategy combining a landmark-to-ray registration with a statistical shape model-based 2D-3D reconstruction scheme, and show its application to reconstruct a patient-specific 3D surface model of the pelvis from single standard X-ray radiograph. The landmark-to-ray registration is used to find an initial scale and an initial rigid transformation between the X-ray image and the statistical shape model. The estimated scale and rigid transformation are then used to initialize the statistical shape model-based 2D-3D reconstruction scheme, which combines statistical instantiation and regularized shape deformation with an iterative image-to-model correspondence establishing algorithm. Quantitative and qualitative results of a feasibility study on clinical and cadaveric datasets are given, which indicate the validity of our approach.

Guoyan Zheng

Object and Scene Recognition

Plant Species Identification Using Multi-scale Fractal Dimension Applied to Images of Adaxial Surface Epidermis

This paper presents a study of computational methods applied to histological texture analysis for identifying plant species, a difficult task due to the great similarity among some species and the presence of irregularities within a given species. Experiments were performed on 300×300 texture windows extracted from the adaxial surface epidermis of eight species. Different texture methods were evaluated using Linear Discriminant Analysis (LDA). Results show that methods based on complexity analysis achieve better texture discrimination, thus leading to more accurate identification of plant species.

André R. Backes, Jarbas J. de M. Sá Junior, Rosana M. Kolb, Odemir M. Bruno
Fast Invariant Contour-Based Classification of Hand Symbols for HCI

Video-based recognition of hand symbols is a promising technology for designing new interaction techniques for the multi-user environments of the future. However, most approaches still lack the performance required for direct application in human-computer interaction (HCI).

In this paper we propose a novel approach to contour-based recognition of hand symbols for HCI. We present methods for the normalization and representation of signatures extracted from boundary contours that allow for efficient recognition of hand poses invariant to translation, rotation, scale and the viewpoint variations relevant for many HCI applications. The developed classification system is evaluated on a dataset containing 13 hand symbols captured from four different persons.

Thomas Bader, René Räpple, Jürgen Beyerer
Recognition of Simple 3D Geometrical Objects under Partial Occlusion

In this paper we present a novel procedure for contour-based recognition of partially occluded three-dimensional objects. In our approach we use images of real and rendered objects whose contours have been deformed by a restricted change of the viewpoint. The preparatory part consists of contour extraction, preprocessing, local structure analysis and feature extraction. The main part deals with an extended construction and functionality of the classifier ensemble Adaptive Occlusion Classifier (AOC). It relies on a hierarchical fragmenting algorithm to perform a local structure analysis, which is essential when dealing with occlusions. In the experimental part of this paper we present classification results for five classes of simple geometrical figures: prism, cylinder, half cylinder, cube, and bridge. We compare classification results for three classical feature extractors: Fourier descriptors, pseudo-Zernike and Zernike moments.

Alexandra Barchunova, Gerald Sommer
Shape Classification Using a Flexible Graph Kernel

Since the medial axis is a homotopic transformation, the skeleton of a 2D shape corresponds to a planar graph having one face for each hole of the shape and one node for each junction or extremity of the branches. This graph is non-simple, since it can contain loops and multiple edges. Within the shape comparison framework, such a graph is usually transformed into a simpler structure such as a tree or a simple graph, thereby losing major information about the shape. In this paper, we propose a graph kernel combining a kernel between bags of trails and a kernel between faces. The trails are defined within the original complex graph, and the kernel between trails is enforced by an edition process. The kernel between bags of faces puts an emphasis on the holes of the shapes and hence on their genus. The resulting graph kernel is positive semi-definite on the graph domain.

François-Xavier Dupé, Luc Brun
Bio-inspired Approach for the Recognition of Goal-Directed Hand Actions

The recognition of transitive, goal-directed actions requires a sensible balance between the representation of specific shape details of effector and goal object and robustness with respect to image transformations. We present a biologically-inspired architecture for the recognition of transitive actions from video sequences that integrates an appearance-based recognition approach with a simple neural mechanism for the representation of the effector-object relationship. A large degree of position invariance is obtained by nonlinear pooling in combination with an explicit representation of the relative positions of object and effector using neural population codes. The approach was tested on real videos, demonstrating successful invariant recognition of grip types on unsegmented video sequences. In addition, the algorithm reproduces and predicts the behavior of action-selective neurons in parietal and prefrontal cortex.

Falk Fleischer, Antonino Casile, Martin A. Giese
Wide-Baseline Visible Features for Highly Dynamic Scene Recognition

This paper describes a new visual feature to address the problem of highly dynamic place recognition. The feature is obtained by identifying existing local features, such as SIFT or SURF, that have wide-baseline visibility within the place. These identified local features are then compressed into a single representative feature, a wide-baseline visible feature, computed as an average of all the features associated with it. The proposed feature is especially robust against highly dynamic changes in the scene: it can be correctly matched against a number of features collected from many dynamic images. This paper also describes an approach to using these features for scene recognition. The recognition proceeds by matching individual features to a set of features from testing images, followed by majority voting to identify the place with the highest number of matched features. The proposed feature is trained and tested on 2000+ outdoor omnidirectional images. Despite its simplicity, the wide-baseline visible feature offers a recognition rate twice as high (ca. 93%) as that of other features. The number of features can be further reduced to speed up computation without a drop in accuracy, which makes the approach well suited to long-term scene recognition and localization.

Aram Kawewong, Sirinart Tangruamsub, Osamu Hasegawa
Jumping Emerging Substrings in Image Classification

We propose a new image classification scheme based on the idea of mining jumping emerging substrings between classes of images represented by visual features. Jumping emerging substrings (JESs) are string patterns which occur frequently in one set of string data and are absent in another. By representing images in a symbolic manner, according to their color and texture characteristics, we enable the mining of JESs in sets of visual data and use the mined patterns to create efficient and accurate classifiers. In this paper we describe our approach to image representation and provide experimental results of JES-based classification on well-known image datasets.
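A toy sketch of JES mining over symbolic image strings (plain Python; the paper's mining algorithm is certainly more efficient): substrings frequent in one class and absent from the other are collected.

```python
def jumping_emerging_substrings(pos, neg, min_support=0.2, max_len=4):
    """Substrings of length <= max_len occurring in at least min_support
    of the strings in `pos` and in none of the strings in `neg`."""
    def substrings(s):
        return {s[i:i + l] for l in range(1, max_len + 1)
                for i in range(len(s) - l + 1)}
    neg_subs = set().union(*(substrings(s) for s in neg))
    counts = {}
    for s in pos:
        for sub in substrings(s):
            counts[sub] = counts.get(sub, 0) + 1
    return {sub for sub, c in counts.items()
            if c / len(pos) >= min_support and sub not in neg_subs}

# Toy usage, with symbolic colour/texture labels encoded as characters:
# jes = jumping_emerging_substrings(["abca", "abcb"], ["acbb"])
```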

Łukasz Kobyliński, Krzysztof Walczak
Human Action Recognition Using LBP-TOP as Sparse Spatio-Temporal Feature Descriptor

In this paper we apply the Local Binary Pattern on Three Orthogonal Planes (LBP-TOP) descriptor to the field of human action recognition. A video sequence is described as a collection of spatio-temporal words after the detection of space-time interest points and the description of the areas around them. Our contribution lies in the description part, showing LBP-TOP to be a promising descriptor for human action classification purposes. We have also developed several extensions to the descriptor to enhance its performance in human action recognition, and show the method to be computationally efficient.

Riccardo Mattivi, Ling Shao
Contextual-Guided Bag-of-Visual-Words Model for Multi-class Object Categorization

The bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of the words it contains. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are only considered after dictionary construction. However, close object regions can have distant descriptions in the feature space and thus be grouped as different visual words. In this paper, we present a method for incorporating the geometrical information of visual words into the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual space and a feature space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.

Mehdi Mirza-Mohammadi, Sergio Escalera, Petia Radeva
Isometric Deformation Modelling for Object Recognition

We present two methods for isometrically deformable object recognition. The methods are built upon the use of geodesic distance matrices (GDM) as an object representation. The first method compares these matrices by using histogram comparisons. The second method is a modal approach. The largest singular values or eigenvalues appear to be an excellent shape descriptor, based on the comparison with other methods also using the isometric deformation model and a general baseline algorithm. The methods are validated using the TOSCA database of non-rigid objects and a rank 1 recognition rate of 100% is reported for the modal representation method using the 50 largest eigenvalues. This is clearly higher than other methods using an isometric deformation model.
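A sketch of the modal descriptor under our own assumptions (NumPy/SciPy; geodesic distances approximated by shortest paths on a k-NN graph over surface sample points):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse.csgraph import shortest_path

def gdm_descriptor(points, k=8, n_modes=50):
    """Isometry-invariant descriptor: largest singular values of the
    geodesic distance matrix, approximated on a k-NN graph."""
    d, idx = cKDTree(points).query(points, k + 1)   # self + k neighbours
    n = len(points)
    W = np.full((n, n), np.inf)                     # inf marks non-edges
    for i in range(n):
        W[i, idx[i]] = d[i]
    G = shortest_path(np.minimum(W, W.T), method='D')  # geodesic distances
    return np.linalg.svd(G, compute_uv=False)[:n_modes]
```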

Dirk Smeets, Thomas Fabry, Jeroen Hermans, Dirk Vandermeulen, Paul Suetens
Image Categorization Based on a Hierarchical Spatial Markov Model

In this paper, we propose a Hierarchical Spatial Markov Model (HSMM) for image categorization. We adopt the Bag-of-Words (BoW) model to represent image features with visual words, thus avoiding the heavy manual annotation required by most Markov model based approaches. Our HSMM describes the spatial relations of these visual words by modeling the distribution of transitions between adjacent words for each image category. A novel idea of semantic hierarchy is employed in the model to represent the compositional relationships of visual words at the semantic level. Experiments demonstrate that our approach outperforms a Bayesian hierarchical model based categorization approach by 12.5%, and performs better than a previous Markov model based approach by 11.8% on average.

Lihua Wang, Zhiwu Lu, Horace H. S. Ip
Soft Measure of Visual Token Occurrences for Object Categorization

The improvement of bag-of-features image representation by statistical modeling of visual tokens has recently gained attention in the field of object categorization. This paper proposes a soft bag-of-features image representation based on Gaussian Mixture Modeling (GMM) of visual tokens for object categorization. The distribution of local features from each visual token is assumed as the GMM and learned from the training data by the Expectation-Maximization algorithm with a model selection method based on the Minimum Description Length. Consequently, we can employ Bayesian formula to compute posterior probabilities of being visual tokens for local features. According to these probabilities, three schemes of image representation are defined and compared for object categorization under a new discriminative learning framework of Bayesian classifiers, the Max-Min posterior Pseudo-probabilities (MMP). We evaluate the effectiveness of the proposed object categorization approach on the Caltech-4 database and car side images from the University of Illinois. The experimental results with comparisons to those reported in other related work show that our approach is promising.

Yanjie Wang, Xiabi Liu, Yunde Jia
Indexing Large Visual Vocabulary by Randomized Dimensions Hashing for High Quantization Accuracy: Improving the Object Retrieval Quality

The bag-of-visual-words approach, inspired by text retrieval methods, has proven successful in achieving high performance in object retrieval on large-scale databases. A key step of these methods is the quantization stage, which maps the high-dimensional image feature vectors to discriminatory visual words. In this paper, we treat the quantization step as a nearest neighbor search in a large visual vocabulary, and propose a randomized dimensions hashing (RDH) algorithm to efficiently index and search the large vocabulary. The experimental results demonstrate that the proposed algorithm effectively increases the quantization accuracy compared to vocabulary tree based methods, which represent the state of the art. Consequently, the object retrieval performance on large-scale databases can be significantly improved by our method.

Heng Yang, Qing Wang, Zhoucan He

Pattern Recognition

Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine

A clinical decision support system (CDSS) provides knowledge and specific information to clinicians, enhancing diagnostic efficiency and improving healthcare quality. An appropriate CDSS can greatly elevate patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machines (SVMs) are believed to be superior to traditional statistical and neural network classifiers; however, it is critical to determine a suitable combination of SVM parameters with regard to classification performance. Genetic algorithms (GAs) can find optimal solutions within an acceptable time and are faster than greedy algorithms with an exhaustive search strategy. By taking advantage of GA in quickly selecting salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which differs from the traditional method of using GA for feature selection and SVM for classification, was used to design CDSSs for the prediction of successful ventilation weaning, the diagnosis of patients with severe obstructive sleep apnea, and the discrimination of different cell types from Pap smears. The results show that IGS is better than methods using SVM alone or a linear discriminator.

Yung-Fu Chen, Yung-Fa Huang, Xiaoyi Jiang, Yuan-Nian Hsu, Hsuan-Hung Lin
Decision Trees Using the Minimum Entropy-of-Error Principle

Binary decision trees based on univariate splits have traditionally employed so-called impurity functions as a means of searching for the best node splits. Such functions use estimates of the class distributions. In the present paper we introduce a new concept to binary tree design: instead of working with the class distributions of the data we work directly with the distribution of the errors originated by the node splits. Concretely, we search for the best splits using a minimum entropy-of-error (MEE) strategy. This strategy has recently been applied in other areas (e.g. regression, clustering, blind source separation, neural network training) with success. We show that MEE trees are capable of producing good results with often simpler trees, have interesting generalization properties and in the many experiments we have performed they could be used without pruning.
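A minimal sketch of the criterion for two classes coded as -1/+1 (NumPy; the paper's exact error model may differ): among candidate splits, the one minimising the Shannon entropy of the resulting error distribution is chosen.

```python
import numpy as np

def entropy_of_error(y, split_mask):
    """Entropy of classification errors t - pred for a candidate split;
    each branch predicts its majority class, so errors lie in {-2, 0, 2}."""
    errors = []
    for branch in (split_mask, ~split_mask):
        t = y[branch]
        if t.size:
            pred = 1 if t.mean() >= 0 else -1      # majority vote in branch
            errors.append(t - pred)
    e = np.concatenate(errors)
    _, counts = np.unique(e, return_counts=True)
    p = counts / e.size
    return float(-(p * np.log2(p)).sum())
```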

J. P. Marques de Sá, João Gama, Raquel Sebastião, Luís A. Alexandre
k/K-Nearest Neighborhood Criterion for Improvement of Locally Linear Embedding

Spectral manifold learning techniques have recently found extensive applications in machine vision. The common strategy of spectral algorithms for manifold learning is exploiting the local relationships in a symmetric adjacency graph, which is typically constructed using the k-nearest neighborhood (k-NN) criterion. In this paper, with our focus on locally linear embedding (LLE) as a powerful and well-known spectral technique, shortcomings of k-NN for construction of the adjacency graph are first illustrated, and then a new criterion, namely k/K-nearest neighborhood (k/K-NN), is introduced to overcome these drawbacks. The proposed criterion involves finding the sparsest representation of each sample in the dataset, and is realized by modifying Robust-SL0, a recently proposed algorithm for sparse approximate representation. The k/K-NN criterion gives rise to a modified spectral manifold learning technique, namely Sparse-LLE, which demonstrates remarkable improvement over conventional LLE in our experiments.
Armin Eftekhari, Hamid Abrishami-Moghaddam, Massoud Babaie-Zadeh
A Parameter Free Approach for Clustering Analysis

In this paper, we propose a novel parameter-free approach for clustering analysis. The approach does not need assumptions or parameters regarding the number of clusters or the results, and the clustering results are visually verified and approved in our experimental work. For simplicity, this paper demonstrates the idea using the Fuzzy C-Means (FCM) clustering method, but the proposed open framework allows easy integration with other clustering methods. The method-independent framework generates optimal clustering results and avoids the intrinsic biases of individual clustering methods.

Haiqiao Huang, Pik-yin Mok, Yi-lin Kwok, Sau-Chuen Au
Fitting Product of HMM to Human Motions

The Product of Hidden Markov Models (PoHMM) is a mixed graphical model defining a probability distribution on a sequence space as the normalized product of several simple Hidden Markov Models (HMMs). Here, we use this model to approach the human action recognition task, incorporating mixture-of-Gaussians output distributions. PoHMMs allow us to consider context at different ranges and to model the different dynamics of different body parts in an efficient way. For estimating the normalization constant Z, we introduce the annealed importance sampling (AIS) method in the context of PoHMMs in order to obtain non-relative estimates of Z. We compare our approach with one based on fitting a logistic regression model to each pair of PoHMMs.

M. Ángeles Mendoza, Nicolás Pérez de la Blanca, Manuel J. Marín-Jiménez
Reworking Bridging for Use within the Image Domain

The task of automated classification is a highly active research field with great practical benefit over a number of problem domains. However, due to factors such as a lack of available training examples, large degrees of imbalance in the training set, or overlapping classes, the task is rarely straightforward in practice, and methods that adequately compensate for such difficulties are required. The recently developed bridging algorithm does just this for problems in the field of short-string text classification, integrating a collection of background knowledge into the classification process. In this paper, we show how the bridging algorithm can be redesigned so that it applies to image data, and demonstrate that it effectively overcomes a range of difficulties in the classification process.

Henry Petersen, Josiah Poon
Detection of Ambiguous Patterns Using SVMs: Application to Handwritten Numeral Recognition

This work presents a pattern recognition system that is able to detect ambiguous patterns and explain its answers. The system consists of a set of parallel Support Vector Machine (SVM) classifiers, each dedicated to a representative feature extracted from the input, followed by an analysing module based on a Bayesian strategy that defines the system's answer. We apply the system to the recognition of handwritten numerals. Experiments were carried out on the MNIST database, which is generally accepted as a standard in most of the literature in the field.

Leticia Seijas, Enrique Segura

Shape Recovery

Accurate 3D Modelling by Fusion of Potentially Reliable Active Range and Passive Stereo Data

We discuss possibilities for more accurate digital modelling of 3D scenes by fusing range data from an active hand-held laser scene scanner developed at IRL with passive stereo data from stereo image pairs of the scene collected during the scanning process. The complementary properties of the two data sources allow a 3D model to be improved by checking the reliability of the active range data and using it to adaptively guide the passive stereo reconstruction. Experiments show that this avenue of data fusion offers good prospects for error detection and correction.

Yuk Hin Chan, Patrice Delmas, Georgy Gimel’farb, Robert Valkenburg
Rapid Classification of Surface Reflectance from Image Velocities

We propose a method for rapidly classifying surface reflectance directly from the output of spatio-temporal filters applied to an image sequence of rotating objects. Using image data from only a single frame, we compute histograms of image velocities and classify these as being generated by a specular or a diffusely reflecting object. Exploiting characteristics of material-specific image velocities we show that our classification approach can predict the reflectance of novel 3D objects, as well as human perception.

Katja Doerschner, Dan Kersten, Paul Schrater
Structure-Preserving Regularisation Constraints for Shape-from-Shading

In this paper we present a new framework for shape-from-shading which relies on a novel regularisation term which preserves surface structure. The resulting algorithm is both robust and accurate. We show that it can recover stable surface estimates from both synthetic and real world images of complex objects, even under extreme illumination.

Rui Huang, William A. P. Smith
3D Object Reconstruction Using Full Pixel Matching

This paper proposes an approach to reconstructing 3D objects from a sequence of 2D images, using the 2D Continuous Dynamic Programming (2DCDP) algorithm as a full pixel matching technique. To avoid using calibrated images and a fundamental matrix in reconstructing 3D objects, the study takes the same approach as Factorization, but aims to demonstrate the effectiveness of 2DCDP in pixel matching compared with conventional methods such as the Scale-Invariant Feature Transform (SIFT) or the Kanade-Lucas-Tomasi tracker (KLT). The experiments in this study use relatively few uncalibrated images but still obtain accurate 3D objects, suggesting that our method is promising and superior to conventional methods.

Yuichi Yaguchi, Kenta Iseki, Nguyen Tien Viet, Ryuichi Oka
Rapid Inference of Object Rigidity and Reflectance Using Optic Flow

Rigidity and reflectance are key object properties, important in their own rights, and they are key properties that stratify motion reconstruction algorithms. However, the inference of rigidity and reflectance are both difficult without additional information about the object’s shape, the environment, or lighting. For humans, relative motions of object and observer provides rich information about object shape, rigidity, and reflectivity. We show that it is possible to detect rigid object motion for both specular and diffuse reflective surfaces using only optic flow, and that flow can distinguish specular and diffuse motion for rigid objects. Unlike nonrigid objects, optic flow fields for rigid moving surfaces are constrained by a global transformation, which can be detected using an optic flow matching procedure across time. In addition, using a Procrustes analysis of structure from motion reconstructed 3D points, we show how to classify specular from diffuse surfaces.

Di Zang, Katja Doerschner, Paul R. Schrater
On the Recovery of Depth from a Single Defocused Image

In this paper we address the challenging problem of recovering the depth of a scene from a single image using defocus cue. To achieve this, we first present a novel approach to estimate the amount of spatially varying defocus blur at edge locations. We re-blur the input image and show that the gradient magnitude ratio between the input and re-blurred images depends only on the amount of defocus blur. Thus, the blur amount can be obtained from the ratio. A layered depth map is then extracted by propagating the blur amount at edge locations to the entire image. Experimental results on synthetic and real images demonstrate the effectiveness of our method in providing a reliable estimate of the depth of a scene.
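A sketch of the edge-blur estimation step under the ideal step-edge model (NumPy/SciPy; the propagation to a full layered depth map is omitted): if an edge with blur sigma is re-blurred with sigma_r, the gradient magnitude ratio R between input and re-blurred image satisfies sigma = sigma_r / sqrt(R^2 - 1).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grad_mag(a):
    gy, gx = np.gradient(a)
    return np.hypot(gx, gy)

def blur_at_edges(img, sigma_r=1.0, edge_thresh=10.0):
    """Defocus blur sigma at edge pixels from the gradient-magnitude ratio."""
    f = img.astype(float)
    g1 = grad_mag(f)
    g2 = grad_mag(gaussian_filter(f, sigma_r))       # re-blurred image
    edges = g1 > edge_thresh                         # crude edge selection
    R = g1[edges] / np.maximum(g2[edges], 1e-8)
    R = np.maximum(R, 1.0 + 1e-6)                    # ratio >= 1 in theory
    sigma = sigma_r / np.sqrt(R ** 2 - 1.0)          # step-edge blur model
    return edges, sigma
```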

Shaojie Zhuo, Terence Sim

Segmentation

Modelling Human Segmentation Through Color and Space Analysis

This paper proposes a color segmentation algorithm that models human-based perception according to the Gestalt laws of similarity and proximity. We use mean shift clustering to translate these laws into an analysis of the color layout of an image. Given a set of possible segmentations, the method uses a measure of stability to identify the most meaningful regions according to perceptual criteria. Quantitative results obtained on the Berkeley dataset show that this approach outperforms state-of-the-art methods on human-based image segmentation.

Agnés Borràs, Josep Lladós
A Metric and Multiscale Color Segmentation Using the Color Monogenic Signal

In this paper, we use the formalism of Clifford algebras to extend the so-called Monogenic Signal to color images. This extension consists of a function with values in the Clifford algebra ℝ₅,₀ that encodes color as well as geometric structure information. Using geometric calculus, such a mathematical object can be used to extend classical concepts of signal processing (filtering, Fourier transform, ...) to color images in a consistent manner. In this paper, a local color phase is introduced, which generalizes the one for grayscale images. As an example of application, we provide a new method for color segmentation: based on our phase definition and the multiscale nature of the Color Monogenic Signal, we develop a metric approach using differential geometry that proves relevant on the Berkeley Image Dataset.

Guillaume Demarcq, Laurent Mascarilla, Pierre Courtellemont
An Interactive Level Set Approach to Semi-automatic Detection of Features in Food Micrographs

Microscopy is often employed in food research to inspect the microstructural features of food samples. Accurate detection of microscopic features is required for reliable quantitative analysis. We propose a user-assisted approach that can be easily integrated into a graphical interface. The proposed algorithm is based on a fast approximation of the common region-based level set equation, providing interactive computations. Experiments have been run on cheese micrographs acquired with electron and confocal microscopes.

Gaetano Impoco, Giuseppe Licitra
Shape Detection from Line Drawings by Hierarchical Matching

An object detection method for line drawing images is presented. In this method, the content of a line drawing image is represented hierarchically: a local neighborhood structure is formed for each primitive by grouping its nearest neighbors. The detection process is a hypothesis verification scheme. Firstly, the top k most similar local structures in the object drawing are obtained for each local structure of the model, and the corresponding transformation parameters are estimated. By treating each estimation result as a point in the parameter space, a dense region around the ground truth is formed, provided that the model occurs in the object drawing. Finally, a mode detection method is used to find this dense region, and the significant modes are accepted as occurrences of object instances.

Rujie Liu, Yuehong Wang, Takayuki Baba, Daiki Masumoto
A Fast Level Set-Like Algorithm with Topology Preserving Constraint

Implicit active contours are widely employed in image processing and related areas. Their implementation using the level set framework brings several advantages over parametric snakes. In particular, a parametrization independence, topological flexibility, and straightforward extension into higher dimensions have led to their popularity. However, in some applications the topological flexibility of the implicit contour is not desirable. Imposing topology-preserving constraints on evolving contours is often more convenient than including additional postprocessing steps. In this paper, we build on the work by Han et al. [1] introducing a topology-preserving extension of the narrow band algorithm involving simple point concept from digital geometry. In order to significantly increase computational speed, we integrate a fast level set-like algorithm by Nilsson and Heyden [2] with the simple point concept to obtain a fast topology-preserving algorithm for implicit active contours. The potential of the new algorithm is demonstrated on both synthetic and real image data.

Martin Maška, Pavel Matula
Significance Tests and Statistical Inequalities for Segmentation by Region Growing on Graph

Bottom-up segmentation methods merge similar neighboring regions according to a decision rule and a merging order. In this paper, we propose a contribution for each of these two points. Firstly, under statistical hypothesis of similarity, we provide an improved decision rule for region merging based on significance tests and the recent statistical inequality of McDiarmid. Secondly, we propose a dynamic merging order based on our merging predicate. This last heuristic is justified by considering an energy minimisation framework. Experimental results on both natural and medical images show the validity of our method.

Guillaume Née, Stéphanie Jehan-Besson, Luc Brun, Marinette Revenu
Scale Space Hierarchy of Segments

In this paper, we develop a segmentation algorithm using configurations of singular points in the linear scale space. We define segment edges as a zero-crossing set in the linear scale space using the singular points. An image in the linear scale space is the convolution of the image with the Gaussian kernel. The Gaussian kernel of an appropriate variance is a typical presmoothing operator for segmentation, and the variance is usually selected heuristically using image statistics such as the noise distribution. Here, the variance of the kernel is instead determined using the singular point configuration in the linear scale space, since singular points allow the extraction of the dominant parts of an image. This scale selection strategy yields the hierarchical structure of the segments.

Haruhiko Nishiguchi, Atsushi Imiya, Tomoya Sakai
Point Cloud Segmentation Based on Radial Reflection

This paper introduces a novel 3D segmentation algorithm, which works directly on point clouds to address the problem of partitioning a 3D object into useful sub-parts. In the last few decades, many different algorithms have been proposed in this growing field, but most of them are only working on complete meshes. However, in robotics, computer graphics, or other fields it is not always possible to work directly on a mesh. Experimental evaluations of a number of complex objects demonstrate the robustness and the efficiency of the proposed algorithm and the results prove that it compares well with a number of state-of-the-art 3D object segmentation algorithms.

Mario Richtsfeld, Markus Vincze
Locally Adaptive Speed Functions for Level Sets in Image Segmentation

We propose a framework for locally adaptive level set functions. The impact of well-known speed terms for the evolution of the active contour is adjusted by parameterising them with functions based on pre-defined properties. This allows for the application of level set methods even if image features are subject to large variations or if certain properties of the model are only valid for parts of the segmentation process. We present a number of examples and applications for the proposed concept and also address advantages and drawbacks of combinations of locally adaptive speed terms.

Karsten Rink, Klaus Tönnies
Improving User Control with Minimum Involvement in User-Guided Segmentation by Image Foresting Transform

The image foresting transform (IFT) can divide an image into object and background, each represented by one optimum-path forest rooted at internal and external markers selected by the user. We have considerably reduced the number of markers (user involvement) by separating object enhancement from its extraction. However, the user had no guidance about effective marker location during extraction, losing segmentation control. Now, we pre-segment the image automatically into a few regions. The regions inside the object are selected and merged from internal markers. Regions with object and background pixels are further divided by IFT. This provides more user control with minimum involvement, as validated on two public datasets.

T. V. Spina, Javier A. Montoya-Zegarra, P. A. V. Miranda, A. X. Falcão
3D Image Segmentation Using the Bounded Irregular Pyramid

This paper presents a novel pyramid approach for fast segmentation of 3D images. A pyramid is a hierarchy of successively reduced graphs, whose efficiency is strongly influenced by the data structure that codes the information within the pyramid and by the decimation process used to build each graph from the graph below. Depending on these two features, pyramids have been classified as regular or irregular. The proposed approach extends the idea of the Bounded Irregular Pyramid (BIP) [5] to 3D images. The 3D-BIP is thus a mixture of both types of pyramids, combining their advantages: the low computational cost of regular pyramids and the consistent, useful results provided by irregular ones. Specifically, its data structure combines a regular decimation process with a union-find strategy to build the successive 3D levels of the structure. Experimental results show that this approach is able to provide a low-level segmentation of 3D images at a low computational cost.

Fuensanta Torres, Rebeca Marfil, Antonio Bandera
The Gabor-Based Tensor Level Set Method for Multiregional Image Segmentation

This paper presents a new level set method for multiregional image segmentation. It employs the Gabor filter bank to extract local geometrical features and builds a pixel tensor representation whose dimensionality is reduced using offline tensor analysis. Multiphase level set functions are then evolved in the tensor field to detect the boundaries of the corresponding image. The proposed method has three main advantages. Firstly, by employing the Gabor filter bank, the model is more robust against salt-and-pepper noise. Secondly, the pixel tensor representation comprehensively depicts the information of pixels, resulting in better performance on non-homogeneous image segmentation. Thirdly, the model provides a uniform equation for multiphase level set functions, making it more practical. We apply the proposed method to synthetic and medical images, and the results indicate that it is superior to the typical region-based level set method.

Bin Wang, Xinbo Gao, Dacheng Tao, Xuelong Li, Jie Li
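
For readers unfamiliar with Gabor filter banks, the sketch below extracts a per-pixel magnitude feature vector with scikit-image; the frequencies and orientation count are arbitrary choices for illustration, not those used in the paper.

    import numpy as np
    from skimage.filters import gabor

    def gabor_features(image, frequencies=(0.1, 0.2, 0.3), n_orient=4):
        # Stack magnitude responses of a small Gabor bank into a per-pixel
        # feature array of shape (H, W, len(frequencies) * n_orient).
        feats = []
        for f in frequencies:
            for k in range(n_orient):
                theta = k * np.pi / n_orient
                real, imag = gabor(image, frequency=f, theta=theta)
                feats.append(np.sqrt(real ** 2 + imag ** 2))
        return np.stack(feats, axis=-1)
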
Embedded Geometric Active Contour with Shape Constraint for Mass Segmentation

Mass boundary segmentation plays an important role in computer-aided diagnosis (CAD) systems. Since shape and boundary are crucial discriminant features in CAD, active contour methods are particularly competitive for mass segmentation. However, general active contour methods are not effective in some cases, because most masses have very blurry margins that easily induce contour leaking. To this end, this paper presents an improved geometric active contour for mass segmentation. It first introduces a morphological concentric layer model for automatic initialization. An embedded level set is then used to extract adaptive shape constraints. To refine the boundary, a new shape constraint function and stopping function are designed for the enhanced geometric active contour method. The proposed method is tested on real mammograms containing masses, and the results suggest that it effectively restrains contour leaking and achieves better segmentation results than general active contour methods.

Ying Wang, Xinbo Gao, Xuelong Li, Dacheng Tao, Bin Wang
An Efficient Parallel Algorithm for Graph-Based Image Segmentation

Automatically partitioning images into regions (‘segmentation’) is challenging in terms of quality and performance. We propose a Minimum Spanning Tree-based algorithm with a novel graph-cutting heuristic, the usefulness of which is demonstrated by promising results obtained on standard images. In contrast to data-parallel schemes that divide images into independently processed tiles, the algorithm is designed to allow parallelisation without truncating objects at tile boundaries. A fast parallel implementation for shared-memory machines is shown to significantly outperform existing algorithms. It utilises a new microarchitecture-aware single-pass sort algorithm that is likely to be of independent interest.

Jan Wassenberg, Wolfgang Middelmann, Peter Sanders
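
As context for the minimum-spanning-tree formulation, here is a generic Kruskal-style merging baseline in the spirit of Felzenszwalb and Huttenlocher; the paper's novel graph-cutting heuristic, sort algorithm, and parallelisation are not reproduced here.

    import numpy as np

    def mst_segment(image, k=10.0):
        # Sort 4-neighbour edges by intensity difference and merge regions
        # greedily; thresh holds a simplistic per-region merge threshold.
        h, w = image.shape
        idx = lambda y, x: y * w + x
        edges = []
        for y in range(h):
            for x in range(w):
                if x + 1 < w:
                    edges.append((abs(float(image[y, x]) - float(image[y, x + 1])),
                                  idx(y, x), idx(y, x + 1)))
                if y + 1 < h:
                    edges.append((abs(float(image[y, x]) - float(image[y + 1, x])),
                                  idx(y, x), idx(y + 1, x)))
        edges.sort()
        parent = list(range(h * w))
        thresh = [k] * (h * w)

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        for wgt, a, b in edges:
            ra, rb = find(a), find(b)
            if ra != rb and wgt <= min(thresh[ra], thresh[rb]):
                parent[rb] = ra
                thresh[ra] = wgt + k  # ignores region size, unlike the original
        return np.array([find(i) for i in range(h * w)]).reshape(h, w)
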

Stereo and Video Analysis

Coarse-to-Fine Tracking of Articulated Objects Using a Hierarchical Spring System

Tracking of articulated objects is a challenging task in Computer Vision. A highly target-specific model can improve the robustness of tracking by eliminating or reducing the ambiguities in the association task. This paper presents a flexible framework for building target-specific, part-based models for arbitrary articulated objects. The rigid parts are described by hierarchical spring systems in the form of attributed graph pyramids and are connected via articulation points, which transfer position information between adjacent parts.

Nicole Artner, Adrian Ion, Walter Kropatsch
Cooperative Stereo Matching with Color-Based Adaptive Local Support

Color processing imposes a new constraint on stereo vision algorithms: the assumption of constant color on object surfaces, used to align local correlation windows with object boundaries, has significantly improved the accuracy of recent window-based stereo algorithms. While several algorithms have been presented that work with adaptive correlation windows defined by color similarity, only a few approaches use color-based grouping to optimize initially computed traditional matching scores. This paper introduces the concept of color-dependent adaptive support weights into the definition of local support areas in cooperative stereo methods to improve the accuracy of depth estimation at object borders.

Roland Brockers
Iterative Camera Motion and Depth Estimation in a Video Sequence

This paper addresses the problem of the joint determination of camera motion parameters and scene depth information. The input is a video sequence obtained by a hand-held moving camera. It is well known that the movement of a pixel between two consecutive images depends on the motion parameters and also on the depth of the projected point. Based on a camera motion estimation which uses the registration group and on an energy minimization based on the Belief Propagation algorithm, we propose an iterative method combining camera motion and depth estimation.

Françoise Dibos, Claire Jonchery, Georges Koepfler
Performance Prediction for Unsupervised Video Indexing

Recently, performance prediction has been successfully applied in the field of information retrieval for content analysis and retrieval tasks. This paper discusses how performance prediction can be realized for unsupervised learning approaches in the context of video content analysis and indexing. Performance prediction helps in identifying the number of detection errors and can thus support post-processing. This is demonstrated for the example of temporal video segmentation by presenting an approach for automatically predicting the precision and recall of a video cut detection result. It is shown for the unsupervised cut detection approach that the related clustering validity measure is highly correlated with the precision of a detection result. Three regression methods are investigated to exploit the observed correlation. Experimental results demonstrate the feasibility of the proposed performance prediction approach.

Ralph Ewerth, Bernd Freisleben
New Lane Model and Distance Transform for Lane Detection and Tracking

Particle filtering of boundary points is a robust way to estimate lanes. This paper introduces a new lane model corresponding to this particle filter-based approach, which is flexible enough to detect all kinds of lanes. A modified version of a Euclidean distance transform is applied to an edge map of a road image from a bird's-eye view to provide information for boundary point detection. An efficient lane tracking method is also discussed. The use of this distance transform exploits useful information in lane detection situations and greatly facilitates the initialization of the particle filter, as well as lane tracking. Finally, the paper validates the algorithm with experimental evidence for lane detection and tracking.

Ruyi Jiang, Reinhard Klette, Tobi Vaudrey, Shigang Wang
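
The distance transform step can be illustrated in a few lines with SciPy: each pixel of the transformed edge map holds its distance to the nearest edge pixel, which is the information used for boundary point detection and particle initialisation (an illustrative sketch, not the paper's modified variant).

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def edge_distance_map(edge_map):
        # edge_map: nonzero at detected edge pixels of the bird's-eye image.
        # Returns, per pixel, the Euclidean distance to the nearest edge.
        return distance_transform_edt(edge_map == 0)
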
Real-Time Volumetric Reconstruction and Tracking of Hands in a Desktop Environment

A probabilistic framework for vision-based volumetric reconstruction and marker-free tracking of hand and face volumes is presented, which relies exclusively on off-the-shelf hardware components and can be applied in standard office environments. A 3D reconstruction of the interaction environment (user-space) is derived from multiple camera viewpoints, which serve as input sources for mixture particle filtering to infer position estimates of hand and face volumes. The system implementation utilizes graphics hardware to comply with real-time constraints on a single desktop computer.

Christoph John, Ulrich Schwanecke, Holger Regenbrecht
Stereo Localization Using Dual PTZ Cameras

In this paper, we present a cooperative stereo system based on two pan-tilt-zoom (PTZ) cameras that can localize a moving target in a complex environment. Given an approximate target position estimated by a fixed camera with a wide field of view, two PTZ cameras with a large baseline are pointed toward the target in order to estimate its position precisely. The overall method is divided into three parts: offline construction of a look-up table (LUT) of rectification matrices, use of the LUT in real time for computing the rectification transformations for arbitrary camera positions, and finally 3D target localization. A chain of homographic transformations is used for finding correspondences between different pairs of wide-baseline stereo images. The proposed stereo localization system has two advantages: improved localization of a partially occluded target, and monitoring of a large environment using only two PTZ cameras without missing significant information. Finally, through experimental results, we show that the proposed system is able to localize targets with good accuracy.

Sanjeev Kumar, Christian Micheloni, Claudio Piciarelli
Object Tracking in Video Sequences by Unsupervised Learning

A Growing Competitive Neural Network system is presented as a precise method to track moving objects for video surveillance. The number of neurons in this neural model can be automatically increased or decreased in order to obtain a one-to-one association between objects currently in the scene and neurons. This association is maintained in each frame, which constitutes the foundation of this tracking system. Experiments show that our method is capable of accurately tracking objects in real-world video sequences.

R. M. Luque, J. M. Ortiz-de-Lazcano-Lobato, Ezequiel Lopez-Rubio, E. J. Palomo
A Third Eye for Performance Evaluation in Stereo Sequence Analysis

Prediction errors are commonly used when analyzing the performance of a multi-camera stereo system using at least three cameras. This paper discusses this methodology for performance evaluation for the first time on long stereo sequences (in the context of vision-based driver assistance systems). Three cameras are calibrated in an ego-vehicle, and prediction error analysis is performed on recorded stereo sequences. They are evaluated using various common stereo matching algorithms, such as belief propagation, dynamic programming, semi-global matching, or graph cut. Performance is evaluated on both synthetic and real data.

Sandino Morales, Reinhard Klette
OIF - An Online Inferential Framework for Multi-object Tracking with Kalman Filter

We propose an Online Inferential Framework (OIF) for tracking humans and objects under occlusions with a Kalman tracker. The OIF is constructed on knowledge representation schemes, specifically semantic logic, where each node represents a detected moving object and flow paths represent the associations among moving objects. A maximum likelihood is computed using our CWHI-based technique and the Bhattacharyya coefficient. The proposed framework efficiently interprets multiple tracking possibilities by manipulating propositional logic on the basis of maximum likelihood within a time window. The logical propositions are built by formalizing facts, semantic rules, and integrity constraints associated with tracking. The experimental results show that our novel OIF is able to track objects accurately and reliably, along with an interpretation of their physical states, under complete occlusion, illustrating its contribution and advantages over various other approaches.

Saira Saleem Pathan, Ayoub Al-Hamadi, Bernd Michaelis
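
The Bhattacharyya coefficient used in the likelihood computation is a standard histogram similarity measure; a minimal sketch (with hypothetical normalised appearance histograms as input) follows.

    import numpy as np

    def bhattacharyya_coefficient(p, q):
        # p, q: appearance histograms of two tracked objects; values near 1
        # indicate strong appearance similarity.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        p /= p.sum()
        q /= q.sum()
        return float(np.sum(np.sqrt(p * q)))
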
Real-Time Stereo Vision: Making More Out of Dynamic Programming

Dynamic Programming (DP) is a popular and efficient method for calculating disparity maps from stereo images. It allows for meeting real-time constraints even on low-cost hardware. Therefore, it is frequently used in real-world applications, although more accurate algorithms exist. We present a refined DP stereo processing algorithm which is based on a standard implementation, but is more flexible and shows increased performance. In particular, we introduce the idea of multi-path backtracking to exploit the information gained from DP more effectively. We show how to automatically tune all parameters of our approach offline by an evolutionary algorithm. The performance was assessed on benchmark data. The number of incorrect disparities was reduced by 40% compared to the DP reference implementation, while the overall complexity increased only slightly.

Jan Salmen, Marc Schlipsing, Johann Edelbrunner, Stefan Hegemann, Stefan Lüke
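
For orientation, a bare-bones single-scanline DP stereo matcher with conventional single-path backtracking is sketched below; the paper's contributions, multi-path backtracking and evolutionary parameter tuning, go beyond this baseline.

    import numpy as np

    def dp_scanline(left_row, right_row, max_disp=32, smooth=2.0):
        # Data term: absolute intensity difference; smoothness term:
        # penalty on disparity jumps between neighbouring pixels.
        n = len(left_row)

        def data(x, d):
            if x - d < 0:
                return 1e9  # disparity would leave the image
            return abs(float(left_row[x]) - float(right_row[x - d]))

        D = np.empty((n, max_disp))
        back = np.zeros((n, max_disp), dtype=int)
        for d in range(max_disp):
            D[0, d] = data(0, d)
        for x in range(1, n):
            for d in range(max_disp):
                lo, hi = max(0, d - 1), min(max_disp, d + 2)
                prev = D[x - 1, lo:hi] + smooth * np.abs(np.arange(lo, hi) - d)
                j = int(np.argmin(prev))
                back[x, d] = lo + j
                D[x, d] = data(x, d) + prev[j]
        # Single-path backtracking from the cheapest final state.
        disp = np.empty(n, dtype=int)
        disp[-1] = int(np.argmin(D[-1]))
        for x in range(n - 2, -1, -1):
            disp[x] = back[x + 1, disp[x + 1]]
        return disp
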
Optic Flow Using Multi-scale Anchor Points

We introduce a new method to determine the flow field of an image sequence using multi-scale anchor points. These anchor points manifest themselves in the scale-space representation of an image. The novelty of our method lies largely in the fact that the relation between the scale-space anchor points and the flow field is formulated in terms of soft constraints in a variational method. This leads to an algorithm for the computation of the flow field that differs fundamentally from previously proposed ones based on hard constraints. We show a significant performance increase when our method is applied to the Yosemite image sequence, a standard and well-established benchmark sequence in optic flow research.

Pieter van Dorst, Bart Janssen, Luc Florack, Bart M. ter Haar Romeny
A Methodology for Evaluating Illumination Artifact Removal for Corresponding Images

Robust stereo and optical flow disparity matching is essential for computer vision applications with varying illumination conditions. Most robust disparity matching algorithms rely on computationally expensive normalized variants of the brightness constancy assumption to compute the matching criterion. In this paper, we reinvestigate the removal of global and large area illumination artifacts, such as vignetting, camera gain, and shading reflections, by directly modifying the input images. We show that this significantly reduces violations of the brightness constancy assumption, while maintaining the information content in the images. In particular, we define metrics and perform a methodical evaluation to identify the loss of information in the images. Next we determine the reduction of brightness constancy violations. Finally, we experimentally validate that modifying the input images yields robustness against illumination artifacts for optical flow disparity matching.

Tobi Vaudrey, Andreas Wedel, Reinhard Klette
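
One common way to suppress smooth illumination components such as vignetting (offered only as an illustrative sketch, not necessarily the modification evaluated in the paper) is to subtract a heavily smoothed copy of the log-image, which removes low-frequency multiplicative artifacts while keeping the fine texture that drives matching.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def remove_smooth_illumination(image, sigma=20.0, alpha=0.9):
        # Multiplicative artifacts become additive in the log domain and
        # are largely captured by the strongly smoothed component.
        log_img = np.log1p(image.astype(float))
        smooth = gaussian_filter(log_img, sigma)
        return log_img - alpha * smooth
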
Nonlinear Motion Detection

This work presents new ideas in multidimensional signal theory: an isotropic quadrature filter approach for extracting local features of arbitrarily curved signals without the use of any steering techniques. We unify scale space, local amplitude, orientation, phase, and curvature in one framework. The main idea is to lift signals by a conformal mapping to the higher-dimensional conformal space, where the local signal features can be analyzed with more degrees of freedom compared to the flat space of the original signal domain. The approach makes use of the relation of the conformal signal to geometric entities such as hyper-planes and hyper-spheres. Furthermore, the conformal signal can be applied not only to 2D and 3D signals but to signals of any dimension. The main advantages in practical applications are the rotational invariance, the low computational time complexity, the easy implementation into existing Computer Vision software packages, and the numerical robustness of calculating exact local curvature of signals without the need for any derivatives. Applications include optical flow and object tracking, not limited to constant velocities but also detecting arbitrary accelerations, which correspond to the local curvature.

Lennart Wietzke, Gerald Sommer

Texture Analysis

Rotation Invariant Texture Classification Using Binary Filter Response Pattern (BFRP)

Using statistical textons for texture classification has shown great success recently. The maximal response 8 (MR8) method, which extracts an 8-dimensional feature set from 38 filters, is one of the state-of-the-art rotation invariant texture classification methods. However, this method has two limitations. First, it requires a training stage to build a texton library, so the accuracy depends on the training samples; second, during classification, each 8-dimensional feature is assigned to a texton by searching for the nearest texton in the library, which is time consuming, especially when the library is large. In this paper, we propose a novel texton feature, namely the Binary Filter Response Pattern (BFRP). It addresses both issues by encoding the filter response directly into a binary representation. The experimental results on the CUReT database show that the proposed BFRP method achieves better classification results than MR8, especially when the training dataset is limited and less comprehensive.

Zhenhua Guo, Lei Zhang, David Zhang
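
The exact BFRP encoding is defined in the paper; as a generic illustration of binarising filter responses into per-pixel pattern codes (with thresholds naively set at the channel means), consider the following.

    import numpy as np

    def binary_response_pattern(responses):
        # responses: (H, W, 8) array of filter responses per pixel.
        # Threshold each channel and pack the bits into one integer code;
        # histograms of these codes can serve as texture descriptors
        # without any texton library or nearest-texton search.
        bits = (responses >= responses.mean(axis=(0, 1))).astype(np.uint32)
        weights = (1 << np.arange(responses.shape[-1], dtype=np.uint32))
        return (bits * weights).sum(axis=-1)
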
Near-Regular Texture Synthesis

This paper describes a method for seamless enlargement or editing of difficult colour textures containing both regular periodic and stochastic components simultaneously. Such textures cannot be successfully modelled using either simple tiling or purely stochastic models. However, these textures are often required for realistic appearance visualisation of many man-made environments and of some natural scenes as well. The principle of our near-regular texture synthesis and editing method is to automatically recognise and separate the periodic and random components of the corresponding texture. Each of these components is subsequently modelled using its optimal method. The regular texture part is modelled using our roller method, while the random part is synthesised from its estimated, exceptionally efficient Markov random field based representation. Both independently enlarged texture components from the original measured texture are combined in the resulting synthetic near-regular texture. In the editing application, the two enlarged texture components can come from two different textures. The presented texture synthesis method allows large texture compression and is simultaneously extremely fast due to the complete separation of the analytical step of the algorithm from the texture synthesis part. The method is universal and easily implementable in graphics hardware for real-time rendering of any type of near-regular static textures.

Michal Haindl, Martin Hatka
Texture Editing Using Frequency Swap Strategy

A fully automatic colour texture editing method is proposed, which allows synthesising and enlarging an artificial texture sharing anticipated properties with its parent textures. The edited colour texture maintains its original colour spectrum, while its frequency is modified according to one or more target template textures. The edited texture is synthesised using a fast recursive model-based algorithm. The algorithm starts with edited and target colour texture samples decomposed into a multi-resolution grid using the Gaussian-Laplacian pyramid. Each band-pass colour factor is independently modelled by its dedicated 3D causal autoregressive random field (CAR) model. We estimate an optimal contextual neighbourhood and parameters for each CAR submodel. The synthesised multi-resolution Laplacian pyramid of the edited colour texture is then replaced by the synthesised template texture Laplacian pyramid. Finally, the modified texture pyramid is collapsed into the required fine-resolution colour texture. The primary benefit of these multigrid texture editing models is their ability to produce realistic novel textures with required visual properties, capable of enhancing realism in various texture application areas.

Michal Haindl, Vojtěch Havlíček
A Quantitative Evaluation of Texture Feature Robustness and Interpolation Behaviour

Whenever an image database has to be organised according to higher level human perceptual properties, a transformation model is needed to bridge the semantic gap between features and the perceptual space. To guide the feature selection process for a transformation model, we investigate the behaviour of 5 texture feature categories.

Using a novel mixed synthesis algorithm, we generate textures with a gradual transition between two existing ones to investigate the feature interpolation behaviour. In addition, the features' robustness to minor textural changes is evaluated in a kNN query-by-example experiment.

We compare robustness and interpolation behaviour, showing that Gabor energy map features outperform gray level co-occurrence matrix features in terms of linear interpolation quality.

Stefan Thumfart, Wolfgang Heidl, Josef Scharinger, Christian Eitzinger

Applications

Nonlinear Dimension Reduction and Visualization of Labeled Data

The amount of electronic information as well as the size and dimensionality of data sets have increased tremendously. Consequently, dimension reduction and visualization techniques have become increasingly popular in recent years. Dimension reduction is typically connected with loss of information. In supervised classification problems, class labels can be used to minimize the loss of information concerning the specific task. The aim is to preserve and potentially enhance the discrimination of classes in lower dimensions. Here we propose a prototype-based local relevance learning scheme that results in an efficient nonlinear discriminative dimension reduction of labeled data sets. The method is introduced and discussed in terms of artificial and real-world data sets.

Kerstin Bunte, Barbara Hammer, Michael Biehl
Performance Evaluation of Airport Lighting Using Mobile Camera Techniques

This paper describes the use of mobile camera technology to assess the performance of Aerodrome Ground Lighting (AGL). Cameras are placed inside the cockpit of an aircraft and used to record images of the AGL during an approach to an airport. Subsequent image analysis, using the techniques proposed in this paper, allows a performance metric to be determined for the lighting. This can be used to inform regulators whether the AGL is performing to standards, and it also provides useful information for the maintenance strategy of the airport. Since the cameras used to collect the images are mounted on a moving and vibrating platform (the plane), some image data may be affected by vibration. In the paper we illustrate techniques to quantify and remove the effects of vibration and show how the image data can be used to derive a performance metric for the complete AGL.

Shyama Prosad Chowdhury, Karen McMenemy, Jian-Xun Peng
Intelligent Video Surveillance for Detecting Snow and Ice Coverage on Electrical Insulators of Power Transmission Lines

One of the problems for electrical power delivery through power lines in northern countries arises when snow or ice accumulates on electrical insulators. This can lead to snow- or ice-induced outages and voltage collapse, causing huge economic losses. This paper proposes a novel real-time intelligent surveillance and image analysis system for detecting and estimating the snow and ice coverage on electric insulators using images captured from an outdoor 420 kV power transmission line. In addition, the swing angle of insulators is estimated, as large swing angles due to wind cause short circuits. Hybrid techniques combining histograms, edges, boundaries and cross-correlations are employed for handling a broad range of scenarios caused by changing weather and lighting conditions. Experiments have been conducted on images captured over periods of several months. Results show that the proposed system provides valuable estimates. For image pixels related to snow on the insulator, the current system yields an average detection rate of 93% for good-quality images and 67.6% for sets containing a large proportion of poor-quality images, with the corresponding average false alarm rate ranging from 9% to 18.1%. Further improvement may be achieved by using video-based analysis and improved camera settings.

Irene Y. H. Gu, Unai Sistiaga, Sonja M. Berlijn, Anders Fahlström
Size from Specular Highlights for Analyzing Droplet Size Distributions

In mechanical engineering, heat-transfer models by dropwise condensation are under development. The condensation process is captured by taking many pictures, which show the formation of droplets, of which the size distribution and area coverage are of interest for model improvement. The current analysis method relies on manual measurements, which is time consuming. In this paper, we propose an approach to automatically extract the positions and radii of the droplets from an image. Our method relies on specular highlights that are visible on the surfaces of the droplets. We show that these highlights can be reliably extracted, and that they provide sufficient information to infer the droplet size. The results obtained by our method compare favorably with those obtained by laborious and careful manual measurements. The processing time per image is reduced by two orders of magnitude.

Andrei C. Jalba, Michel A. Westenberg, Mart H. M. Grooten
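
As a toy illustration of highlight extraction (not the paper's full pipeline), thresholding the brightest pixels and labelling connected components already yields candidate highlight centres from which droplet positions can be read off.

    import numpy as np
    from scipy import ndimage

    def highlight_centres(image, thresh):
        # One connected bright blob per specular highlight; under a fixed
        # light/camera geometry the droplet radius can then be inferred
        # from the highlight's position and extent.
        mask = image > thresh
        labels, n = ndimage.label(mask)
        return ndimage.center_of_mass(mask, labels, range(1, n + 1))
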
Capturing Physiology of Emotion along Facial Muscles: A Method of Distinguishing Feigned from Involuntary Expressions

The ability to distinguish feigned from involuntary expressions of emotions could help in the investigation and treatment of neuropsychiatric and affective disorders and in the detection of malingering. This work investigates differences in emotion-specific patterns of thermal variations along the major facial muscles. Using experimental data extracted from 156 images, we attempted to classify patterns of emotion-specific thermal variations into neutral, and voluntary and involuntary expressions of positive and negative emotive states. Initial results suggest (i) each facial muscle exhibits a unique thermal response to various emotive states; (ii) the pattern of thermal variances along the facial muscles may assist in classifying voluntary and involuntary facial expressions; and (iii) facial skin temperature measurements along the major facial muscles may be used in automated emotion assessment.

Masood Mehmood Khan, Robert D. Ward, Michael Ingleby
Atmospheric Visibility Monitoring Using Digital Image Analysis Techniques

Atmospheric visibility is a measure of human visual perception of the environment. It is also directly associated with air quality, pollutant species and climate. Urban atmospheric visibility affects not only human health but also traffic safety and quality of life. Visibility is traditionally defined as the maximum distance at which a selected target can be recognized. To replace traditional measurement of atmospheric visibility, digital image processing schemes provide good visibility data, established by a numerical index. The performance of these techniques is defined by the correlation between the observed visual range and the obtained index. Since performance is affected by non-uniform illumination, this paper proposes a new procedure to estimate the visibility index with a sharpening method. The experimental results show that the proposed procedure obtains a better correlation coefficient than previous schemes.

Jiun-Jian Liaw, Ssu-Bin Lian, Yung-Fa Huang, Rung-Ching Chen
Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information services, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database with 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is reduced by the LBG-based clustering algorithm and the database is further compressed to 8 MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfils the needs of talking heads for Internet applications.

Kang Liu, Joern Ostermann
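
The LBG algorithm mentioned above designs a codebook by iterative splitting; a compact, generic sketch (not the authors' tuned variant) might look like this.

    import numpy as np

    def lbg_codebook(vectors, target_size=64, eps=1e-3, iters=20):
        # vectors: (N, D) array, e.g. flattened mouth images. Start from
        # the global mean, double the codebook by perturbing each centroid,
        # then refine assignments k-means style.
        codebook = vectors.mean(axis=0, keepdims=True)
        while codebook.shape[0] < target_size:
            codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
            for _ in range(iters):
                d = ((vectors[:, None, :] - codebook[None]) ** 2).sum(-1)
                assign = d.argmin(axis=1)
                for k in range(codebook.shape[0]):
                    members = vectors[assign == k]
                    if len(members):
                        codebook[k] = members.mean(axis=0)
        return codebook
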
Analysis of Speed Sign Classification Algorithms Using Shape Based Segmentation of Binary Images

Traffic Sign Recognition is a widely studied problem, and its dynamic nature calls for the application of a broad range of preprocessing, segmentation, and recognition techniques, but few databases are available for evaluation. We have produced a database consisting of 1,300 images captured by a video camera. On this database we have conducted a systematic experimental study. We used four different preprocessing techniques and designed a generic speed sign segmentation algorithm. We then selected a range of contemporary speed sign classification algorithms using shape-based segmented binary images for training and evaluated their results using four metrics, including accuracy and processing speed. The results indicate that Naive Bayes and Random Forest seem particularly well suited for this recognition task. Moreover, we show that two specific preprocessing techniques appear to provide a better basis for concept learning than the others.

Azam Sheikh Muhammad, Niklas Lavesson, Paul Davidsson, Mikael Nilsson
Using CCD Moiré Pattern Analysis to Implement Pressure-Sensitive Touch Surfaces

The Moiré fringe patterns obtained when a CCD camera views a repetitive line grating can be exploited to measure small surface displacements. We describe how curved surfaces with line grating patterns can be reconstructed by analysing the instantaneous frequency of the extracted 1D Moiré waveform. Experimental results show that monotonically increasing displacements of a stretched canvas of less than 1 mm can be clearly separated, suggesting the possibility of using the proposed Moiré-based vision technique to construct accurate pressure-sensitive touch surfaces.

Tong Tu, Wooi Boon Goh
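
Instantaneous frequency of a 1D waveform is conventionally estimated from the analytic signal; a minimal SciPy sketch, for illustration only, is given below.

    import numpy as np
    from scipy.signal import hilbert

    def instantaneous_frequency(waveform, fs=1.0):
        # The derivative of the unwrapped phase of the analytic signal
        # gives the local frequency of the 1D Moiré waveform, which in
        # turn encodes the local surface displacement.
        phase = np.unwrap(np.angle(hilbert(waveform)))
        return np.diff(phase) / (2.0 * np.pi) * fs
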
Enhanced Landmine Detection from Low Resolution IR Image Sequences

We deal with the problem of landmine field detection using low-resolution infrared (IR) image sequences measured from airborne or vehicle-borne passive IR cameras. The proposed scheme contains two parts: a) employing a multi-scale detector, i.e., a special type of isotropic bandpass filter, to detect landmine candidates in each frame; b) enhancing landmine detection by seeking maximum consensus of corresponding landmine candidates over image frames. Experiments were conducted on several IR image sequences measured from airborne and vehicle-borne cameras, and representative results are included. As shown in our experiments, the landmine signatures are significantly enhanced by the proposed scheme, and the automatic detection results are reasonably good. These methods can therefore be applied to assist humanitarian demining work for landmine field detection.

Tiesheng Wang, Irene Yu-Hua Gu, Tardi Tjahjadi
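
A multi-scale isotropic band-pass detector can be approximated, for illustration, by a difference-of-Gaussians bank (the scales below are arbitrary, not those of the paper); local maxima of the responses are candidate blob-like targets in each frame.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_bandpass(frame, sigmas=(1.0, 2.0, 4.0), ratio=1.6):
        # Isotropic band-pass responses at several scales.
        return [gaussian_filter(frame, s) - gaussian_filter(frame, s * ratio)
                for s in sigmas]
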

Erratum

Erratum to: Cooperative Stereo Matching with Color-Based Adaptive Local Support
Roland Brockers
Erratum to: Human Age Estimation by Metric Learning for Regression Problems

The paper entitled "Human Age Estimation by Metric Learning for Regression Problems" in this volume has been retracted due to identity theft and multiple submission. Yangjing Long is not the author and has had absolutely nothing to do with the contribution whatsoever.

Yangjing Long
Backmatter
Metadata
Title
Computer Analysis of Images and Patterns
Edited by
Xiaoyi Jiang
Nicolai Petkov
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-03767-2
Print ISBN
978-3-642-03766-5
DOI
https://doi.org/10.1007/978-3-642-03767-2
