
About this book

IbPRIA 2005 (Iberian Conference on Pattern Recognition and Image Analysis) was the second of a series of conferences jointly organized every two years by the Portuguese and Spanish Associations for Pattern Recognition (APRP, AERFAI), with the support of the International Association for Pattern Recognition (IAPR). This year, IbPRIA was hosted by the Institute for Systems and Robotics and the Geo-systems Center of the Instituto Superior Técnico, and it was held in Estoril, Portugal. It provided the opportunity to bring together researchers from all over the world to discuss some of the most recent advances in pattern recognition and all areas of video, image and signal processing. There was a very positive response to the Call for Papers for IbPRIA 2005. We received 292 full papers from 38 countries, and 170 were accepted for presentation at the conference. The high quality of the scientific program of IbPRIA 2005 was due first to the authors who submitted excellent contributions and second to the dedicated collaboration of the international Program Committee and the other researchers who reviewed the papers. Each paper was reviewed by two reviewers in a blind process. We would like to thank all the authors for submitting their contributions and for sharing their research activities. We are particularly indebted to the Program Committee members and to all the reviewers for their valuable evaluations, which made this publication possible.



Computer Vision


An Invariant and Compact Representation for Unrestricted Pose Estimation

This paper describes a novel compact representation of local features called the tensor doublet. The representation generates a four-dimensional feature vector, which is significantly less complex than other approaches such as Lowe’s 128-dimensional feature vector. Despite its low dimensionality, we demonstrate that the tensor doublet can be used for pose estimation, where the system is trained on an object and evaluated on images with cluttered background and occlusion.

Robert Söderberg, Klas Nordberg, Gösta Granlund

Gabor Parameter Selection for Local Feature Detection

Some recent works have addressed the object recognition problem by representing objects as the composition of independent image parts, where each part is modeled with “low-level” features. One of the problems to address is the choice of low-level features that appropriately describe the individual image parts. Several feature types have been proposed, such as edges, corners, ridges, Gaussian derivatives and Gabor features. Features are often selected independently of the object to be represented and have fixed parameters. In this work we use Gabor features and describe a method to select feature parameters suited to the particular object considered. We propose a method based on the Information Diagram concept, where “good” parameters are the ones that optimize the filter’s response in the filter parameter space. We propose and compare several concrete methodologies for choosing the Gabor feature parameters, and illustrate the performance of the method in the detection of facial parts such as eyes, noses and mouths. We also show the rotation invariance and robustness to small scale changes of the proposed Gabor feature.

Plinio Moreno, Alexandre Bernardino, José Santos-Victor
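
A small sketch can illustrate the parameter search. The code below builds complex Gabor kernels and, at one image location, keeps the (wavelength, orientation) pair with the largest response magnitude. Several choices are assumptions made for illustration: sigma is tied to half the wavelength, the parameter grid is arbitrary, and the criterion is simply maximum magnitude. The Information Diagram methodologies compared in the paper are not reproduced here.

```python
import numpy as np


def gabor_kernel(wavelength, theta, sigma, size):
    """Complex Gabor kernel: isotropic Gaussian envelope times a complex
    sinusoid of the given wavelength travelling in direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


def response_at(image, row, col, kernel):
    """Magnitude of the filter response centred at (row, col)."""
    half = kernel.shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return abs(np.sum(patch * np.conj(kernel)))


def best_parameters(image, row, col, wavelengths, thetas, size=21):
    """Grid search over (wavelength, theta); sigma = wavelength / 2 is an
    assumption of this sketch, not a value taken from the paper."""
    best, best_val = None, -1.0
    for lam in wavelengths:
        for th in thetas:
            kernel = gabor_kernel(lam, th, sigma=0.5 * lam, size=size)
            val = response_at(image, row, col, kernel)
            if val > best_val:
                best, best_val = (lam, th), val
    return best, best_val
```

For example, on a vertical grating with a period of 8 pixels, the search over wavelengths 4, 8 and 16 selects wavelength 8 at orientation 0.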

Real-Time Tracking Using Multiple Target Models

Using the approach of Comaniciu et al. [9] as a basis, this paper presents a real-time tracking technique that uses multiple target models. The use of multiple models is intended to make the tracking scheme more robust on sequences in which the lighting of the tracked object changes. To do so, a selection function is defined that determines which model is used when searching for the object in each frame.

Manuel J. Lucena, José M. Fuertes, Nicolás Pérez de la Blanca
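
Comaniciu et al.'s mean-shift tracker compares colour histograms through the Bhattacharyya coefficient. The sketch below shows one possible selection function over several target models. It assumes one histogram per lighting condition and chooses the model most similar to the current candidate region. The name select_model is hypothetical, and the paper's actual selection function may use a different criterion.

```python
import math


def normalise(hist):
    """Scale a histogram so that its bins sum to one."""
    total = float(sum(hist))
    return [v / total for v in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalised histograms (1 = identical)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def select_model(models, candidate):
    """Hypothetical selection function: return the index of the target model
    whose histogram best matches the candidate, together with its score."""
    cand = normalise(candidate)
    scores = [bhattacharyya(normalise(m), cand) for m in models]
    best = max(range(len(models)), key=lambda i: scores[i])
    return best, scores[best]
```

With a "bright" model [1, 2, 7], a "dark" model [7, 2, 1] and a candidate histogram [6, 3, 1], the dark model (index 1) is selected.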

Efficient Object-Class Recognition by Boosting Contextual Information

Object-class recognition is one of the most challenging fields of pattern recognition and computer vision. Currently, most authors represent an object as a collection of parts and their mutual spatial relations. Therefore, two types of information are extracted: local information describing each part, and contextual information describing the (spatial) context of the part, i.e. the spatial relations between the rest of the parts and the current one. We define a generalized correlogram descriptor and represent the object as a constellation of such generalized correlograms. Using this representation, both local and contextual information are gathered into the same feature space. We take advantage of this representation in the learning stage by using boosting-based feature selection that learns both types of information simultaneously and very efficiently. Learning both types of information simultaneously proves to be faster than dealing with them separately. Our method is compared with state-of-the-art object-class recognition systems by evaluating both the accuracy and the cost of the methods.

Jaume Amores, Nicu Sebe, Petia Radeva

Illumination Intensity, Object Geometry and Highlights Invariance in Multispectral Imaging

It is well known that the pixel values of an object can vary when the lighting conditions change. Common factors behind these changes are the viewing and illumination directions, the surface orientation and the type of surface.

In recent years, several works have addressed this problem by proposing representations of colour images that are invariant to these factors, mainly to shadows and highlights. However, invariant representations for multispectral images have received little study, particularly invariance to highlights.

In this paper, a new representation for multispectral images that is invariant to illumination intensity, object geometry and highlights is presented. The dichromatic reflection model is used as the physical model of the colour formation process. Experiments with real images are also presented to show the performance of our approach.

Raúl Montoliu, Filiberto Pla, Arnoud C. Klaren
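
Under the dichromatic reflection model, a pixel's value in band i can be written as C_i = m_b*b_i + m_s*s_i, the sum of a body term and a specular term. Assume a neutral illuminant and balanced sensors, so that the specular term s_i is the same in every band. Under that assumption, subtracting the per-pixel band mean cancels the highlight, and dividing by the L1 norm of the result cancels m_b (illumination intensity and geometry). The sketch below illustrates this standard construction. It is not necessarily the exact invariant proposed in the paper.

```python
def dichromatic_invariant(pixel):
    """Map a multispectral pixel to a representation invariant to illumination
    intensity, object geometry and highlights, assuming the dichromatic model
    with a specular term that is equal in all bands."""
    n = len(pixel)
    mean = sum(pixel) / n
    centred = [c - mean for c in pixel]  # removes the band-constant specular term
    norm = sum(abs(c) for c in centred)  # proportional to the body factor m_b
    if norm == 0:
        return [0.0] * n                 # achromatic pixel: invariant undefined
    return [c / norm for c in centred]
```

The same body reflectance observed with a different intensity and an added highlight, for example 3*b_i + 0.7, maps to the same invariant as b_i.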

Local Single-Patch Features for Pose Estimation Using the Log-Polar Transform

This paper presents a local image feature based on the log-polar transform, which renders it invariant to orientation and scale variations. It is shown that this feature can be used for pose estimation of 3D objects with unknown pose, cluttered background and occlusion. The proposed method is compared to a previously published one, and the new feature is found to perform about as well as or better than the old one for this task.

Fredrik Viksten, Anders Moe
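
The invariances come from the geometry of the log-polar grid. About the sampling centre, a rotation becomes a cyclic shift along the angle axis, and a scaling becomes a shift along the log-radius axis. A minimal nearest-neighbour resampling sketch follows. The grid sizes and radius limits are assumptions, and the paper's feature is built on top of such a transform rather than being the raw resampled patch.

```python
import numpy as np


def log_polar_patch(image, cx, cy, n_rho=16, n_theta=32, r_min=1.0, r_max=None):
    """Nearest-neighbour log-polar resampling of a 2D image around (cx, cy).

    Rows of the result index log-radius, columns index angle. r_max defaults
    to the largest radius that stays inside the image."""
    h, w = image.shape
    if r_max is None:
        r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    rhos = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rho))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rhos, thetas, indexing="ij")
    xs = np.rint(cx + r * np.cos(t)).astype(int)
    ys = np.rint(cy + r * np.sin(t)).astype(int)
    return image[ys, xs]
```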

Dealing with Multiple Motions in Optical Flow Estimation

In this paper, a new approach to optical flow estimation in the presence of multiple motions is presented. First, motions are segmented using a frequency-based approach that groups spatio-temporal filter responses with continuity in their motion (each group defines a “motion pattern”). Then, the gradient constraint is applied to the output of each filter, so that multiple estimates of the velocity at the same location can be obtained. For each “motion pattern”, the velocities at a given point are then combined using a probabilistic approach. The use of “motion patterns” allows multiple velocities to be represented, while combining estimates from different filters helps reduce the aperture problem.

Jesús Chamorro-Martínez, Javier Martínez-Baena, Elena Galán-Perales, Belén Prados-Suárez
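
The gradient constraint shows why combining filters helps with the aperture problem. A single constraint Ix*u + Iy*v + It = 0 only fixes the velocity component along the gradient. Several filters in the same motion pattern, with different gradient directions, pin down both components. The sketch below solves the stacked constraints by weighted least squares. This stands in for the paper's probabilistic combination, and the per-filter weights are an assumed input.

```python
import numpy as np


def combine_gradient_constraints(grads, weights=None):
    """Weighted least-squares velocity (u, v) from several gradient
    constraints Ix*u + Iy*v + It = 0, one per filter output at a pixel.

    grads: sequence of (Ix, Iy, It) triples; weights: optional per-filter
    confidences (uniform if omitted)."""
    g = np.asarray(grads, dtype=float)
    w = np.ones(len(g)) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    A = g[:, :2] * sw[:, None]
    b = -g[:, 2] * sw
    velocity, *_ = np.linalg.lstsq(A, b, rcond=None)
    return velocity
```

With three filters whose gradients point along (1, 0), (0, 1) and (1, 1), all consistent with a true velocity of (1, -0.5), the combined estimate recovers that velocity. Any single filter alone would only give its normal component.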

Conversion into Three-Dimensional Implicit Surface Representation from Topological Active Volumes Based Segmentation

In the last few years, advances in three-dimensional medical image processing have made operations such as planning or simulation on real data possible. Different representations of structures or models have been proposed, with implicit surfaces being one of the most flexible for processing. This paper introduces a new method for computing implicit surfaces from the explicit representations of objects segmented in three-dimensional images. The approach approximates the surfaces using distance functions and natural neighbor interpolation. The system has been tested on images of tibia and femur, where the explicit representation was extracted with a Topological Active Volumes model [1]. The results obtained show the suitability of the method for correctly representing the target objects.

José Rouco, Noelia Barreira, Manuel G. Penedo, Xosé M. Pardo

Automatic Matching and Motion Estimation from Two Views of a Multiplane Scene

This paper addresses the computation of motion between two views when the 3D structure is unknown but planar surfaces can be assumed. We use points that are automatically matched in two steps: the first is based on image parameters, and the second on the geometric constraint introduced by the computed homographies. When two or more planes are observed, the corresponding homographies can be computed and used to obtain the fundamental matrix, which gives constraints for the whole scene. The camera motion can be computed from a homography or from the fundamental matrix. Experimental results show this approach to be robust and functional for real applications in man-made environments.

Gonzalo López-Nicolás, Carlos Sagüés, José J. Guerrero
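
The homography step can be sketched with the standard direct linear transform (DLT). Each correspondence contributes two linear equations in the nine entries of H, and the solution is the right singular vector with the smallest singular value. This textbook estimator is shown for illustration only. It omits the coordinate normalization and robust outlier rejection a real system would add, and it does not cover the fundamental-matrix computation described in the abstract.

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (homogeneous) from
    at least four point pairs, using the unnormalized DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)  # null vector of A, defined up to scale
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenize."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```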

Contextual Soccer Detection Using Mosaicing Techniques

Sports video understanding aims to select and summarize the important events, which occur only in particular fragments of the whole video. A key aspect of this objective is determining the position on the field where the action takes place, that is, the location context of the play. In this paper we present a method to localize where on the field the play is taking place. We apply our method to soccer videos, although it can be extended to other sports. The method is based on constructing a mosaic of the first sequence processed; this mosaic is then used as a context mosaic. The frames of the other sequences are registered against the context mosaic, which puts every play in context. To construct the mosaics, we have developed a novel method for registering soccer sequences based on tracking imaginary straight lines using the Lucas-Kanade feature tracker and a robust estimator.

Lluis Barceló, Xavier Binefa

Probabilistic Image-Based Tracking: Improving Particle Filtering


is a widely used tracking algorithm based on particle filters. Although some results have been achieved with it, it exhibits several undesirable behaviours. In this paper, we highlight these misbehaviours and propose two improvements. First, a new weight assignment that avoids sample impoverishment is presented. Second, the prediction process is enhanced. The proposal has been successfully tested on synthetic data that reproduces some of the main difficulties a tracker must deal with.

Daniel Rowe, Ignasi Rius, Jordi Gonzàlez, Xavier Roca, Juan J. Villanueva
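
Sample impoverishment arises in the resampling step of a particle filter. Particles with large weights are duplicated and the rest disappear. The sketch below shows generic systematic resampling and the effective sample size often used to monitor this degeneracy. These are standard building blocks, not the new weight assignment proposed in the paper.

```python
import random


def systematic_resample(weights, rng=random):
    """Return the indices of the particles selected by systematic resampling.

    A single uniform offset u is drawn, and the n targets u, u + step, ...
    are matched against the cumulative weights, so a particle with weight w
    is copied roughly n * w / sum(weights) times."""
    n = len(weights)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    indices = []
    i, cumulative = 0, weights[0]
    for k in range(n):
        target = u + k * step
        # Advance to the first particle whose cumulative weight exceeds the target
        while target >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: n for uniform weights, 1 when one particle dominates."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)
```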

A Framework to Integrate Particle Filters for Robust Tracking in Non-stationary Environments

In this paper we propose a new framework for integrating several particle filters, in order to obtain a robust tracking system able to cope with abrupt changes in the illumination and position of the target. The proposed method is analytically justified and yields a tracking procedure that simultaneously adapts, online, the color space in which image points are represented, the color distributions of the object and background, and the contour of the object.

Francesc Moreno-Noguer, Alberto Sanfeliu

Stereo Reconstruction of a Submerged Scene

This article presents work dedicated to the study of refraction effects between two media in the stereo reconstruction of a three-dimensional scene. This refraction induces nonlinear effects that make stereo processing highly complex. We propose a linear approximation which maps this problem into a new problem with a conventional solution. We present results both from synthetic images generated by a ray tracer and from real-life scenes.

Ricardo Ferreira, João P. Costeira, João A. Santos
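
The nonlinearity stems from Snell's law. Each ray back-projected from a camera bends at the interface by an amount that depends on its incidence angle. The sketch below gives the vector form of Snell's law, assuming a locally flat interface with a known unit normal. It illustrates the refraction geometry only, not the paper's linear approximation.

```python
import math


def refract(d, n, n1, n2):
    """Refract the unit direction d at an interface with unit normal n.

    n must point back towards the incoming ray (dot(n, d) < 0). The ray goes
    from a medium with refractive index n1 into one with index n2. Returns
    the refracted unit direction, or None on total internal reflection."""
    eta = n1 / n2
    cos_i = -sum(a * b for a, b in zip(n, d))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * a + (eta * cos_i - cos_t) * b for a, b in zip(d, n))
```

For a ray entering water (index of about 1.33) from air at 30 degrees from the vertical, the sine of the refracted angle is sin(30°)/1.33, as Snell's law requires.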

A Functional Simplification of the BCS/FCS Image Segmentation

In this paper, a functional simplification of the BCS/FCS neurobiological model for image segmentation is presented. The inherent complexity of the BCS/FCS system is mainly due to the close modelling of the cortical mechanisms and to the high number of parameters involved. For functional applications, the proposed simplification retains both the biological concepts of the BCS/FCS and its performance, while greatly reducing the number of parameters and the execution time.

Pablo Martínez, Miguel Pinzolas, Juan López Coronado, Daniel García

From Moving Edges to Moving Regions

In this paper, we propose a new method to extract moving objects from a video stream without any motion estimation. The objective is a method that is robust to noise, large motions and ghost phenomena. Our approach consists of a frame differencing strategy combined with hierarchical segmentation. First, we extract moving edges with a new robust difference scheme based on the spatial gradient. In the second stage, the moving regions are extracted from the previously detected moving edges using hierarchical segmentation. The resulting description of the moving objects is represented as an adjacency graph. The method is validated on real sequences in the context of video surveillance, assuming a static camera.

Loic Biancardini, Eva Dokladalova, Serge Beucher, Laurent Letellier
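
The first stage can be illustrated by differencing spatial gradient magnitudes rather than raw intensities. The sketch below marks pixels whose gradient magnitude changes by more than a threshold between two frames. The plain absolute difference and the threshold parameter are assumptions, since the abstract does not give the robust difference scheme itself.

```python
import numpy as np


def gradient_magnitude(frame):
    """Per-pixel magnitude of the spatial gradient (central differences)."""
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy)


def moving_edges(prev_frame, next_frame, threshold):
    """Boolean map of edges whose gradient magnitude changed between frames."""
    diff = np.abs(gradient_magnitude(next_frame) - gradient_magnitude(prev_frame))
    return diff > threshold
```

A vertical step edge that moves from column 5 to column 7 is flagged at both its old and its new position, while static regions stay unmarked.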

Polygon Optimisation for the Modelling of Planar Range Data

In this paper we present efficient and fast algorithms for the reconstruction of scenes or objects using range image data. Assuming that a good segmentation is available, we concentrate on polygonisation, triangulation and optimisation, i.e. both triangle reduction and adaptive edge filtering to improve edge linearity. Special attention is given to complex edge junctions. In a final step, vertex neighbourhoods are analysed in order to robustly attribute depth to the triangle list from the noisy range data.

Samuel Nunes, Daniel Almeida, Eddy Loke, Hans du Buf

Stereo Vision System with the Grouping Process of Multiple Reaction-Diffusion Models

The present paper proposes a system that detects a stereo disparity map from random-dot stereograms by means of a grouping process. A simple operation on random-dot stereograms converts the stereo correspondence problem into a segmentation problem. To solve the segmentation problem derived from random-dot stereograms, the proposed stereo vision system uses the grouping process of our previously proposed model. The model for the grouping process consists of multiple reaction-diffusion models, each of which governs segments having a disparity in the stereo vision system. A self-inhibition mechanism due to strong inhibitory diffusion within a particular reaction-diffusion model and a mutual-inhibition mechanism among the models are built into the proposed system. Experimental results for artificially generated random-dot stereograms show the validity of the proposed system.

Atsushi Nomura, Makoto Ichikawa, Hidetoshi Miike

Registration of Moving Surfaces by Means of One-Shot Laser Projection

The acquisition of three-dimensional models of a given surface is a very interesting subject in computer vision. Most techniques are based on laser range finders coupled to a mechanical system that scans the surface. These techniques lack accuracy in the presence of vibrations or uncontrolled surface motion because of the misalignments between the acquired images. In this paper, we propose a new one-shot pattern which benefits from registration techniques to recover a whole surface in the presence of uncontrolled motion.

Carles Matabosch, David Fofi, Joaquim Salvi, Josep Forest

A Computer Vision Sensor for Panoramic Depth Perception

A practical way of obtaining depth in computer vision is the use of structured light systems. Panoramic depth reconstruction requires several images, which most likely implies building a sensor with mobile elements. Moreover, misalignments can appear in non-static scenes. Omnidirectional cameras offer a much wider field of view than perspective ones, capture a panoramic image at every moment and alleviate the problems due to occlusions. This paper focuses on combining omnidirectional vision and structured light with the aim of obtaining panoramic depth information. The resulting sensor consists of a single catadioptric camera and an omnidirectional light projector.

Radu Orghidan, El Mustapha Mouaddib, Joaquim Salvi

Probabilistic Object Tracking Based on Machine Learning and Importance Sampling

The paper presents a novel particle filtering framework for visual object tracking. One contribution is a likelihood function based on a machine learning algorithm, AdaBoost. The likelihood function can capture the structural characteristics of a class of objects and is thus robust to clutter and noise in complex backgrounds. The other contribution is the adoption of mean shift iteration as a proposal distribution. It steers discrete samples towards the regions most likely to contain the targets and therefore makes the algorithm computationally efficient. The effectiveness of the framework is demonstrated with a particular class of objects: human faces.

Peihua Li, Haijing Wang

A Calibration Algorithm for POX-Slits Camera

Recent developments have suggested alternative multiperspective camera models that are potentially advantageous for analysing scene structure. Two-slit cameras are one such case. These cameras collect all rays passing through two lines. The projection model for these cameras is non-linear: every 3D point is projected along the line that passes through that point and intersects both slits. In this paper we propose a robust non-iterative linear method for calibrating this type of camera. For that purpose, a calibration object with known dimensions is required. A solution can be obtained from at least thirteen world-to-image correspondences. To achieve higher accuracy, data normalization and a non-linear technique based on the maximum likelihood criterion can be used to refine the estimated solution.

Nuno Martins, Hélder Araújo
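
The two-slit projection model can be made concrete. The ray through a 3D point P lies in the plane spanned by P and the first slit, so intersecting that plane with the second slit gives a second point on the ray. The sketch below assumes each slit is given as a point plus a direction and that the image plane is z = z_image. These conventions are illustrative, not the paper's parameterization, and calibration itself is not shown.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def two_slit_project(P, a1, d1, a2, d2, z_image=0.0):
    """Project the 3D point P through two slits (lines a_i + t * d_i) onto the
    plane z = z_image. The projection ray is the line through P meeting both slits."""
    m = _cross(d1, _sub(P, a1))  # normal of the plane through P and slit 1
    denom = _dot(m, d2)
    if abs(denom) < 1e-12:
        raise ValueError("slit 2 is parallel to the plane through P and slit 1")
    t = _dot(m, _sub(P, a2)) / denom
    Q = tuple(a + t * d for a, d in zip(a2, d2))  # where the ray meets slit 2
    ray = _sub(Q, P)
    if abs(ray[2]) < 1e-12:
        raise ValueError("projection ray is parallel to the image plane")
    s = (z_image - P[2]) / ray[2]
    return (P[0] + s * ray[0], P[1] + s * ray[1])
```

Take a horizontal slit along x at z = 1, a vertical slit along y at z = 2 and the image plane z = 0. The point (2, 3, 4) then projects to (-2, -1).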

Vision-Based Interface for Integrated Home Entertainment System

Home entertainment systems are tending to be integrated into a single system and to become more complex and difficult to control. As a result, methods developed for specific entertainment systems are difficult to apply to integrated systems. Accordingly, this paper presents a vision-based interface for an integrated home entertainment system. The proposed interface has two modes: a mouse control mode and an instruction mode. In the first mode, hand motion and shape are used to move the mouse pointer and click icons; in the second, instructions are given by hand gestures. The interface maps each of several predefined gestures to similar tasks in different entertainment systems, which reduces the number of gestures and makes the interface more intuitive.

Jae Sik Chang, Sang Ho Kim, Hang Joon Kim

A Proposal for a Homeostasis Based Adaptive Vision System

In this work, an approach to an adaptive vision system is presented. It is based on homeostasis: the system state is represented as a set of artificial hormones that are affected by environmental changes. To compensate for these changes, the vision system is endowed with drives, which are in charge of modifying the system parameters in order to keep the system performance as high as possible. To coordinate the drives, a supervisor level based on fuzzy logic has been added. Experiments in both controlled and uncontrolled environments have been carried out to validate the proposal.

Javier Lorenzo-Navarro, Daniel Hernández, Cayetano Guerra, José Isern-González

Relaxed Grey-World: Computational Colour Constancy by Surface Matching

In this paper we present a new approach to the computational colour constancy problem based on surface matching. Classical colour constancy methods do not usually rely on this important source of information, and they often use only partial information in the images. We propose to use a set of canonical surfaces and to match them against the image content under a ‘relaxed’ grey-world assumption to perform colour constancy. Our approach therefore takes into account information not considered in previous methods, which normally rely on statistical information in the image such as the highest luminance or image gamuts. Nevertheless, the selection of the canonical surfaces is not a trivial process and requires further study.

Francesc Tous, María Vanrell, Ramón Baldrich
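
For reference, the classical grey-world assumption that the paper relaxes scales each channel so that its mean matches the overall mean. The sketch below implements only that classical baseline. It assumes an (H, W, C) image with non-zero channel means, and the relaxed, surface-matching version is not reproduced.

```python
import numpy as np


def grey_world(image):
    """Classical grey-world correction: scale each channel so that its mean
    equals the mean over all channels. image has shape (H, W, C)."""
    img = image.astype(float)
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)
    gains = channel_means.mean() / channel_means  # assumes no zero-mean channel
    return img * gains
```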

A Real-Time Driver Visual Attention Monitoring System

This paper describes a framework for analyzing video sequences of a driver and determining his or her level of attention. The proposed system computes eyelid movement parameters and estimates head (face) orientation. It relies on pupil detection to robustly track the driver’s head pose and to monitor the driver’s level of fatigue. Visual information is acquired with a specially designed solution combining a CCD video camera with an NIR illumination system. The system is fully automatic; it classifies head rotation in all viewing directions, detects eye blinking and eye closure, and recovers the gaze of the eyes. Experimental results on real images demonstrate the accuracy and robustness of the proposed solution.

Jorge P. Batista

An Approach to Vision-Based Person Detection in Robotic Applications

We present an approach to vision-based person detection in robotic applications that integrates top-down template matching with bottom-up classifiers. We detect components of the human silhouette, such as torso and legs; compared with monolithic methods, this approach provides greater invariance to the wide variety of poses a person can adopt. We detect edges in each image, apply a distance transform, and then match templates at different scales. This matching process generates a focus of attention (candidate people) that is later confirmed using a trained Support Vector Machine (SVM) classifier. Our results show that this method is both fast and precise and directly applicable in robotic architectures.

Carlos Castillo, Carolina Chang
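
The edge, distance transform and template matching stage can be sketched as chamfer matching. First compute each pixel's distance to the nearest edge, then score each template placement by the mean distance under the template's edge points. The brute-force distance transform, the single scale and the mean-distance score are simplifying assumptions. The paper matches templates at several scales and confirms candidates with an SVM, which is not shown.

```python
import numpy as np


def distance_transform(edges):
    """Brute-force Euclidean distance from every pixel to the nearest edge
    pixel (fine for small images; use a linear-time algorithm in practice)."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([ys, xs], axis=1).astype(float)
    h, w = edges.shape
    grid = np.stack(np.mgrid[0:h, 0:w], axis=-1).reshape(-1, 1, 2).astype(float)
    d = np.sqrt(((grid - pts[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return d.reshape(h, w)


def chamfer_scores(dist, template_pts):
    """Mean distance under the template's edge points (row, col) for every
    top-left offset that keeps the template inside the image."""
    t = np.asarray(template_pts)
    th, tw = t[:, 0].max() + 1, t[:, 1].max() + 1
    h, w = dist.shape
    scores = np.full((h - th + 1, w - tw + 1), np.inf)
    for oy in range(h - th + 1):
        for ox in range(w - tw + 1):
            scores[oy, ox] = dist[t[:, 0] + oy, t[:, 1] + ox].mean()
    return scores
```

The lowest score marks the most likely template position. In a pipeline like the one described, candidates would be taken from the low-scoring positions and then passed to the classifier.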

A New Approach to the Template Update Problem

Visual tracking based on pattern matching is a computer vision technique used in a wide range of applications [4]. Updating the reference template is crucial for this kind of algorithm to work correctly. This paper proposes a new approach to the updating problem in order to achieve better tracking performance and robustness. This is carried out using a representation technique based on second-order isomorphisms. The proposed technique has been compared experimentally with other existing approaches, with excellent results. The most important improvement of this approach is that it is parameter-free, so no parameters have to be set manually to tune the process. Moreover, the objects to be tracked can be rigid or deformable, and the system adapts automatically and robustly to any situation.

Cayetano Guerra, Mario Hernández, Antonio Domínguez, Daniel Hernández

Shape and Matching


Contour-Based Image Registration Using Mutual Information

Image registration is a problem that arises in many image processing applications whenever information from two or more scenes has to be aligned. In image registration, the use of an adequate measure of alignment is crucial. Current techniques are classified into two broad categories: area-based and feature-based. All methods include some similarity measure. In this paper, a new measure that combines mutual information ideas, spatial information and feature characteristics is proposed. Edge points obtained from a Canny edge detector are used as features. Feature characteristics such as location, edge strength and orientation are taken into account to compute a joint probability distribution of corresponding edge points in two images. Mutual information based on this function is minimized to find the best alignment parameters. The approach has been tested with a collection of portal images taken in real cancer treatment sessions, obtaining encouraging results.

Nancy A. Álvarez, José M. Sanchiz, Jorge Badenas, Filiberto Pla, Gustavo Casañ
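
As background, mutual information between two images is usually estimated from their joint histogram. The sketch below computes the standard intensity-based measure. The paper instead builds the joint distribution from edge location, strength and orientation, so this shows only the underlying quantity, not the proposed measure. The bin count is an assumed parameter.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between two equally sized images,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells (0 * log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```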

Improving Correspondence Matching Using Label Consistency Constraints

In this paper we demonstrate how to embed label consistency constraints into point correspondence matching. We make two contributions. First, we show how the point proximity matrix can be incorporated into the support function for probabilistic relaxation. Second, we show how the label probabilities delivered by relaxation labelling can be used to gate the kernel matrix for articulated point pattern matching. The method is evaluated on synthetic and real-world data, where the label compatibility process is shown to improve the correspondence process.

Hongfang Wang, Edwin R. Hancock

The Euclidean Distance Transform Applied to the FCC and BCC Grids

The discrete Euclidean distance transform is applied to grids with non-cubic voxels, the face-centered cubic (fcc) and body-centered cubic (bcc) grids. These grids are three-dimensional generalizations of the hexagonal grid. Raster scanning and contour processing techniques are applied using different neighbourhoods. When computing the Euclidean distance transform, some voxel configurations produce errors. The maximum errors for the two different grids and neighbourhood sizes are analyzed and compared with the cubic grid.

Robin Strand

Matching Deformable Regions Using Local Histograms of Differential Invariants

This paper presents a technique to enable deformable regions to be matched using image databases based on the information provided by the differential invariants of local histograms for the key-region. We shall show how this technique is robust enough to deal with local deformations, viewpoint changes, lighting changes, large motions of the tracked object and small changes in image rotation and scale. The proposed algorithm is based on the building of a specific template where an orthogonal representation space is associated with each of its locations. This space is calculated from neighboring information provided by a vector of local invariants calculated on each of the image’s pixels. Unlike other well-known color-based techniques, this algorithm only uses the pixels’ gray level values.

Nicolás Pérez de la Blanca, José M. Fuertes, Manuel J. Lucena

A Global-to-Local Matching Strategy for Registering Retinal Fundus Images

In this paper, a multi-resolution rigid-model-based global matching algorithm is employed to register tree structures of blood vessels extracted from retinal fundus images. To further improve the alignment of the vessels, a local structure-deformed elastic matching algorithm is proposed to eliminate ‘ghost vessels’ for accurate registration. The matching methods are tested on 268 pairs of retinal fundus images. Experimental results show that our global-to-local registration strategy achieves an average centreline mapping error of 1.85 pixels with an average execution time of 207 seconds. The registration results have also been visually validated by the corresponding fusion maps.

Xinge You, Bin Fang, Zhenyu He, Yuan Yan Tang

A Model-Based Method for Face Shape Recovery

In this paper we describe a model-based method for recovering the 3D shape of faces using shape-from-shading. Using range-data, we learn a statistical model of the variation in surface normal direction for faces. This model uses the azimuthal equidistant projection to represent the distribution of surface normal directions. We fit the model to intensity data using constraints on the surface normal direction provided by Lambert’s law. We illustrate the effectiveness of the method on real-world image data.

William A. P. Smith, Edwin R. Hancock
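
The azimuthal equidistant projection maps a unit normal to a point in the plane. The point's distance from the origin equals the angle between the normal and a reference direction. The sketch below, with its inverse, assumes the +z axis (towards the viewer) as the reference. The paper's model may use a different reference point, such as a mean normal.

```python
import math


def aep_forward(n):
    """Azimuthal equidistant projection of a unit normal about the +z pole."""
    nx, ny, nz = n
    theta = math.acos(max(-1.0, min(1.0, nz)))  # angle from the pole
    phi = math.atan2(ny, nx)                    # azimuth
    return (theta * math.cos(phi), theta * math.sin(phi))


def aep_inverse(p):
    """Map a projected point back to a unit normal."""
    x, y = p
    theta = math.hypot(x, y)
    phi = math.atan2(y, x)
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))
```

Because the projection is invertible away from the antipode of the pole, statistics can be computed on the flat projected points and mapped back to normals.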

Visual Detection of Hexagonal Headed Bolts Using Method of Frames and Matching Pursuit

In this paper we focus on the problem of automatically detecting the absence of the fastening bolts that secure the rails to the sleepers. The proposed visual inspection system uses images acquired from a digital line scan camera installed under a train. The overall performance of the system, in terms of speed and detection rate, is mainly influenced by the features adopted to represent the images and by their number. In this paper we use overcomplete dictionaries of waveforms, called frames, which allow dense and sparse representations of images, and we analyze the performance of the system with respect to the sparsity of the representation. A sparse representation is one with only a few non-vanishing components. In particular, we show that, in the case of Gabor dictionaries, dense representations provide the highest detection rate. Moreover, keeping only 1% of the components non-vanishing reduces the detection rate of the system by 10%, indicating that very sparse representations do not heavily affect performance. We demonstrate the adopted techniques on images acquired under real experimental conditions.

Pier Luigi Mazzeo, Ettore Stella, Nicola Ancona, Arcangelo Distante
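
The sparse representations can be computed with matching pursuit, named in the title. At each step, the atom most correlated with the residual is selected and its contribution is subtracted. The sketch below assumes a dictionary stored as a matrix with unit-norm columns. Building the Gabor frames used in the paper is not shown.

```python
import numpy as np


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit over a dictionary with unit-norm atoms as columns.

    Returns the coefficient vector (at most n_atoms non-vanishing entries)
    and the final residual."""
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        k = int(np.argmax(np.abs(correlations)))  # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual
```

With an overcomplete dictionary made of the identity plus a constant atom, a constant signal is captured by the single constant atom in one step.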

A New Region-Based Active Contour for Object Extraction Using Level Set Method

Object extraction, or image segmentation, is a basic problem in image analysis and computer vision. It has been dealt with in various forms. Variational methods are an emerging framework for tackling such problems, where the aim is to create an image partition that follows the data while preserving a certain regularity. In this paper, we propose a new energy functional based on the region information of an image. The region-based force makes our variational flow robust to noise and provides a global segmentation criterion. Furthermore, our method is implemented using level set theory, which makes it easy to deal with topological changes. Finally, in order to segment several different objects in an image simultaneously, a hierarchical method is presented.

Lishui Cheng, Jie Yang, Xian Fan

Improving ASM Search Using Mixture Models for Grey-Level Profiles

The use of Active Shape Models (ASM) has been shown to be an efficient approach to image interpretation and pattern recognition. In ASM, grey-level profiles at landmarks are modelled as a Gaussian distribution. Mahalanobis distance from a sample profile to the model mean is used to locate the best position of a given landmark during ASM search. We present an improved ASM methodology, in which the profiles are modelled as a mixture of Gaussians, and the probability that a sample is from the distribution is calculated using the probability density function (pdf) of the mixture model. Both improved and original ASM methods were tested on synthetic and real data. The performance comparison demonstrates that the improved ASM method is more generic and robust than the original approach.

Yanong Zhu, Mark Fisher, Reyer Zwiggelaar
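
The change to the search step can be sketched as replacing the minimum-Mahalanobis-distance rule with a maximum mixture-likelihood rule. The code below evaluates the log-density of a Gaussian mixture with full covariances and picks the best candidate profile. It assumes the mixture has already been fitted (for example with EM), and the function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal distribution at x."""
    d = len(mean)
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def mixture_logpdf(x, weights, means, covs):
    """log p(x) under a Gaussian mixture, computed stably with log-sum-exp."""
    logs = np.array([np.log(w) + gaussian_logpdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    top = logs.max()
    return top + np.log(np.exp(logs - top).sum())


def best_position(profiles, weights, means, covs):
    """Index of the candidate profile with the highest mixture likelihood."""
    scores = [mixture_logpdf(p, weights, means, covs) for p in profiles]
    return int(np.argmax(scores))
```

With a bimodal profile distribution, the midpoint between the two modes is the mean a single Gaussian would be fitted to. The mixture rule correctly ranks that midpoint below a profile at one of the modes.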

Human Figure Segmentation Using Independent Component Analysis

In this paper, we present a Statistical Shape Model for Human Figure Segmentation in gait sequences. Point Distribution Models (PDM) generally use Principal Component Analysis (PCA) to describe the main directions of variation in the training set. However, PCA imposes a number of restrictions on the data that do not always hold. In this work, we explore the potential of Independent Component Analysis (ICA) as an alternative shape decomposition for PDM-based Human Figure Segmentation. The resulting shape model enables accurate estimation of human figures despite segmentation errors in the input silhouettes and has very good convergence properties.

Grégory Rogez, Carlos Orrite-Uruñuela, Jesús Martínez-del-Rincón

Adaptive Window Growing Technique for Efficient Image Matching

The paper presents a new approach to image matching based on a newly developed adaptive window growing algorithm. This integer-only algorithm operates on monochrome images transformed into the Census nonparametric representation. It computes the entropy of local areas and enlarges them if the entropy is not sufficient. In this way the method avoids featureless areas that cannot be reliably matched, while keeping the matching window as small as possible. Special emphasis has also been placed on an efficient implementation that can fit custom hardware architectures; therefore, the presented algorithm requires only integer arithmetic. Many experiments applying the technique to stereovision matching showed its robustness and competitive execution times.

Bogusław Cyganek
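A minimal sketch of the idea above: compute a 3x3 Census transform, measure the Shannon entropy of the Census codes inside a square window, and grow the window until the entropy reaches a threshold. The neighbourhood size, the entropy definition, the threshold and the radius limits are assumptions for illustration, and unlike the paper's integer-only algorithm this sketch uses floating-point logarithms for readability.

```python
import math

def census_transform(img):
    """3x3 Census transform: one bit per neighbour darker than the centre (borders left 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    code = (code << 1) | (1 if img[y + dy][x + dx] < img[y][x] else 0)
            out[y][x] = code
    return out

def window_entropy(codes, cy, cx, r):
    """Shannon entropy (bits) of the Census codes in the (2r+1)x(2r+1) window at (cy, cx)."""
    counts = {}
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            counts[codes[y][x]] = counts.get(codes[y][x], 0) + 1
    n = (2 * r + 1) ** 2
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def grow_window(codes, cy, cx, min_entropy, r0=1, r_max=5):
    """Smallest radius whose window reaches min_entropy; otherwise the largest allowed radius."""
    h, w = len(codes), len(codes[0])
    r_limit = min(r_max, cy, cx, h - 1 - cy, w - 1 - cx)
    for r in range(r0, r_limit + 1):
        if window_entropy(codes, cy, cx, r) >= min_entropy:
            return r
    return r_limit
```

On a flat region every Census code is identical, the entropy stays at zero and the window grows to its limit; on a textured region a small window is already informative enough.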

Managing Resolution in Digital Elevation Models Using Image Processing Techniques

In this work, we propose a set of algorithms to manage the resolution of DEMs for simulation processes. First, we present an application that handles the huge quantity of data contained in DEMs for real-time rendering by discarding the less significant elevation data. Second, we extend the algorithm to increase the spatial resolution of DEMs when needed. Finally, we introduce a method for increasing the spectral resolution of DEMs using a skeletonization process. The algorithms were developed for raster data sets, although similar considerations apply to vector data sets.

Rolando Quintero, Serguei Levachkine, Miguel Torres, Marco Moreno, Giovanni Guzman

Object Image Retrieval by Shape Content in Complex Scenes Using Geometric Constraints

This paper presents an image retrieval system based on 2D shape information. Query shape objects and database images are represented by polygonal approximations of their contours. These are then encoded, using geometric features, in terms of predefined structures. Shapes are located in database images by a voting procedure in the spatial domain. An alignment matching step then provides a probability value used to rank the database image in the retrieval result. The method can detect a query object in database images even when they contain complex scenes. The shape matching also tolerates partial occlusions and affine transformations such as translation, rotation or scaling.

Agnés Borràs, Josep Lladós

Image and Video Processing


A Real-Time Gabor Primal Sketch for Visual Attention

We describe a fast algorithm for Gabor filtering, specifically designed for multi-scale image representations. Our proposal is based on three facts: first, Gabor functions can be decomposed into Gaussian convolutions and complex multiplications, which allows Gabor filters to be replaced by more efficient Gaussian filters; second, isotropic Gaussian filtering is implemented by separable 1D horizontal/vertical convolutions and permits a fast implementation of the non-separable zero-mean Gabor kernel; third, short FIR filters and the à trous algorithm are used to build a recursive multi-scale decomposition, which saves significant computational resources. Our proposal reduces the number of operations to about one half of that required by state-of-the-art approaches.

Alexandre Bernardino, José Santos-Victor
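The first fact in the abstract above, that Gabor filtering can be performed with a Gaussian filter plus complex multiplications, rests on the modulation identity (f * (g·e^{iωx}))(x) = e^{iωx}·((f·e^{-iωx}) * g)(x). The sketch below checks it numerically in 1D. The signal, σ, ω and kernel radius are arbitrary assumptions, and the sketch omits the zero-mean correction, the separable 2D filtering and the à trous pyramid.

```python
import cmath
import math

def gaussian_kernel(sigma, radius):
    """Sampled, normalised Gaussian on [-radius, radius]."""
    g = [math.exp(-k * k / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(g)
    return [v / total for v in g]

def convolve(signal, kernel, radius):
    """Direct 1D convolution; samples outside the signal are treated as zero."""
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0j
        for k in range(-radius, radius + 1):
            if 0 <= x - k < n:
                acc += signal[x - k] * kernel[k + radius]
        out.append(acc)
    return out

def gabor_direct(signal, sigma, omega, radius):
    """Convolve with the complex Gabor kernel g(k) * exp(i * omega * k)."""
    g = gaussian_kernel(sigma, radius)
    gabor = [g[k + radius] * cmath.exp(1j * omega * k) for k in range(-radius, radius + 1)]
    return convolve(signal, gabor, radius)

def gabor_via_gaussian(signal, sigma, omega, radius):
    """Same response: demodulate, apply a real Gaussian filter, remodulate."""
    g = gaussian_kernel(sigma, radius)
    demod = [s * cmath.exp(-1j * omega * x) for x, s in enumerate(signal)]
    smoothed = convolve(demod, g, radius)
    return [cmath.exp(1j * omega * x) * v for x, v in enumerate(smoothed)]
```

The second route only ever filters with a real Gaussian, which in 2D is separable; that is what makes the Gaussian-based formulation cheaper than convolving with the complex Gabor kernel directly.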

Bayesian Reconstruction of Color Images Acquired with a Single CCD

Most available digital color cameras use a single Charge-Coupled Device (CCD) with a Color Filter Array (CFA) to acquire an image. To produce a visible color image, a demosaicing process must be applied, which produces undesirable artifacts. This paper addresses the demosaicing problem from a superresolution point of view. Within the Bayesian paradigm, estimates of the reconstructed image and of the model parameters are obtained.

Miguel Vega, Rafael Molina, Aggelos K. Katsaggelos

A Fast and Exact Algorithm for Total Variation Minimization

This paper deals with the minimization of the total variation under a convex data fidelity term. We propose an algorithm that computes an exact minimizer of this problem. The method relies on the decomposition of an image into its level sets. Using these level sets, we map the problem into optimizations of independent binary Markov Random Fields. Binary solutions are found using graph-cut techniques, and we show how to derive a fast algorithm. We also study the special case where the fidelity term is the L1-norm. Finally, we report some experiments.

Jérôme Darbon, Marc Sigelle
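The level-set decomposition described above can be illustrated on a tiny 1D chain. This sketch assumes integer grey levels 0..L-1, a convex data term `data(u, f)` and a TV weight `beta` (all names hypothetical). It checks that the energy E(u) splits exactly into a constant plus one binary energy per threshold λ, and that minimizing each binary energy independently reaches the global minimum. Brute-force enumeration stands in for the graph cuts used in the paper, so it only works for a handful of pixels.

```python
import itertools

def tv_energy(u, f, beta, data):
    """E(u) = sum_i data(u_i, f_i) + beta * sum_i |u_{i+1} - u_i| on a 1D chain."""
    return (sum(data(ui, fi) for ui, fi in zip(u, f))
            + beta * sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1)))

def level_energy(b, f, lam, beta, data):
    """Binary energy at threshold lam, where b_i = 1 means u_i >= lam."""
    return (sum(bi * (data(lam, fi) - data(lam - 1, fi)) for bi, fi in zip(b, f))
            + beta * sum(abs(b[i + 1] - b[i]) for i in range(len(b) - 1)))

def decomposed_energy(u, f, beta, data, levels):
    """Rebuild E(u) from the level-0 data cost plus the binary energy of each level set."""
    total = sum(data(0, fi) for fi in f)
    for lam in range(1, levels):
        b = [1 if ui >= lam else 0 for ui in u]
        total += level_energy(b, f, lam, beta, data)
    return total

def min_energy_brute(f, beta, data, levels):
    """Global minimum of E by enumerating every grey-level image."""
    return min(tv_energy(list(u), f, beta, data)
               for u in itertools.product(range(levels), repeat=len(f)))

def min_energy_by_levels(f, beta, data, levels):
    """Sum of independently minimised binary energies (exact for a convex data term)."""
    total = sum(data(0, fi) for fi in f)
    for lam in range(1, levels):
        total += min(level_energy(list(b), f, lam, beta, data)
                     for b in itertools.product((0, 1), repeat=len(f)))
    return total
```

The equality of the two minima relies on convexity of the data term: it guarantees that the independent binary minimizers can be chosen nested across levels, so they stack back into a valid grey-level image.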

Phase Unwrapping via Graph Cuts

This paper presents a new algorithm for recovering the absolute phase from the modulo-2π phase, the so-called phase unwrapping (PU) problem. PU is a key step in several imaging technologies, among which we highlight interferometric SAR and SAS, where topography is inferred from absolute phase measurements between two (or more) antennas and the terrain itself. The adopted criterion is the minimization of the L^p norm of phase differences [1], [2], which usually leads to computationally demanding algorithms. Our approach follows the idea, introduced in [3], of an iterative binary optimization scheme; the novelty lies in casting each binary step as a graph max-flow/min-cut problem, for which efficient algorithms exist. This graph formulation is based on recent energy minimization results via graph cuts [4]. Accordingly, we call this new algorithm PUMF (phase unwrapping max-flow). A set of experimental results illustrates the effectiveness of PUMF.

José M. Bioucas-Dias, Gonçalo Valadão
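The iterative binary scheme above can be sketched on a 1D signal. Here the unknowns are integer wrap counts k_i, the unwrapped phase is φ_i = ψ_i + 2πk_i, and the energy is the sum of |φ_{i+1} - φ_i|^p. Each iteration adds an optimal δ ∈ {0,1}^n to k. On a chain, this binary problem can be solved exactly by dynamic programming, which this sketch uses in place of the max-flow/min-cut step the paper applies to 2D images. The function names and the stopping tolerance are assumptions.

```python
import math

TWO_PI = 2 * math.pi

def wrap(x):
    """Map a phase to [-pi, pi] by removing the nearest multiple of 2*pi."""
    return x - TWO_PI * round(x / TWO_PI)

def energy(psi, k, p):
    """Sum of |phi_{i+1} - phi_i|^p with phi_i = psi_i + 2*pi*k_i."""
    return sum(abs(psi[i + 1] + TWO_PI * k[i + 1] - psi[i] - TWO_PI * k[i]) ** p
               for i in range(len(psi) - 1))

def best_binary_step(psi, k, p):
    """Exact minimiser over delta in {0,1}^n of energy(psi, k + delta), by DP on the chain."""
    phi = [psi[i] + TWO_PI * k[i] for i in range(len(psi))]
    cost, back = [0.0, 0.0], []
    for i in range(1, len(psi)):
        new_cost, choice = [], []
        for s in (0, 1):
            opts = [cost[t] + abs(phi[i] + TWO_PI * s - phi[i - 1] - TWO_PI * t) ** p
                    for t in (0, 1)]
            t_best = 0 if opts[0] <= opts[1] else 1
            new_cost.append(opts[t_best])
            choice.append(t_best)
        cost = new_cost
        back.append(choice)
    s = 0 if cost[0] <= cost[1] else 1
    delta = [s]
    for choice in reversed(back):  # backtrack from the last sample to the first
        s = choice[s]
        delta.append(s)
    return delta[::-1]

def unwrap_binary(psi, p=1.0, max_iter=100):
    """Repeat optimal binary steps until the energy stops decreasing."""
    k = [0] * len(psi)
    e = energy(psi, k, p)
    for _ in range(max_iter):
        k_new = [ki + di for ki, di in zip(k, best_binary_step(psi, k, p))]
        e_new = energy(psi, k_new, p)
        if e_new >= e - 1e-12:
            break
        k, e = k_new, e_new
    return [psi[i] + TWO_PI * k[i] for i in range(len(psi))]
```

For example, wrapping the ramp `[0.5 * i for i in range(20)]` and passing it to `unwrap_binary` recovers a signal whose consecutive differences are all 0.5, i.e. the ramp up to a constant multiple of 2π.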

A New Fuzzy Multi-channel Filter for the Reduction of Impulse Noise

One of the most common image processing tasks is the removal of impulse noise from digital images. In this paper, we propose a new two-step multi-channel filter. This non-linear filter consists of two separate steps: an impulse noise detection step and a noise reduction step. The fuzzy detection method is mainly based on the calculation of fuzzy gradient values and on fuzzy reasoning. This phase determines three membership functions that are then used by the filtering step. Experiments show that the proposed filter can efficiently remove impulse noise from colour images without distorting the useful information in the image.

Stefan Schulte, Valérie De Witte, Mike Nachtegael, Dietrich Van der Weken, Etienne E. Kerre

Enhancement and Cleaning of Handwritten Data by Using Neural Networks

In this work, artificial neural networks are used to clean and enhance scanned images for a handwriting recognition task. Multilayer perceptrons are trained in a supervised way using a set of simulated noisy images, with the corresponding clean images as the desired output. The neural network thus learns to perform the desired enhancement. The performance of this method has been evaluated on both artificial noisy images and natural images. Objective and subjective evaluation methods have shown that the proposed method outperforms other conventional enhancement and cleaning filters.

José Luis Hidalgo, Salvador España, María José Castro, José Alberto Pérez

Zerotree Wavelet Based Image Quilting for Fast Texture Synthesis

In this paper we propose a fast DWT-based multi-resolution texture synthesis algorithm in which coefficient blocks of the spatio-frequency components of the input texture are efficiently stitched together (quilted) to form the corresponding components of the synthesised output texture. We propose the use of an automatically generated threshold to determine the significant coefficients, which act as elements of a matching template used in the texture quilting process. We show that using a limited set of visually significant coefficients, regardless of their level of resolution, not only reduces the computational cost but also results in more realistic texture synthesis. We use popular test textures to compare our results with those of existing state-of-the-art techniques. Many application scenarios of the proposed algorithm are also discussed.

Dhammike S. Wickramanayake, Eran A. Edirisinghe, Helmut E. Bez

Semantic Feature Extraction Based on Video Abstraction and Temporal Modeling

This paper presents a novel scheme for object-based video indexing and retrieval based on video abstraction and semantic event modeling. The proposed algorithm consists of three major steps: Video Object (VO) extraction, object-based video abstraction, and statistical modeling of semantic features. The semantic feature modeling scheme is based on the temporal variation of low-level features in the object area between adjacent frames of the video sequence. Each semantic feature is represented by a Hidden Markov Model (HMM), which characterizes the temporal nature of the VO with various combinations of object features. The experimental results demonstrate the effective performance of the proposed approach.

Kisung Lee
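When each semantic feature is represented by its own HMM, as above, a common way to label an observed sequence is to score it under every model with the forward algorithm and pick the most likely one. The sketch below assumes discrete observation symbols and hypothetical model names; it is not taken from the paper, and for long sequences the probabilities should be scaled or kept in the log domain to avoid underflow.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]: initial probability, trans[r][s]: P(s | r), emit[s][o]: P(o | s).
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][o]
                 for s in range(n_states)]
    return sum(alpha)

def classify(obs, models):
    """Return the name of the model (start, trans, emit) with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

For instance, with two hypothetical one-state models, `"still"` emitting symbol 0 with probability 0.9 and `"moving"` emitting symbol 1 with probability 0.9, the sequence `[1, 1, 0, 1]` is assigned to `"moving"`.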

Video Retrieval Using an EDL-Based Timeline

When creating a new multimedia asset, especially a video, several decisions have to be made: which portions of the original footage to include, how to order them, how to crop each portion to reach the desired length, and how to stitch all these pieces together. These decisions form the core of the so-called Edit Decision List (EDL), where all these actions are recorded. In this paper the authors show that the list of editing decisions can be used as the basis for indexing and retrieving videos from a database; more specifically, we show that a timeline created from the EDL is a valid and sufficient descriptor for identifying a video among a very large collection, provided the video has a minimum duration. We also demonstrate that this descriptor is highly robust to different bit rates, frame rates, frame sizes and re-encoding processes. Indexing and retrieval using this descriptor are tested in an IPMP application for TV broadcasting.

José San Pedro, Nicolas Denis, Sergio Domínguez

Image and Video Coding


A New Secret Sharing Scheme for Images Based on Additive 2-Dimensional Cellular Automata

A new secret color image sharing scheme based on two-dimensional memory cellular automata is proposed. The protocol is a threshold scheme in which the secret image to be shared is taken as one of the initial configurations of the cellular automaton. The key idea is to use a reversible model of computation to compute the shares, and then to run the computation in reverse to recover the original image. The scheme is proved to be perfect and ideal, and resistant to the most important attacks, such as statistical attacks.

Gonzalo Álvarez Marañón, Luis Hernández Encinas, Ángel Martín del Rey
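The reversibility that the scheme above relies on can be illustrated with a second-order (memory) additive cellular automaton on a toroidal grid: C^{t+1} = (sum of the von Neumann neighbourhood of C^t) - C^{t-1} (mod 256). Because C^{t-1} can be recovered from C^t and C^{t+1} with the same rule, running the automaton backwards returns the secret configuration. The neighbourhood, the modulus and the helper names are assumptions, and the sketch does not reproduce how the paper derives the individual shares or the threshold property.

```python
import random

def step(prev, curr, modulus=256):
    """Next configuration: neighbourhood sum of curr minus prev, modulo `modulus`."""
    h, w = len(curr), len(curr[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = (curr[y][x] + curr[(y - 1) % h][x] + curr[(y + 1) % h][x]
                 + curr[y][(x - 1) % w] + curr[y][(x + 1) % w])
            nxt[y][x] = (s - prev[y][x]) % modulus
    return nxt

def run_forward(c0, c1, steps):
    """Evolve from (C^0, C^1) and return (C^steps, C^(steps+1))."""
    prev, curr = c0, c1
    for _ in range(steps):
        prev, curr = curr, step(prev, curr)
    return prev, curr

def run_backward(prev, curr, steps):
    """Undo run_forward: C^(t-1) = sum(C^t) - C^(t+1) is the same rule with roles swapped."""
    for _ in range(steps):
        prev, curr = step(curr, prev), prev
    return prev, curr

def random_grid(h, w, seed):
    """Hypothetical stand-in for an 8-bit image channel or a random seed configuration."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(w)] for _ in range(h)]
```

Starting from `(secret, random_grid(...))`, evolving forward for several steps and then calling `run_backward` with the same number of steps returns exactly the original pair.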

A Fast Motion Estimation Algorithm Based on Diamond and Triangle Search Patterns

Based on a study of the patterns used in many fast algorithms for block-matching motion estimation (BMME), a new search pattern, TP (Triangle Pattern), is introduced in this paper. TP is a simplified SP (Square Pattern), so its performance is almost the same as that of SP. By combining TP with DP (Diamond Pattern), a fast BMA (BMME Algorithm), DTS (Diamond-Triangle Search), is also proposed. DTS exploits the motion correlation between adjacent blocks, the directional characteristic of the SAD (Sum of Absolute Differences) distribution, and the center-biased characteristic of motion vectors to speed up BMME. Experimental results show that the proposed DTS algorithm can reduce the computational complexity of BMME remarkably while incurring little, if any, loss in quality.

Yun Cheng, Zhiying Wang, Kui Dai, Jianjun Guo
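The building blocks mentioned above can be sketched as follows: the SAD block-matching cost and an iterative descent with the small diamond pattern, which moves to the best of the four neighbouring displacements until the centre is best. This is the generic small-diamond search, shown only as an illustration; the Triangle Pattern and the full DTS algorithm are not reproduced, and the function and parameter names are assumptions. The caller must ensure that the zero-displacement block lies inside the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

DIAMOND_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def diamond_search(cur, ref, bx, by, n, max_disp):
    """Descend on the SAD surface with the small diamond until the centre is best."""
    h, w = len(ref), len(ref[0])

    def valid(dx, dy):
        return (abs(dx) <= max_disp and abs(dy) <= max_disp
                and 0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h)

    mv, best = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    while True:
        scored = [(sad(cur, ref, bx, by, mv[0] + ox, mv[1] + oy, n), (mv[0] + ox, mv[1] + oy))
                  for ox, oy in DIAMOND_OFFSETS if valid(mv[0] + ox, mv[1] + oy)]
        if not scored:
            return mv, best
        cost, cand = min(scored)
        if cost >= best:  # centre of the diamond is a local minimum
            return mv, best
        mv, best = cand, cost
```

Exploiting the center-biased distribution of motion vectors, as the abstract describes, is why starting the descent at zero displacement usually needs only a few SAD evaluations.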

A Watermarking Scheme Based on Discrete Non-separable Wavelet Transform

This paper presents a novel method for constructing non-separable wavelet filters. The high-frequency sub-bands of the non-separable wavelet transform can reveal more features than those of the commonly used separable wavelet transform. We then describe a blind watermarking scheme based on the discrete non-separable wavelet transform (DNWT). More DNWT coefficients can carry the watermark than is the case with the discrete separable wavelet transform (DSWT). Experimental results show that the DNWT watermarking scheme is robust to noise, JPEG compression and cropping. In particular, it is more resistant to sharpening than the DSWT scheme. Furthermore, even when the threshold is adjusted so that the number of DSWT coefficients used to embed the watermark is no smaller than the number of DNWT coefficients, the DSWT scheme is still less robust to sharpening than the DNWT scheme. Such an adjustment also dramatically decreases the robustness of the DSWT scheme to noise.

Jianwei Yang, Xinge You, Yuan Yan Tang, Bin Fang

A Fast Run-Length Algorithm for Wavelet Image Coding with Reduced Memory Usage

A new image coder is described in this paper. Since it is based on the Discrete Wavelet Transform (DWT), it yields good Rate/Distortion (R/D) performance. However, our proposal focuses on overcoming the two main problems of wavelet-based image coders: they are typically implemented by memory-intensive and time-consuming algorithms. To avoid these common drawbacks, we tackle these problems in the main stages of this type of coder, i.e., both the wavelet computation and the entropy coding of the coefficients. The proposed algorithms are described in such a way that they can be implemented straightforwardly in any programming language. The numerical results show that while the R/D performance of our proposal is similar to that of state-of-the-art coders such as SPIHT and JPEG2000/Jasper, the amount of memory required by our algorithm is drastically reduced (roughly 25 to 35 times less memory), and its execution time is lower (three times lower than SPIHT, and more than ten times lower than JPEG2000/Jasper).

Jose Oliver, Manuel P. Malumbres
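As a generic illustration of why run-length coding suits wavelet coefficients (most quantised high-frequency coefficients are zero), the sketch below encodes a coefficient list as (zero-run, value) pairs and decodes it back. The pair format, the trailing-zeros marker and the truncating quantiser are hypothetical choices for illustration; the abstract does not specify the coder's actual symbol format or entropy coding stage.

```python
def quantize(coeffs, step):
    """Uniform quantisation (truncation toward zero) before run-length coding."""
    return [int(c / step) for c in coeffs]

def rle_encode(coeffs):
    """Encode quantised coefficients as (zero_run, value) pairs.

    A final (run, None) pair records any trailing zeros.
    """
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            out.append((run, c))
            run = 0
    if run:
        out.append((run, None))
    return out

def rle_decode(pairs):
    """Inverse of rle_encode."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value is not None:
            out.append(value)
    return out
```

Only the non-zero coefficients and the lengths of the zero runs are stored, so the output of `rle_encode` shrinks as the quantised sub-bands become sparser.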

Face Recognition


Multiple Face Detection at Different Resolutions for Perceptual User Interfaces

This paper describes in detail a real-time multiple face detection system for video streams. The system combines the good performance of a window-shift approach with different cues that are available in video streams thanks to temporal coherence. This combined solution outperforms the basic face detector, achieving a 98% success rate on around 27,000 images, and additionally provides eye detection and links successive detections over time by means of detection threads.

Modesto Castrillón-Santana, Javier Lorenzo-Navarro, Oscar Déniz-Suárez, José Isern-González, Antonio Falcón-Martel

Removing Shadows from Face Images Using ICA

Shadows produce troublesome effects in many computer vision applications. The idea behind most current shadow removal approaches is to locate shadows and then remove them [1][4]. However, distinguishing shadow edges from reflectance edges caused by reflectance changes is a difficult problem, particularly in a single image. In this paper, we focus on the shadow removal problem in face recognition and propose a novel method based on ICA (Independent Component Analysis) to remove shadows from a single face image. The training set contains face images without shadows. First, we apply derivative filters to the training images to derive face edge maps, and then perform ICA on the filtered training set to construct pixel ICA subspaces, which can be used to remove shadow edges from the filtered versions of a single test image. After shadow edge removal, a shadow-free image can be reconstructed using an approach similar to [7]. Unlike previous shadow removal approaches, our method can remove shadows from a single grey-level image. Experimental results demonstrate that the proposed approach can effectively eliminate the effects of shadows in face recognition.

Jun Liu, Xiangsheng Huang, Yangsheng Wang

An Analysis of Facial Description in Static Images and Video Streams

This paper describes an analysis of facial description in static images and video streams. The still-image context is first analyzed in order to decide the optimal classifier configuration for each problem: gender recognition, race classification, and detection of glasses and moustache. These results are then applied to significant samples automatically extracted in real time from video streams, achieving promising results in describing 70 individuals in terms of gender, race, and the presence of glasses and moustache.

Modesto Castrillón-Santana, Javier Lorenzo-Navarro, Daniel Hernández-Sosa, Yeray Rodríguez-Domínguez

Recognition of Facial Gestures Based on Support Vector Machines

This paper addresses the problem of recognizing emotional facial gestures from static images at thumbnail resolution. Several experiments are presented, covering one holistic and two local approaches that use SVMs as classifier engines. The experimental results obtained with our method are reported.

Attila Fazekas, István Sánta

Performance Driven Facial Animation by Appearance Based Tracking

We present a method that estimates high-level animation parameters (muscle contractions, eye movements, eyelid opening, jaw motion and lip contractions) from a marker-less face image sequence. We use an efficient appearance-based tracker to stabilise images of the upper (eyes and eyebrows) and lower (mouth) face. Using a set of stabilised images with known animation parameters, we learn a re-animation matrix that allows us to estimate the parameters of a new image. The system is able to re-animate a 32-DOF 3D face model in real time.

José Miguel Buenaposada, Enrique Muñoz, Luis Baumela

Color Distribution Tracking for Facial Analysis

In this paper we address the problem of real-time object tracking in complex scenes under dynamically changing lighting conditions. This problem affects video-surveillance applications, where the object location must be known at all times. We are interested in locating and tracking people in video sequences for access control and advanced user interface applications. Here we present a real-time tracking method suitable for human faces. A Skin Probability Image (SPI) is generated by applying a skin hue model to the input frame. Targets are located by applying a modified mean-shift algorithm. To obtain their spatial extent, error ellipses are fitted to the probability distributions representing them. The hue model is unique for each target and is updated every frame to cope with lighting variations. This technique has been applied to human face tracking in indoor environments to test its performance in different situations.

Juan José Gracia-Roche, Carlos Orrite, Emiliano Bernués, José Elías Herrero
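The localisation step above can be sketched with a plain mean-shift on a probability image: repeatedly move a square window to the probability-weighted centroid of its contents until it stops moving, then fit an error ellipse from the weighted second moments (semi-axes of k standard deviations along the eigenvectors of the 2x2 covariance). The paper uses a modified mean-shift whose details are not given here, so this unmodified version, the window shape and the factor k are assumptions.

```python
import math

def window_moments(prob, cx, cy, half):
    """Mass, first moments and weighted pixels of a probability image in a square window."""
    h, w = len(prob), len(prob[0])
    m00 = mx = my = 0.0
    pts = []
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            p = prob[y][x]
            m00 += p
            mx += p * x
            my += p * y
            pts.append((x, y, p))
    return m00, mx, my, pts

def mean_shift(prob, cx, cy, half, max_iter=20):
    """Move the window to the weighted centroid until it stops moving."""
    for _ in range(max_iter):
        m00, mx, my, _ = window_moments(prob, cx, cy, half)
        if m00 == 0:
            break
        nx, ny = round(mx / m00), round(my / m00)
        if (nx, ny) == (cx, cy):
            break
        cx, cy = nx, ny
    return cx, cy

def error_ellipse(prob, cx, cy, half, k=2.0):
    """Centre, semi-axes (k standard deviations) and orientation from the weighted covariance."""
    m00, mx, my, pts = window_moments(prob, cx, cy, half)
    ux, uy = mx / m00, my / m00
    sxx = sum(p * (x - ux) ** 2 for x, y, p in pts) / m00
    syy = sum(p * (y - uy) ** 2 for x, y, p in pts) / m00
    sxy = sum(p * (x - ux) * (y - uy) for x, y, p in pts) / m00
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc  # eigenvalues of the covariance matrix
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (ux, uy), (k * math.sqrt(l1), k * math.sqrt(max(l2, 0.0))), angle
```

In a tracker, the returned window centre would seed the search in the next frame, and the ellipse would describe the face's spatial extent.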

Head Gesture Recognition Based on Bayesian Network

Head gestures such as nodding and shaking are a common form of body language in human communication, and their recognition plays an important role in the development of Human-Computer Interaction (HCI). Since a head gesture is a continuous motion over a time series, the key problems in recognition are tracking the head across multiple views and understanding head pose transformations. This paper presents a Bayesian network (BN) based framework into which a multi-view model (MVM) and a statistical inference model of head gestures are integrated for recognition. Finally, the head gesture decision is made by comparing the maximum posterior, the output of the BN, with a threshold. Additionally, to enhance the robustness of our system, we add color information to the BN in a new way. The experimental results illustrate that the proposed algorithm is effective.

Peng Lu, Xiangsheng Huang, Xinshan Zhu, Yangsheng Wang

Detection and Tracking of Face by a Walking Robot

We propose a system for detecting and tracking faces in dynamic and changing environments using a camera mounted on a walking robot. The proposed system is based on the principal component analysis (PCA) technique. To detect a face, we first use skin color and motion information. We then verify with PCA that the detected regions are indeed faces. Face tracking is based on the Euclidean distance in eigenspace between the previously tracked face and the newly detected faces. The robot's motion is controlled so that the detected face region stays in the central region of the camera image. The proposed system can be extended to other walking robot systems and to gesture recognition systems for human-robot interaction.

Do Joon Jung, Chang Woo Lee, Hang Joon Kim

Human Activity Analysis


Appearance-Based Recognition of Words in American Sign Language

In this paper, we show how appearance-based features can be used to recognize words in American Sign Language (ASL) from a video stream. The features are extracted without any segmentation or tracking of the signer's hands or head, which avoids possible errors in the segmentation step. Experiments are performed on a database of 10 ASL words with 110 utterances in total. These data are extracted from a publicly available collection of videos and can therefore be used by other research groups. The video streams of two stationary cameras are used for classification, but we observe that one camera alone already yields sufficient accuracy. Hidden Markov Models and the leave-one-out method are employed for training and classification. Using simple appearance-based features, we achieve an error rate of 7%. About half of the remaining errors are due to words that are visually different from all other utterances.

Morteza Zahedi, Daniel Keysers, Hermann Ney

Robust Person-Independent Visual Sign Language Recognition

Sign language recognition constitutes a challenging field of research in computer vision. Common problems like overlap, ambiguities, and minimal pairs occur frequently and require robust algorithms for feature extraction and processing. We present a system that performs person-dependent recognition of 232 isolated signs with an accuracy of 99.3% in a controlled environment. Person-independent recognition rates reach 44.1% for 221 signs. An average performance of 87.8% is achieved for six signers in various uncontrolled indoor and outdoor environments, using a reduced vocabulary of 18 signs.

The system uses a background model to remove static areas from the input video at the pixel level. In the tracking stage, multiple hypotheses are pursued in parallel to handle ambiguities and facilitate retrospective correction of errors. A winner hypothesis is found by applying high-level knowledge of the human body, hand motion, and the signing process. Overlaps are resolved by template matching, exploiting temporally adjacent frames with little or no overlap. The extracted features are normalized for person independence and robustness, and classified by Hidden Markov Models.

Jörg Zieren, Karl-Friedrich Kraiss

A 3D Dynamic Model of Human Actions for Probabilistic Image Tracking

In this paper we present a method suitable for use in human tracking as a temporal prior in a particle filtering framework such as CONDENSATION [5]. The method predicts feasible human postures given a reduced set of previous postures, and it drastically reduces the number of particles needed to track a generic, highly articulated object. Given a sequence of preceding postures, this example-driven transition model probabilistically matches the most likely postures from a database of human actions. Each action of the database is defined within a PCA-like space called …, which is suitable for performing the probabilistic match when searching for similar sequences. Thus, different but feasible postures from the database become the new predicted poses.

Ignasi Rius, Daniel Rowe, Jordi Gonzàlez, Xavier Roca

Extracting Motion Features for Visual Human Activity Representation

This paper presents a technique to characterize human actions in visual surveillance scenarios in order to describe, qualitatively, basic human movements in general imaging conditions. The proposed representation is based on focus-of-attention concepts, as part of an active tracking process that describes target movements. The introduced representation, named the “focus of attention” (FOA) representation, is based on motion information. A segmentation method is also presented to group the FOA into uniform temporal segments. This segmentation provides a higher-level description of human actions by further classifying each segment into different types of basic movements.

Filiberto Pla, Pedro Ribeiro, José Santos-Victor, Alexandre Bernardino

Modelling Spatial Correlation and Image Statistics for Improved Tracking of Human Gestures

In this paper, we examine sensor-specific distributions of local image operators (edge and line detectors), which describe the appearance of people in video sequences. The distributions are used to build a probabilistic articulated motion model that tracks the gestures of a person in terms of arm and body movement, and which is solved using a particle filter. We focus on modeling the statistics of one sensor and examine the influence of image noise and scale, as well as the attainable spatial accuracy. Additionally, spatial correlation between pixels is modeled in the appearance model. We show that when this correlation is neglected, high detection probabilities are quickly overestimated, which can often lead to false positives. Using the weighted geometric mean of the pixel information leads to much improved results.

Rik Bellens, Sidharta Gautama, Johan D’Haeyer
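The overestimation effect described above can be seen in a small sketch. If correlated pixels are treated as independent, the log-likelihood is a sum over pixels, so duplicating the same evidence multiplies its influence. A weighted geometric mean (weights normalised to sum to one) does not grow with the number of correlated pixels. Uniform or arbitrary weights are assumptions here, since the abstract does not state how the paper derives them from the spatial correlation.

```python
import math

def naive_log_likelihood(pixel_probs):
    """Independence assumption: the sum of per-pixel log-probabilities."""
    return sum(math.log(p) for p in pixel_probs)

def weighted_geometric_log_likelihood(pixel_probs, weights):
    """Log of prod(p_i ** (w_i / sum(w))), i.e. a weighted geometric mean of the pixel terms."""
    total = sum(weights)
    return sum(w / total * math.log(p) for p, w in zip(pixel_probs, weights))
```

For ten perfectly correlated pixels that each give probability 0.9, the naive score equals ten times log 0.9, whereas the geometric-mean score stays at log 0.9, the value a single pixel would give.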

Fast and Accurate Hand Pose Detection for Human-Robot Interaction

Enabling natural human-robot interaction with computer-vision-based applications requires fast and accurate hand detection. However, because hands are highly complex objects that are difficult to locate, previous work in this field assumes various constraints, such as a limited number of detectable gestures. This paper presents an approach that integrates temporal coherence cues with wrist-based hand detection using a cascade classifier. With this approach, we make three main contributions: (1) a transparent initialization mechanism, requiring no user participation, for segmenting hands independently of their gesture; (2) a larger number of detected gestures and a faster training phase than previous cascade-classifier-based methods; and (3) near real-time performance for hand pose detection in video streams.

Luis Antón-Canalís, Elena Sánchez-Nielsen, Modesto Castrillón-Santana



Performance Analysis of Homomorphic Systems for Image Change Detection

Under illumination variations, image change detection becomes a difficult task. Some existing image change detection methods try to compensate for this effect. It is assumed that an image can be expressed in terms of its illumination and reflectance components. Changes in the reflectance component are directly related to scene changes. In general, scene illumination varies slowly over space, whereas the reflectance component contains mainly high-spatial-frequency details. The intention is therefore to apply the image change detection algorithm to the reflectance component only. The aim of this work is to analyze the performance of different homomorphic pre-filtering schemes for extracting the reflectance component, so that the image change detection algorithm is applied only to this component. This scheme is not suitable for scenes without high-spatial-frequency details.

Gonzalo Pajares, José Jaime Ruz, Jesús Manuel de la Cruz
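A minimal homomorphic pre-filter in the spirit of the abstract above: since I = L·R, taking logarithms gives log I = log L + log R, so subtracting a low-pass estimate of log I (the slowly varying illumination) leaves an estimate of log R, and change detection is then run on that component. The box filter, its radius and the threshold are assumptions; the paper compares several pre-filtering schemes that are not specified here. The images must be strictly positive.

```python
import math

def box_blur(img, r):
    """Mean filter over a (2r+1)x(2r+1) window, normalised by the number of valid pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def reflectance(img, r=2):
    """Estimate log R as log I minus a low-pass estimate of log L."""
    log_img = [[math.log(v) for v in row] for row in img]
    low = box_blur(log_img, r)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(log_img, low)]

def change_mask(img_a, img_b, threshold, r=2):
    """Flag pixels whose estimated reflectance differs by more than `threshold`."""
    ra, rb = reflectance(img_a, r), reflectance(img_b, r)
    return [[abs(a - b) > threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ra, rb)]
```

A global change in illumination (multiplying every pixel by a constant) adds the same constant to log I and to its local mean, so it cancels and no change is reported; a local change in a single pixel does survive the subtraction.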

Car License Plates Extraction and Recognition Based on Connected Components Analysis and HMM Decoding

A system for finding and recognizing car license plates is presented. The finding of the plates is based on the analysis of connected components of four different binarizations of the image. No assumptions are made about illumination and camera angle, and only mild assumptions regarding the size of the plate in the image are made. Recognition is performed by means of Hidden Markov Models. Experiments on a database of Spanish number plates show the feasibility of the proposed approach.

David Llorens, Andrés Marzal, Vicente Palazón, Juan M. Vilar
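The plate-finding stage described above relies on connected components of several binarizations. A minimal sketch: label 4-connected components with a breadth-first search, binarize at a few thresholds, and keep components whose height is plausible for a character. The thresholds, the dark-on-light polarity and the height limits are hypothetical; the abstract does not say which four binarizations the authors use.

```python
from collections import deque

def connected_components(binary):
    """4-connected components of a binary image, as bounding boxes (x0, y0, x1, y1)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cx, cy = queue.popleft()
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes

def character_candidates(gray, thresholds, min_h, max_h):
    """Binarise at several thresholds; keep components with a plausible character height."""
    found = []
    for t in thresholds:
        binary = [[v < t for v in row] for row in gray]  # assumes dark characters on a light plate
        for box in connected_components(binary):
            if min_h <= box[3] - box[1] + 1 <= max_h:
                found.append((t, box))
    return found
```

Groups of candidate boxes that are aligned and similarly sized would then be passed on as plate hypotheses to the recognition stage.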

Multi-resolution Image Analysis for Vehicle Detection

Computer vision can provide a great deal of assistance to intelligent vehicles. In this paper, an Advanced Driver Assistance System for vehicle detection is presented. A geometric model of the vehicle is defined whose energy function includes information about the shape and symmetry of the vehicle and the shadow it casts. A genetic algorithm finds the optimal parameter values. Since the algorithm receives information from a road detection module, some geometric restrictions can be applied. A multi-resolution approach is used to speed up the algorithm so that it works in real time. Examples on real images are shown to validate the algorithm.

Cristina Hilario, Juan Manuel Collado, José Maria Armingol, Arturo de la Escalera

A Novel Adaptive Gaussian Mixture Model for Background Subtraction

Background subtraction is a typical approach to foreground segmentation in image sequences taken from a static camera: each new frame is compared with a learned model of the scene background. In this paper, we propose a flexible method to estimate the background model with a finite Gaussian mixture model. A stochastic approximation procedure is used to recursively estimate the parameters of the Gaussian mixture model and, simultaneously, to obtain the asymptotically optimal number of mixture components. The experimental results show that our method is efficient and effective.

Jian Cheng, Jie Yang, Yue Zhou
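The abstract does not give the update equations, so the sketch below shows a related, well-known scheme instead: a simplified version of the fixed-K, per-pixel mixture update of Stauffer and Grimson for a grey-level pixel. The parameter names and default values (`k`, `alpha`, `init_var`, `min_var`, `match_sigmas`, `bg_ratio`) are assumptions. Unlike the paper's method, it uses a fixed learning rate instead of a stochastic-approximation estimator and does not select the number of components automatically.

```python
import math

class PixelGMM:
    """Simplified Stauffer-Grimson style adaptive mixture for one grey-level pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_ratio=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var = init_var, min_var
        self.match_sigmas, self.bg_ratio = match_sigmas, bg_ratio
        self.w, self.mu, self.var = [], [], []  # kept sorted by w / sigma, descending

    def _background_count(self):
        """Number of top-ranked components whose cumulative weight exceeds bg_ratio."""
        total = 0.0
        for b, w in enumerate(self.w):
            total += w
            if total > self.bg_ratio:
                return b + 1
        return len(self.w)

    def update(self, x):
        """Classify x (True = foreground), then update the mixture with it."""
        match = next((i for i in range(len(self.w))
                      if abs(x - self.mu[i]) <= self.match_sigmas * math.sqrt(self.var[i])),
                     None)
        foreground = match is None or match >= self._background_count()
        if match is None:
            if len(self.w) == self.k:  # replace the lowest-ranked component
                self.w.pop(); self.mu.pop(); self.var.pop()
            self.w = [(1 - self.alpha) * w for w in self.w] + [self.alpha]
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:
            self.w = [(1 - self.alpha) * w + (self.alpha if i == match else 0.0)
                      for i, w in enumerate(self.w)]
            d = x - self.mu[match]
            self.mu[match] += self.alpha * d
            self.var[match] = max(self.min_var,
                                  self.var[match] + self.alpha * (d * d - self.var[match]))
        total = sum(self.w)
        self.w = [w / total for w in self.w]
        order = sorted(range(len(self.w)),
                       key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)
        self.w = [self.w[i] for i in order]
        self.mu = [self.mu[i] for i in order]
        self.var = [self.var[i] for i in order]
        return foreground
```

One `PixelGMM` instance per pixel is updated with every frame; a value that persists long enough gains weight and is eventually absorbed into the background.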

Intelligent Target Recognition Based on Wavelet Adaptive Network Based Fuzzy Inference System

In this paper, an intelligent system is presented for recognizing targets from the echo signals of High Resolution Range (HRR) radars. The paper deals especially with combining feature extraction and classification of real target echo signal waveforms measured with an X-band pulse radar. To this end, we use a wavelet adaptive network based fuzzy inference system model that we developed. The model consists of two layers: a wavelet layer and an adaptive network based fuzzy inference system. The wavelet layer performs adaptive feature extraction in the time-frequency domain and is composed of wavelet decomposition and wavelet entropy. The second layer, used for classification, is an adaptive network based fuzzy inference system. The performance of the developed system has been evaluated on noisy radar target echo signals. The test results showed that the system was effective in detecting real radar target echo signals, with a correct classification rate of about 93% for the targets used.

Engin Avci, Ibrahim Turkoglu, Mustafa Poyraz
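The wavelet layer above combines a wavelet decomposition with wavelet entropy. A minimal sketch of one common definition: decompose the signal with a multilevel DWT, compute the energy of each band relative to the total, and take the Shannon entropy of those relative energies. The Haar wavelet and this particular entropy definition are assumptions; the abstract does not name the wavelet family or the exact entropy formula.

```python
import math

def haar_decompose(signal, levels):
    """Multilevel orthonormal Haar DWT: detail bands (finest first) and final approximation."""
    details, approx = [], list(signal)
    s = math.sqrt(2.0)
    for _ in range(levels):
        pairs = range(0, len(approx) - 1, 2)
        details.append([(approx[i] - approx[i + 1]) / s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
    return details, approx

def wavelet_entropy(signal, levels):
    """Shannon entropy (nats) of the relative energy in each wavelet band."""
    details, approx = haar_decompose(signal, levels)
    energies = [sum(c * c for c in band) for band in details + [approx]]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

A signal whose energy falls in a single band (e.g. a constant) has zero entropy, while energy spread evenly over two bands gives log 2; these scalar values would form part of the feature vector passed to the classifier.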



HMM-Based Gesture Recognition for Robot Control

In this paper, we present a gesture recognition system for interaction between a human being and a robot. To recognize human gestures, we use a hidden Markov model (HMM) that takes a continuous stream as input and can automatically segment and recognize human gestures. The proposed system is composed of three modules: a pose extractor, a gesture recognizer, and a robot controller. The pose extractor replaces each input frame with a pose symbol; in this system, a pose represents the positions of the user's face and hands. The gesture recognizer then recognizes a gesture using an HMM, which performs segmentation and recognition of the human gesture simultaneously [6]. Finally, the robot controller operates the robot by transforming the recognized gesture into robot commands. To assess the validity of the proposed system, we used the proposed recognition system as an interface to control robots, […] robot. The experimental results verify the feasibility and validity of the proposed system.
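Scoring a sequence of discrete pose symbols against per-gesture HMMs can be sketched with the standard forward algorithm. This is a simplified illustration only. It classifies an already segmented sequence, whereas the system described above segments and recognizes simultaneously. The gesture names and all model probabilities are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete-output HMM.

    start[i]: probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k]: probability that state i emits pose symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Return the name of the gesture model with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Two hypothetical 2-state left-right gesture models over 3 pose symbols.
models = {
    "raise_hand": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                   [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "wave":       ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                   [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}
print(classify([0, 0, 1, 1], models))  # raise_hand
```

For long continuous streams, a practical implementation would scale the forward variables or work in log space to avoid numerical underflow.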

Hye Sun Park, Eun Yi Kim, Sang Su Jang, Se Hyun Park, Min Ho Park, Hang Joon Kim

PCA Positioning Sensor Characterization for Terrain Based Navigation of UVs

Principal Component Analysis has recently been proposed as a nonlinear positioning sensor in the development of tools for Terrain Based Navigation of Underwater Vehicles [10]. In this work, the error sources affecting the proposed unsupervised methodology are enumerated, its stochastic characterization is studied, and the attainable performance is discussed. Based on a series of Monte Carlo experiments over a large set of synthesized terrains, conclusions are drawn on the adequacy of the proposed nonlinear approach.

Paulo Oliveira

Monte Carlo Localization Using SIFT Features

The ability to determine its location in a given environment is crucial for an autonomous agent. While navigating through a space, a mobile robot must be capable of finding its location in a map of the environment (i.e. its pose <x, y, θ>); otherwise, the robot will not be able to complete its task. This problem becomes especially challenging if the robot does not possess any external measure of its global position. Typically, dead-reckoning systems fail to estimate the robot's pose when working for long periods of time. In this paper we present a localization method based on the Monte Carlo algorithm. During the last decade this method has been extensively tested in the field of mobile robotics, proving to be both robust and efficient. Moreover, our approach takes advantage of a vision sensor. In particular, we have chosen to use SIFT features as visual landmarks, finding them suitable for the global localization of a mobile robot. We have successfully tested our approach on a B21r mobile robot, globally localizing the robot in a few iterations. The technique is suitable for office-like environments and behaves correctly in the presence of people and moving objects.
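The predict, weight and resample cycle at the core of Monte Carlo localization can be sketched as follows. This version is deliberately simplified and hypothetical:

- The robot moves along a 1-D corridor rather than having a planar pose.
- Each observation is a signed offset to a landmark at a known position, standing in for the SIFT landmark matches used by the authors.
- The landmark positions, noise levels and particle count are illustrative assumptions.

```python
import math
import random

def mcl_step(particles, control, measurement, landmarks,
             motion_noise, meas_noise, rng):
    """One predict-weight-resample cycle of Monte Carlo localization in 1-D."""
    # 1. Prediction: apply the odometry reading plus Gaussian motion noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # 2. Correction: log-likelihood of the observed landmark offsets.
    log_w = []
    for p in moved:
        sq_err = sum((z - (lm - p)) ** 2 for z, lm in zip(measurement, landmarks))
        log_w.append(-0.5 * sq_err / meas_noise ** 2)
    # Shift by the maximum before exponentiating so weights never all underflow.
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    # 3. Resampling: draw a new set with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(0)
landmarks = [2.0, 8.0]                      # hypothetical known landmark positions
true_x = 4.0
particles = [rng.uniform(0.0, 10.0) for _ in range(1000)]  # global uncertainty
true_x += 1.0                               # the robot moves one unit
z = [lm - true_x for lm in landmarks]       # noise-free measurement for clarity
particles = mcl_step(particles, 1.0, z, landmarks, 0.1, 0.2, rng)
estimate = sum(particles) / len(particles)
print(round(estimate, 2))
```

Starting from particles spread over the whole corridor (global localization), a single informative observation already concentrates the particle set near the true position.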

Arturo Gil, Óscar Reinoso, Asunción Vicente, César Fernández, Luis Payá

A New Method for the Estimation of the Image Jacobian for the Control of an Uncalibrated Joint System

This paper describes several new algorithms for the on-line estimation of the image Jacobian, a matrix that linearly relates joint velocities to image feature velocities. We have applied them successfully to the static visual control of an uncalibrated 3-DOF joint system, using two weakly calibrated fixed cameras. The proposed algorithms prove particularly robust when image features are computed with an average level of noise, and our results are clearly better than those obtained with existing algorithms from the specialized literature.
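The abstract does not describe the new algorithms themselves. For orientation, here is a sketch of a classic, widely used on-line estimator: the rank-one Broyden update. It is not the authors' method. It corrects the Jacobian estimate so that it reproduces the most recently observed feature motion. The function name and the example Jacobian are hypothetical.

```python
def broyden_update(J, dq, ds, gain=1.0):
    """Rank-one Broyden update of an image Jacobian estimate.

    J: current estimate, m x n list of lists (feature rows, joint columns)
    dq: joint displacement (length n)
    ds: observed image-feature displacement (length m)
    """
    denom = sum(x * x for x in dq)
    if denom == 0.0:
        return [row[:] for row in J]  # no joint motion: nothing to learn
    n, m = len(dq), len(J)
    predicted = [sum(J[i][j] * dq[j] for j in range(n)) for i in range(m)]
    return [[J[i][j] + gain * (ds[i] - predicted[i]) * dq[j] / denom
             for j in range(n)] for i in range(m)]

# Hypothetical 4 x 3 Jacobian: 2 image points (u, v) and 3 joints.
J_true = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
J = [[0.0] * 3 for _ in range(4)]
for dq in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    ds = [sum(J_true[i][j] * dq[j] for j in range(3)) for i in range(4)]
    J = broyden_update(J, dq, ds)
print(J == J_true)  # True
```

With gain 1, each update enforces the secant condition J·dq = ds exactly. In this noise-free linear example, three orthogonal joint moves therefore recover the true Jacobian. Under image noise, that exactness is precisely what makes the plain update fragile.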

Jose M. Sebastián, Lizardo Pari, Carolina González, Luis Ángel

Accelerometer Based Gesture Recognition Using Continuous HMMs

This paper presents a gesture recognition system based on continuous hidden Markov models. Gestures here are hand movements recorded by a 3D accelerometer embedded in a handheld device. In addition to a standard hidden Markov model classifier, the recognition system has a preprocessing step that removes the effect of device orientation from the data. The performance of the recognizer is evaluated in both user-dependent and user-independent cases. The effects of sample resolution and sampling rate are studied in the user-dependent case.
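The abstract does not say how the orientation effect is removed. One plausible approach is sketched here purely as an assumption. It estimates gravity as the mean acceleration over a gesture, then uses Rodrigues' rotation formula to rotate every sample so that this direction maps onto +z. The function names are hypothetical.

```python
import math

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def gravity_alignment(samples):
    """Rotation matrix mapping the mean acceleration direction onto +z."""
    n = len(samples)
    g = [sum(s[k] for s in samples) / n for k in range(3)]
    norm = math.sqrt(sum(x * x for x in g))
    a = [x / norm for x in g]              # unit gravity estimate
    b = [0.0, 0.0, 1.0]                    # target direction
    v = _cross(a, b)                       # rotation axis (unnormalized)
    c = sum(x * y for x, y in zip(a, b))   # cosine of the rotation angle
    s2 = sum(x * x for x in v)             # squared sine of the angle
    if s2 < 1e-12:
        if c > 0:
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # already aligned
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]     # opposite: pi about x
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]  # skew matrix [v]x
    vx2 = [[sum(vx[i][k] * vx[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    f = (1.0 - c) / s2
    # Rodrigues: R = I + [v]x + [v]x^2 * (1 - c) / |v|^2
    return [[(1.0 if i == j else 0.0) + vx[i][j] + f * vx2[i][j] for j in range(3)]
            for i in range(3)]

def remove_orientation(samples):
    """Rotate every 3-D sample so the estimated gravity points along +z."""
    R = gravity_alignment(samples)
    return [[sum(R[i][k] * s[k] for k in range(3)) for i in range(3)] for s in samples]

# Device held sideways: gravity shows up on the y axis.
print(remove_orientation([[0.0, 9.81, 0.0]]))
```

This normalizes tilt only. Rotation about the gravity axis (heading) cannot be recovered from the gravity direction alone.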

Timo Pylvänäinen

An Approach to Improve Online Hand-Eye Calibration

Online robotic hand-eye calibration consists of determining the relative pose between the robot gripper/end-effector and the sensors mounted on it while the robot makes unplanned movements. With noisy measurements, which are inevitable in real applications, the calibration is sensitive to small rotations. Moreover, degenerate cases such as pure translations contribute nothing to hand-eye calibration. This paper proposes a motion selection algorithm for hand-eye calibration. With this method, we can avoid not only the degenerate cases but also small rotations, thereby decreasing the calibration error. The procedure therefore lends itself to an online implementation of hand-eye calibration, where degenerate cases and small rotations frequently occur in the sampled motions. Simulations and real experiments validate the method.
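The basic idea of rejecting pure translations and small rotations can be sketched as a threshold on the rotation angle of each relative motion. The angle is recovered from the trace of the rotation matrix. This is a hypothetical simplification, not the authors' selection algorithm. The threshold value and the function names are assumptions.

```python
import math

def rotation_angle(R):
    """Rotation angle (radians) of a 3x3 rotation matrix, from its trace."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against round-off

def select_motions(motions, min_angle):
    """Keep only relative motions whose rotation is large enough.

    motions: list of (R, t) pairs, with R a 3x3 rotation (list of lists)
    min_angle: hypothetical threshold in radians; pure translations
               (angle 0) and small rotations are discarded.
    """
    return [(R, t) for R, t in motions if rotation_angle(R) >= min_angle]

def rot_z(a):
    """Rotation by angle a about the z axis (helper for the example)."""
    return [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]

motions = [(rot_z(0.0), [0.1, 0.0, 0.0]),    # pure translation: degenerate
           (rot_z(0.05), [0.0, 0.1, 0.0]),   # small rotation: noise-sensitive
           (rot_z(0.5), [0.0, 0.0, 0.1])]    # informative motion
print(len(select_motions(motions, 0.2)))  # 1
```

A fuller selector would also check that the retained motions have non-parallel rotation axes, since hand-eye calibration needs at least two such motions to be well posed.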

Fanhuai Shi, Jianhua Wang, Yuncai Liu

Hardware Architectures


Image Processing Application Development: From Rapid Prototyping to SW/HW Co-simulation and Automated Code Generation

Nowadays, the marketplace offers quite powerful, low-cost reconfigurable hardware devices and a wide range of software tools that find application in the image processing field. However, most image processing applications are still designed, and later deployed on specific hardware devices, by hand and at considerable cost. This paper presents a new approach to image processing application development, which tackles the long-standing question of how to fill the gap between rapid throwaway software designs and final software/hardware implementations. A new graphical component-based tool has been implemented that supports the complete development of this kind of application, from functional and architectural prototyping to software/hardware co-simulation and final code generation. Building this tool was made possible by the synergy that arises from integrating several pre-existing software and hardware image processing libraries and tools.

Cristina Vicente-Chicote, Ana Toledo, Pedro Sánchez-Palma

Xilinx System Generator Based HW Components for Rapid Prototyping of Computer Vision SW/HW Systems

This paper shows how the Xilinx System Generator (XSG) can be used to develop hardware-based computer vision algorithms from a system-level approach, without requiring in-depth knowledge of either a hardware description language or the particulars of the hardware platform. It is also demonstrated that Simulink can be employed as a co-design and co-simulation platform for rapid prototyping of computer vision HW/SW systems. To do this, a library of optimized image processing components based on XSG and Matlab has been developed and tested in hybrid schemes that include HW and SW modules. As part of the testing, results of the prototyping and co-simulation of a HW/SW computer vision system for the automated inspection of tangerine segments are presented.

Ana Toledo, Cristina Vicente-Chicote, Juan Suardíaz, Sergio Cuenca

2-D Discrete Cosine Transform (DCT) on Meshes with Hierarchical Control Modes

Efficient matrix operations are critical for computing the 2-D DCT. This paper presents a hierarchically controlled SIMD array (HCSA) well suited to matrix computations, in which a conventional 2-D torus is enhanced with a hierarchical organization of control units and with global data buses running across the rows and columns. The distinguishing features of the HCSA are diagonally indexed concurrent broadcast and efficient data exchange among PEs through either row or column broadcast. As a result, the HCSA can significantly reduce the number of computation steps of the DCT. For the performance evaluation, an algorithmic mapping method is used, and the number of computation steps is compared analytically with that of a semisystolic architecture.
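The reason matrix operations dominate the 2-D DCT is that, with an orthonormal DCT-II basis matrix C, the transform of a square block X is the separable product Y = C X Cᵀ. The following is a plain sequential sketch of that formulation in Python. It does not model the HCSA mapping, and the helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C (n x n)."""
    C = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        C.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return C

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def dct2(X):
    """2-D DCT of a square block as two matrix products: Y = C X C^T."""
    C = dct_matrix(len(X))
    return matmul(matmul(C, X), transpose(C))

Y = dct2([[1.0] * 4 for _ in range(4)])
print(round(Y[0][0], 6))  # 4.0: a constant block has only a DC coefficient
```

Each of the two products is a sequence of row-by-column inner products. This is the kind of work that row and column broadcasts on a PE array can parallelize.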

Cheong-Ghil Kim, Su-Jin Lee, Shin-Dug Kim

Domain-Specific Codesign for Automated Visual Inspection Systems

In this paper we present a codesign methodology for high-performance Automated Visual Inspection systems (AVIs). The proposal consists of a reference hardware/software architecture and its associated co-verification environment. The codesign method is a stepwise refinement-based process that starts with a preliminary hw/sw partition based on the reference architecture. During refinement, the selected hardware blocks are coded in the high-level language Handel-C, and the rest of the system using a p[…] library. This library allows components external to the hardware (software, external devices, etc.) to be modeled from behavioural, timing and performance viewpoints using software languages such as C/C++. As a result of this design flow, we are able to verify and develop AVI systems in significantly less time than with traditional hardware/software codesign.

Sergio Cuenca, Antonio Cámara, Juan Suardíaz, Ana Toledo

Hardware-Accelerated Template Matching

In the last decade, consumer graphics cards have become far more powerful, driven by the computer games industry. These cards are now programmable and capable of processing huge amounts of data in a SIMD fashion. In this work, we propose an alternative implementation of the intuitive and well-known 2D template matching technique, in which the most computationally expensive task is performed by the graphics hardware processor. This computational approach is not new, but in this work we summarize the method step by step to better understand the underlying complexity. Experimental results show an extraordinary performance trade-off, even on obsolete hardware.
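For reference, a brute-force CPU version of 2D template matching can be sketched as follows. It uses the sum of squared differences (SSD) as the similarity measure, which is an assumption, since the abstract does not name the measure. Each candidate position is scored independently of the others. That independence is what makes the expensive inner loop a natural fit for SIMD graphics hardware. The function name is hypothetical.

```python
def match_template_ssd(image, template):
    """Brute-force template matching: return (row, col, ssd) of the best match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    # Every (r, c) window is scored independently: the data-parallel part.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best

img = [[0] * 5 for _ in range(5)]
img[2][1], img[2][2], img[3][1], img[3][2] = 1, 2, 3, 4
print(match_template_ssd(img, [[1, 2], [3, 4]]))  # (2, 1, 0)
```

The cost grows with the product of image area and template area, which is why offloading it pays off even on older graphics hardware.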

Raúl Cabido, Antonio S. Montemayor, Ángel Sánchez

