
About this book

This book constitutes the thoroughly refereed proceedings of the 14th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2012, held in Brno, Czech Republic, in September 2012. The 46 revised full papers were carefully selected from 81 submissions and deal with image analysis and computer vision with a focus on detection, recognition, tracking and identification.



3D, Optics, and Light

System Identification: 3D Measurement Using Structured Light System

The problem of 3D reconstruction from captured 2D images is solved using a set of concentric circular light patterns. Once the number of light sources and cameras, their locations and orientations, and the sampling density (the number of circular patterns) are determined, we propose a novel approach that represents the reconstruction problem as system identification. Just as a system can be identified from the relationship between its input and output, we identify the reconstruction system by choosing and defining the input and output signals appropriately, with the goal of developing an efficient 3D functional camera system. In one algorithm, the input and output are defined as the projected circular patterns and the captured 2D image (overlaid with the deformed circular patterns), respectively. In the other, a 3D target and the captured 2D image are defined as the input and the output, respectively, which leads to a problem of input estimation by demodulating an output (received) signal. The former approach identifies the system from the ratio of output to input and is akin to modulation-demodulation theory; the latter identifies the reconstruction system by estimating the input signal. This paper proposes this system-identification approach to reconstruction and substantiates the algorithm with results obtained using an inexpensive and simple experimental setup.

Deokwoo Lee, Hamid Krim

Gradual Iris Code Construction from Close-Up Eye Video

This work deals with dynamic iris biometry using video, which is gaining increasing interest for its flexibility in the framework of biometric portals. We propose several improvements for “real-time” dynamic iris biometry in order to gradually build a high-quality iris code by selecting, on the fly, the best iris images as they appear during acquisition. In particular, tracking is performed using an optimally tuned Kalman filter, that is, a Kalman filter whose state and observation matrices are specifically learned to follow the movement of a pupil. Experiments on four videos acquired with an IR-sensitive low-cost webcam show reduced computation time and a slight but significant gain in accuracy compared to the classical Kalman tracker.
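For reference, a classical Kalman tracker of the kind the authors compare against can be sketched as a constant-velocity filter on one pupil coordinate. This is a minimal sketch, assuming hand-set matrices F = [[1, dt], [0, 1]] and H = [1, 0] and hypothetical noise variances `q` and `r`; the paper instead learns its state and observation matrices from pupil motion, and those learned values are not reproduced here.

```python
def kalman_track(measurements, dt=1.0, q=1e-2, r=1.0):
    """Track one pupil coordinate with a 2-state [position, velocity] Kalman filter.

    F = [[1, dt], [0, 1]] (constant velocity), H = [1, 0] (only position observed).
    q, r: hypothetical process / measurement noise variances (Q = q * I).
    """
    x = [measurements[0], 0.0]          # initial state: first position, zero velocity
    P = [[1.0, 0.0], [0.0, 1.0]]        # initial state covariance
    estimates = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update with scalar measurement z: S = H P H^T + r, K = P H^T / S
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        innovation = z - x[0]
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
        # P = (I - K H) P
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append(x[0])
    return estimates
```

In a 2D tracker the same filter would run on the x and y pupil coordinates (or as a 4-state filter); learning F and H from data, as the paper does, replaces the constant-velocity assumption.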

The second main contribution is to combine the iris codes of the images within the video stream that provide the “best quality” iris texture. The fuzzy iris codes obtained in this way clearly exhibit areas of high confidence and areas of low confidence, the latter caused by eyelashes and eyelids, which introduce imprecision in detecting the iris and pupil. Such uncertainty can be further exploited for identification.
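One plausible reading of combining iris codes into a fuzzy code is a per-bit average over the selected frames, where bits close to 0.5 mark low-confidence regions such as eyelash or eyelid occlusions. The sketch below assumes that averaging rule and a hypothetical confidence measure |2f - 1|; the abstract does not specify the paper's exact fusion formula.

```python
def fuzzy_iris_code(codes):
    """Average several binary iris codes (equal-length lists of 0/1) bit by bit."""
    n = len(codes)
    return [sum(bits) / n for bits in zip(*codes)]

def confidence(fuzzy):
    """Hypothetical confidence: 1 where all frames agree, 0 where they split evenly."""
    return [abs(2 * f - 1) for f in fuzzy]
```

A matcher could then down-weight low-confidence bits when computing a Hamming distance, which is one way the uncertainty might be exploited for identification.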

Valérian Némesin, Stéphane Derrode, Amel Benazza-Benyahia

Depth from Vergence and Active Calibration for Humanoid Robots

Humans use many cues to perceive depth. For nearby tasks involving eye-hand coordination, depth from vergence is a strong cue. In our research on humanoid robots, we study binocular robotic eyes that can pan and tilt and perceive depth from stereo, as well as depth from vergence by fixating both eyes on a nearby object. In this paper, we report on a convergent robot vision setup. First, we describe the mathematical model of a convergent vision system. Second, we introduce an algorithm to estimate the depth of the object in focus. Third, since the centers of rotation of the eye motors do not align with the centers of the image planes, we develop an active calibration algorithm to overcome this problem. Finally, we examine the factors that affect the depth error. Experimental results show good performance of our system and provide insight into depth from vergence.
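The core geometry of depth from vergence can be sketched as triangulation from the two inward gaze angles. This is a minimal sketch, assuming the idealized case the paper explicitly corrects for: rotation centers that coincide with the optical centers and lie on the baseline. With the eyes at x = -b/2 and x = +b/2 and a fixation point at (x, z), tan θL = (x + b/2)/z and tan θR = (b/2 - x)/z, so z = b / (tan θL + tan θR).

```python
import math

def depth_from_vergence(baseline, theta_left, theta_right):
    """Triangulate the fixation point from the two inward vergence angles (radians).

    Idealized model: rotation centres coincide with the optical centres and lie
    on the baseline; angles are measured inward from the parallel-gaze direction.
    Returns (x, z) with the origin midway between the eyes.
    """
    denom = math.tan(theta_left) + math.tan(theta_right)
    if denom <= 0:
        raise ValueError("gaze rays do not converge in front of the cameras")
    z = baseline / denom
    x = z * math.tan(theta_left) - baseline / 2
    return x, z
```

For symmetric vergence, z = b / (2 tan θ), so dz/dθ ≈ -2z²/b for small angles: a fixed angular error produces a depth error that grows roughly with the square of the distance, which is one reason vergence is mainly useful at close range.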

Xin Wang, Boris Lenseigne, Pieter Jonker

Information-Gain View Planning for Free-Form Object Reconstruction with a 3D ToF Camera

Active view planning for gathering data from an unexplored, complex 3D scene is a hard and still open problem in the computer vision community. In this paper, we present a general task-oriented approach based on information-gain maximization that easily deals with such a problem. Our approach consists of ranking a given set of possible actions based on their task-related gains, and then executing the best-ranked action to move the required sensor.

We demonstrate how our approach behaves by applying it to raw 3D data for real-time volume modelling of complex-shaped objects. Our setting includes a calibrated 3D time-of-flight (ToF) camera mounted on a 7-degree-of-freedom (DoF) robotic arm. Noise in the sensor data acquisition, which is too often ignored, is explicitly taken into account here by computing an uncertainty matrix for each point and refining this matrix each time the point is seen again. Results show that, by always choosing the most informative view, a complete model of a 3D free-form object is acquired, and that our method achieves a good compromise between speed and precision.
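The rank-then-execute loop can be sketched with a simplified information measure. This minimal sketch swaps in an occupancy-grid entropy, an assumption in place of the paper's per-point uncertainty matrices: each candidate view is scored by the summed binary entropy of the voxels it would observe (unknown voxels default to p = 0.5), and views are sorted by that score.

```python
import math

def binary_entropy(p):
    """Entropy (bits) of a voxel whose occupancy probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def rank_views(occupancy, candidate_views):
    """Rank candidate views by the total entropy of the voxels they would observe.

    occupancy: dict voxel_id -> occupancy probability (missing ids count as 0.5).
    candidate_views: dict view_name -> iterable of visible voxel ids.
    Assumes one observation fully resolves each visible voxel, so a view's
    expected gain equals the summed entropy of its visible voxels.
    """
    gains = {
        view: sum(binary_entropy(occupancy.get(v, 0.5)) for v in visible)
        for view, visible in candidate_views.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

The robot would then execute the first entry, update the map with the new measurements, and re-rank.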

Sergi Foix, Simon Kriegel, Stefan Fuchs, Guillem Alenyà, Carme Torras

Hardware Mapping

DSP Embedded Smart Surveillance Sensor with Robust SWAD-Based Tracker

Smart video analytics algorithms can be embedded within surveillance sensors for fast in-camera processing. This paper presents a DSP-embedded video analytics system for object and people tracking using a PTZ camera. The tracking algorithm is based on adaptive template matching and employs a novel Sum of Weighted Absolute Differences (SWAD). The video analytics are implemented on the DM6437 EVM DSP board and automatically control the PTZ camera to keep the target at the center of the field of view. The EVM is connected to the network and the tracking algorithm can be activated remotely, so that the PTZ camera enhanced with the DSP-embedded video analytics becomes a smart surveillance sensor. The system runs in real time, and simulation results demonstrate that the proposed SWAD outperforms other template matching measures in terms of efficiency and accuracy.
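Template matching with a weighted absolute-difference score can be sketched as an exhaustive search for the minimum SWAD position. The abstract does not give the paper's weighting scheme, so this minimal sketch assumes hypothetical Gaussian weights that emphasize the template center (where the target usually is) over its border; the adaptive template update is also omitted.

```python
import math

def gaussian_weights(h, w, sigma):
    """Hypothetical weighting: emphasise the template centre, down-weight its border."""
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

def swad(frame, template, weights, top, left):
    """Sum of weighted absolute differences between the template and a frame patch."""
    return sum(weights[y][x] * abs(frame[top + y][left + x] - template[y][x])
               for y in range(len(template)) for x in range(len(template[0])))

def match_template(frame, template, sigma=2.0):
    """Exhaustive search: return (top, left) of the minimum-SWAD position."""
    h, w = len(template), len(template[0])
    weights = gaussian_weights(h, w, sigma)
    positions = [(t, l) for t in range(len(frame) - h + 1)
                 for l in range(len(frame[0]) - w + 1)]
    return min(positions, key=lambda pos: swad(frame, template, weights, *pos))
```

In the PTZ setting, the offset between the matched position and the image center would drive the pan/tilt commands.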

Gaetano Di Caterina, Iain Hunter, John J. Soraghan

GPU Optimization of Convolution for Large 3-D Real Images

In this paper, we propose a method for computing the convolution of large 3-D images when the signals involved are real-valued. The convolution is performed in the frequency domain using the convolution theorem. Thanks to the properties of real signals, the algorithm can be optimized so that both the computation time and the memory consumption are halved compared to complex signals of the same size. The convolution is decomposed in the frequency domain using the decimation-in-frequency (DIF) algorithm. The algorithm is accelerated on graphics hardware by means of the CUDA parallel computing model, achieving up to a 10× speedup with a single GPU over an optimized implementation on a quad-core CPU.
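The halving of time and memory for real signals rests on Hermitian symmetry: for a real input of length N, the DFT satisfies X[N-k] = conj(X[k]), so only N//2 + 1 bins are independent. The pure-Python sketch below illustrates that property and the convolution theorem on tiny 1-D signals using a naive O(N²) DFT. It is not the paper's CUDA/DIF implementation; a practical version would use an FFT and zero-pad both signals to at least len(a) + len(b) - 1 to avoid circular wrap-around.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_spectrum_half(x):
    """Bins 0..N//2 of a real signal's DFT (a real-to-complex FFT computes only these)."""
    return dft(x)[: len(x) // 2 + 1]

def full_from_half(half, n):
    """Rebuild all n bins from the stored half using X[n - k] == conj(X[k])."""
    return [half[k] if k <= n // 2 else half[n - k].conjugate() for k in range(n)]

def circular_convolve_via_dft(a, b):
    """Convolution theorem: conv(a, b) = IDFT(DFT(a) * DFT(b)), circular."""
    A, B = dft(a), dft(b)
    return [v.real for v in idft([p * q for p, q in zip(A, B)])]

def circular_convolve_direct(a, b):
    """Reference circular convolution computed in the signal domain."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]
```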

Pavel Karas, David Svoboda, Pavel Zemčík

Modified Bilateral Filter for the Restoration of Noisy Color Images

This paper presents a novel technique for removing noise from color images. The proposed filter is a modification of the bilateral denoising scheme, which considers the similarity of color pixels and their spatial distance. However, instead of calculating the dissimilarity measure directly, it determines the cost of a connection through a digital path joining the central pixel of the filtering window and its neighbors. As in the standard bilateral filter, the filter output is calculated as a weighted average of the pixels in the neighborhood of the center of the filtering window, and the weights are functions of the minimal connection costs. Experimental results show that the new denoising method yields significantly better results than the bilateral filter for color images contaminated by strong mixed Gaussian and impulsive noise.
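The path-based weighting can be sketched as follows: for each window pixel, find the cheapest 8-connected path from the center (Dijkstra), then weight that pixel by a decreasing function of the path cost. The abstract defines neither the path cost nor the kernel, so this minimal sketch assumes the sum of Euclidean RGB distances along the path and a hypothetical kernel exp(-cost / h). It only illustrates the connection-cost idea and does not reproduce the paper's impulsive-noise handling.

```python
import heapq
import math

def min_path_costs(img, cy, cx, radius):
    """Dijkstra inside the (2r+1) x (2r+1) window centred on (cy, cx).

    Assumed path cost: sum of Euclidean RGB distances between consecutive
    8-connected pixels. Returns {(y, x): cheapest connection cost from the centre}.
    """
    h, w = len(img), len(img[0])
    best = {(cy, cx): 0.0}
    heap = [(0.0, cy, cx)]
    while heap:
        cost, y, x = heapq.heappop(heap)
        if cost > best[(y, x)]:
            continue  # stale heap entry
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = (0 <= ny < h and 0 <= nx < w
                          and abs(ny - cy) <= radius and abs(nx - cx) <= radius)
                if (dy or dx) and inside:
                    new_cost = cost + math.dist(img[y][x], img[ny][nx])
                    if new_cost < best.get((ny, nx), math.inf):
                        best[(ny, nx)] = new_cost
                        heapq.heappush(heap, (new_cost, ny, nx))
    return best

def path_bilateral(img, radius=1, h=30.0):
    """Weighted average of window pixels, weights exp(-cost / h) (hypothetical kernel)."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            costs = min_path_costs(img, y, x, radius)
            weights = {pos: math.exp(-c / h) for pos, c in costs.items()}
            total = sum(weights.values())
            row.append(tuple(
                sum(wgt * img[py][px][ch] for (py, px), wgt in weights.items()) / total
                for ch in range(3)))
        out.append(row)
    return out
```

Because every path into a region of a different color must cross the color jump, pixels across an edge receive very small weights, which is how the scheme preserves edges.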

Krystyna Malik, Bogdan Smolka

Correction, Stitching and Blur Estimation of Micro-graphs Obtained at High Speed

Surface micro-structures are considered effective for identifying damage mechanisms. Industry uses computer vision, as a contactless tool, to automatically detect misalignment of components. In scientific investigations, however, micro-structures acquired online at high speed have to be analyzed. In this work, changes in a specimen rotating at high speed are studied online by applying image processing techniques to micrographs, which provides clear insight into dimensional changes. The specimen under study is made of a polymer composite that is in contact with a steel wheel and rotates at high speed. Blur, as a measure of the dimensional change of the polymer composite, can be identified because the focus changes. The micro-structure images were dark and spanned a very small region of the surface due to high-speed image acquisition, the short shutter time and the magnification of the microscope. Therefore, pre-processing procedures such as image enhancement, stitching and registration are performed. Then 15 blur estimation methods are applied to the stitched images. The results of three methods correlate well with the dimensional change measured by a stylus instrument.
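The abstract does not list the 15 blur estimation methods. As an illustration of the general idea, the sketch below implements one common focus measure, the variance of the Laplacian, chosen here as an example and not necessarily one of the three methods the authors found to correlate best. Sharper (better focused) images produce stronger second derivatives and therefore higher variance.

```python
def laplacian_variance(img):
    """Focus/blur measure: variance of the 4-neighbour Laplacian over interior pixels.

    img: 2D list of grey levels. Lower values indicate a blurrier image.
    """
    vals = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```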

Seyfollah Soleimani, Jacob Premkumar Sukumaran, Koen Douterloigne, Filip Rooms, Wilfried Philips, Patrick De Baets

Hardware Implementation of a Configurable Motion Estimator for Adjusting the Video Coding Performances

Despite the diversity of video compression standards, motion estimation remains a key process used in most of them. Moreover, the required coding performance (bit rate, PSNR, spatial resolution, etc.) obviously depends on the application, the environment and the communication network. Motion estimation can therefore be adapted to meet these requirements. At the same time, real-time encoding is required in many applications. To reach this goal, in this paper we propose a hardware implementation of a motion estimator that allows the integer motion search algorithm to be modified and the fractional search and variable block size to be selected and adjusted. This novel architecture, designed especially for FPGA targets, offers high-speed processing for a configuration that supports variable block sizes and quarter-pel refinement, as described in H.264.
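The integer motion search stage can be illustrated with the simplest algorithm such an estimator could be configured to run: full-search block matching that minimizes the sum of absolute differences (SAD). This is a software sketch for illustration, not the paper's FPGA architecture; variable block sizes and the quarter-pel refinement of H.264 are omitted.

```python
def sad(cur, ref, by, bx, dy, dx, n):
    """Sum of absolute differences between an n x n block and its displaced reference."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, by, bx, n, search_range):
    """Integer full-search motion estimation for the n x n block at (by, bx).

    Returns the motion vector (dy, dx) minimising SAD within +/- search_range,
    considering only candidates that lie fully inside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w:
                cost = sad(cur, ref, by, bx, dy, dx, n)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]
```

Faster integer search patterns (diamond, hexagon, and so on) trade the exhaustive loop for fewer SAD evaluations, which is the kind of configurable trade-off the abstract describes.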

Wajdi Elhamzi, Julien Dubois, Johel Miteran, Mohamed Atri, Rached Tourki

Quality and Documents

Quality Assurance for Document Image Collections in Digital Preservation

Maintaining digital image libraries requires frequent assessment of image quality so that preservation measures can be taken if necessary. We present an approach to image-based quality assurance for digital image collections based on local descriptor matching. We use spatially distinctive local keypoints of contrast-enhanced images and robust symmetric descriptor matching to calculate affine transformations for image registration. The structural similarity of the aligned images is used for quality assessment. The results show that our approach can efficiently assess the quality of digitized documents, including images of blank paper.
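The structural similarity (SSIM) index used for the final assessment combines luminance, contrast and structure terms. The sketch below computes it over a single global window for two already aligned images given as flat pixel lists, using the usual constants C1 = (0.01 L)² and C2 = (0.03 L)². This single-window form is a simplification; standard SSIM averages the same formula over local, typically Gaussian-weighted, windows.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two aligned images given as flat pixel lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```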

Reinhold Huber-Mörk, Alexander Schindler

The Sampling Pattern Cube – A Representation and Evaluation Tool for Optical Capturing Systems

Knowledge of how the light field is sampled by a camera system provides the information required to investigate interesting camera parameters. We introduce a simple and handy model for examining the sampling behavior of a camera system. We have applied this model to a single-lens system as well as to plenoptic cameras, and have investigated how camera parameters of interest are interpreted in our proposed model-based representation. The model also enables us to compare capturing systems and to investigate how variations in an optical capturing system affect its sampling behavior.

Mitra Damghanian, Roger Olsson, Mårten Sjöström

Improving Image Acquisition: A Fish-Inspired Solution

In this paper, we study the rendering of images with a new mosaic/color filter array (CFA) called the Burtoni mosaic. This mosaic is derived from the retina of the African cichlid fish Astatotilapia burtoni. To evaluate the effect of the Burtoni mosaic on the quality of the rendered images, we use two quality measures in the Fourier domain: the resolution error and the aliasing error. Our model uses no demosaicing algorithm, which makes it independent of such algorithms. We also use 11 semantic sets of color images to highlight the image classes that are well suited to the Burtoni mosaic in the process of image acquisition. We have compared the Burtoni mosaic with the Bayer CFA and with an optimal CFA proposed by Hao et al. Experiments have shown that the Burtoni mosaic gives the best performance for images of 9 semantic sets: the high-frequency, aerial, indoor, face, aquatic, bright, dark, step and line classes.

Julien Couillaud, Alain Horé, Djemel Ziou

Evaluating the Effects of MJPEG Compression on Motion Tracking in Metro Railway Surveillance

Video content analytics is increasingly employed for the security surveillance of mass-transit systems. The growing number of cameras, the presence of legacy networks, and the limited bandwidth of wireless links are some of the issues that highlight the importance of evaluating the performance of motion tracking at different levels of video compression. In this paper, we report the results of such an evaluation, using false-negative and false-positive metrics applied to videos captured by cameras installed in a real metro-railway environment. The evaluation methodology is based on manually generating the Ground Truth for selected videos at increasing levels of MJPEG compression and comparing it with the Algorithm Result generated automatically by the Motion Tracker. The computation of the reference performance metrics is automated by a tool developed in Matlab. Results are discussed with respect to the main causes of false detections, and hints are provided for further industrial applications.
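Counting false negatives and false positives requires a rule for deciding when a tracker output matches a ground-truth object. The abstract does not state the paper's criterion (and the paper's tool is written in Matlab), so this Python sketch assumes bounding boxes matched one-to-one by greedy intersection-over-union with a hypothetical threshold of 0.5. Unmatched ground-truth boxes count as false negatives; unmatched detections count as false positives.

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_errors(ground_truth, detections, threshold=0.5):
    """Greedy one-to-one matching for one frame; returns (false_negatives, false_positives)."""
    unmatched = list(detections)
    fn = 0
    for gt in ground_truth:
        best = max(unmatched, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= threshold:
            unmatched.remove(best)
        else:
            fn += 1
    return fn, len(unmatched)
```

Summing these per-frame counts over a video at each MJPEG quality level gives curves of false negatives and false positives against compression.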

Angelo Cozzolino, Francesco Flammini, Valentina Galli, Mariangela Lamberti, Giovanni Poggi, Concetta Pragliola

Annotating Images with Suggestions — User Study of a Tagging System

This paper explores the concept of image-wise tagging. It introduces a web-based user interface for image annotation and a novel method for modeling tag dependencies using Restricted Boltzmann Machines, which can suggest probable tags for an image based on previously assigned tags. According to our user study, our tag suggestion methods improve both user experience and annotation speed. Our results demonstrate that large datasets with semantic labels (such as TRECVID Semantic Indexing) can be annotated much more efficiently with the proposed approach than with current class-domain-wise methods, while producing higher-quality data.

Michal Hradiš, Martin Kolář, Aleš Láník, Jiří Král, Pavel Zemčík, Pavel Smrž

Segmentation, Decomposition and Surface

Cross-Channel Co-occurrence Matrices for Robust Characterization of Surface Disruptions in 2½D Rail Image Analysis

We present a new robust approach to the detection of rail surface disruptions in high-resolution images by means of 2½D image analysis. The detection results are used to determine the condition of rails as a precaution to avoid breaks and further damage. Images of rails are taken with color line-scan cameras at a high resolution of about 0.2 mm under specific illumination to enable 2½D image analysis. Pixel locations fulfilling the anti-correlation property between two color channels are detected and integrated over regions of general background deviations using so-called cross-channel co-occurrence matrices, a novel variant of co-occurrence matrices introduced as part of this work. Consequently, rail surface disruptions are detected with high precision, while the unintentional elimination of valid detections during the removal of false and irrelevant detections is reduced. In this regard, the new approach is more robust than previous methods.
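The abstract does not define the cross-channel co-occurrence matrix precisely. A natural generalization of the grey-level co-occurrence matrix, assumed here for illustration only, counts pairs consisting of the quantized value of channel A at pixel p and the quantized value of channel B at p + offset. Anti-correlated responses (high in one channel, low in the other) then accumulate in the off-diagonal corners, whereas correlated responses fall on the diagonal.

```python
def cross_channel_cooccurrence(chan_a, chan_b, levels, offset=(0, 0), max_value=256):
    """Count pairs (quantised chan_a at p, quantised chan_b at p + offset).

    chan_a, chan_b: 2D lists of equal size with values in [0, max_value).
    Returns a levels x levels matrix of counts.
    """
    dy, dx = offset
    h, w = len(chan_a), len(chan_a[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i = chan_a[y][x] * levels // max_value
                j = chan_b[ny][nx] * levels // max_value
                matrix[i][j] += 1
    return matrix
```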

Daniel Soukup, Reinhold Huber-Mörk

Improving HOG with Image Segmentation: Application to Human Detection

In this paper we improve the histogram of oriented gradients (HOG), a core descriptor of state-of-the-art object detection, by using higher-level information coming from image segmentation. The idea is to re-weight the descriptor while computing it, without increasing its size. The benefits of the proposal are two-fold: (i) it improves the performance of the detector by enriching the descriptor information, and (ii) it takes advantage of the image segmentation, which is likely to be used anyway in other stages of the detection system, such as candidate generation or refinement.

We test our technique on the INRIA Person dataset, which was originally developed to test HOG, by embedding it in a human detection system. We evaluate the well-known mean-shift segmentation method (from smaller to larger superpixels) together with different functions for re-weighting the original descriptor (constant, region-luminance, color- or texture-dependent). We achieve an improvement of 4.47% in detection rate by using color differences between contour pixel neighborhoods as the re-weighting function.
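The re-weighting idea can be sketched at the level of a single HOG cell: each pixel's gradient vote (its magnitude) is multiplied by a per-pixel weight before being added to the orientation histogram, so the descriptor keeps the same number of bins. In this minimal sketch the weight map is a hypothetical input standing in for a segmentation-derived function; it uses unsigned orientations in 9 bins and, for brevity, hard voting without the bilinear interpolation and block normalization of full HOG.

```python
import math

def weighted_cell_histogram(img, weights, n_bins=9):
    """Orientation histogram of one HOG cell with per-pixel re-weighting.

    img: 2D list of intensities for the cell plus a 1-pixel border.
    weights: 2D list of the same size (hypothetical, e.g. derived from a segmentation).
    Each interior pixel votes magnitude * weight into the bin containing its
    unsigned orientation in [0, 180) degrees.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]      # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude * weights[y][x]
    return hist
```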

Yainuvis Socarrás Salas, David Vázquez Bermudez, Antonio M. López Peña, David Gerónimo Gomez, Theo Gevers

A Supervised Learning Framework for Automatic Prostate Segmentation in Trans Rectal Ultrasound Images

Heterogeneous intensity distribution inside the prostate gland, significant variations in prostate shape and size, inter-dataset contrast variations, and imaging artifacts such as shadow regions and speckle in Trans Rectal Ultrasound (TRUS) images challenge computer-aided automatic or semi-automatic segmentation of the prostate. In this paper, we propose a supervised learning scheme based on random forests for the automatic initialization and propagation of a statistical shape and appearance model. The parametric representation of the statistical model of shape and appearance is derived from principal component analysis (PCA) of the probability distribution inside the prostate and PCA of the contour landmarks obtained from the training images. Unlike traditional statistical models of shape and intensity priors, the appearance model in this paper is derived from the posterior probabilities obtained from random forest classification. This probabilistic information is then used for the initialization and propagation of the statistical model. The proposed method achieves a mean Dice Similarity Coefficient (DSC) of 0.96±0.01, with a mean segmentation time of 0.67±0.02 seconds, when validated on 24 images from 6 datasets with considerable shape, size, and intensity variations in a leave-one-patient-out validation framework. Compared to traditional statistical models of shape and intensity priors, the differences in mean DSC and in mean mean absolute distance (MAD) are statistically significant (p-value < 0.0001).
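The Dice Similarity Coefficient reported above measures the overlap between a segmentation A and the ground truth B as DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch for binary masks stored as flat 0/1 lists (treating two empty masks as a perfect match, which is a convention chosen here):

```python
def dice(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    inter = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(1 for p in a if p) + sum(1 for q in b if q)
    return 2 * inter / total if total else 1.0
```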

Soumya Ghose, Jhimli Mitra, Arnau Oliver, Robert Martí, Xavier Lladó, Jordi Freixenet, Joan C. Vilanova, Josep Comet, Désiré Sidibé, Fabrice Meriaudeau

Simultaneous Segmentation and Filtering via Reduced Graph Cuts

Recently, optimization with graph cuts has become very attractive, but it generally remains limited to small-scale problems due to the large memory requirements of graphs, even when restricted to binary variables. Unlike previous heuristics, which generally fail to fully capture details, [8] proposes another band-based method for reducing these graphs in image segmentation. This method yields small graphs while preserving thin structures, but it does not offer low memory usage when the amount of regularization is large. This is typically the case when images are corrupted by impulsive noise. In this paper, we overcome this limitation by embedding a new parameter in this method that both further reduces the graphs and filters the segmentation. This parameter avoids any post-processing step, appears to be generally less sensitive to noise variations, and offers good robustness against noise. We also provide an empirical way to tune this parameter automatically and illustrate its behavior for segmenting grayscale and color images.

Nicolas Lermé, François Malgouyres

Rectangular Decomposition of Binary Images

This contribution deals with the most important methods for decomposing binary images into a union of rectangles. The overview includes run-length encoding and its generalization, decompositions based on quadtrees and on the distance transform, and a theoretically optimal decomposition based on maximal matching in bipartite graphs. We experimentally test their performance in binary image compression and in convolution calculation, and compare their computation times and success rates.
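Run-length encoding already yields a rectangular decomposition: every maximal horizontal run of 1s is a rectangle one pixel high. The sketch below adds a simple generalization, assumed for illustration and not necessarily the variant studied in the paper: runs with identical horizontal extent in consecutive rows are merged into one taller rectangle. Rectangles are half-open tuples (x0, y0, x1, y1).

```python
def row_runs(row):
    """Maximal runs of 1s in a row as half-open intervals [start, end)."""
    runs, start = [], None
    for x, v in enumerate(row + [0]):    # trailing sentinel closes a final run
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x))
            start = None
    return runs

def rectangles_from_runs(image):
    """Decompose a binary image into rectangles, merging identical runs vertically."""
    done, open_rects = [], {}            # open_rects: (x0, x1) -> starting row
    for y, row in enumerate(image + [[0] * len(image[0])]):  # sentinel row
        runs = set(row_runs(row))
        for extent in list(open_rects):
            if extent not in runs:       # run ended: close its rectangle
                done.append((extent[0], open_rects.pop(extent), extent[1], y))
        for extent in runs:
            open_rects.setdefault(extent, y)  # start a new rectangle if needed
    return sorted(done)
```

Such a decomposition speeds up convolution with a box kernel because the sum over each rectangle can be read from an integral image in constant time.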

Tomáš Suk, Cyril Höschl, Jan Flusser

A New Level-Set Based Algorithm for Bimodal Depth Segmentation

In this paper, a new algorithm for bimodal depth segmentation is presented. Using information from a stereo image pair, the method separates the background from planar objects of arbitrary shape lying at a certain height above it (more precisely, the background and the objects may lie on two distinct general planes). The problem is formulated as the minimisation of a functional. A new functional is proposed for this purpose that is based on evaluating the mismatches between the images, in contrast to the usual approaches that evaluate the matches. We explain the motivation for such an approach. The minimisation is carried out using the Euler-Lagrange equation and the level-set function. The experiments show promising results on noisy synthetic images as well as on real-life images. An example of a practical application of the method is also presented.

Michal Krumnikl, Eduard Sojka, Jan Gaura

3D Shape from Focus Using LULU Operators

Extracting the shape of an object is an important task in many vision applications. One of the difficult challenges in 3D shape extraction is the roughness of object surfaces. Shape from focus (SFF) is a shape recovery method that reconstructs the shape of an object from a sequence of images taken from the same viewpoint but with different focal lengths. This paper proposes the use of LULU operators as a preprocessing step to improve the signal-to-noise ratio in the estimation of 3D shape from focus. LULU operators are morphological filters used for their structure-preserving properties. The proposed technique is tested on simulated and real images separately, as well as in combination with traditional SFF methods such as the sum-modified Laplacian (SML) and gray-level variance (GLV). It is also tested in the presence of impulse noise at different noise levels. The quantitative and qualitative experimental results show that the proposed technique is more accurate in focus value extraction and shape recovery in the presence of noise.
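The basic LULU operators on a 1-D sequence are L_n(x)_i = max over j in [i-n, i] of min(x_j, ..., x_{j+n}) and U_n(x)_i = min over j in [i-n, i] of max(x_j, ..., x_{j+n}). L_n removes upward impulses (peaks) of width at most n and U_n removes downward pits, while monotone edges are preserved. This minimal sketch implements both in 1-D with edge replication at the borders (an assumed boundary rule); how the paper extends them to 2-D images is not stated in the abstract.

```python
def _pad(x, n):
    """Replicate the end samples n times (assumed boundary handling)."""
    return [x[0]] * n + list(x) + [x[-1]] * n

def lulu_L(x, n=1):
    """L_n: max over the n+1 windows containing i of the window minimum; removes peaks."""
    p = _pad(x, n)
    return [max(min(p[j:j + n + 1]) for j in range(i - n, i + 1))
            for i in range(n, n + len(x))]

def lulu_U(x, n=1):
    """U_n: min over the n+1 windows containing i of the window maximum; removes pits."""
    p = _pad(x, n)
    return [min(max(p[j:j + n + 1]) for j in range(i - n, i + 1))
            for i in range(n, n + len(x))]
```

Composing the two, for example U_n after L_n, removes both kinds of impulse up to width n, which is why LULU smoothers suit impulse noise ahead of a focus measure such as SML.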

Roushanak Rahmat, Aamir Saeed Mallik, Nidla Kamel, Tae-Sun Choi, Monson H. Hayes

Feature Extraction and Classification

Overlapping Local Phase Feature (OLPF) for Robust Face Recognition in Surveillance

As a non-invasive biometric method, face recognition in surveillance is a very challenging problem because of the co-occurrence of adverse conditions such as variable illumination, uncontrolled pose and movement, and low subject resolution. In this paper, we present a robust human face recognition system for surveillance. Unlike traditional recognition systems, which detect the face region directly, we use a Cascade Head-Shoulder Detector (CHSD) and a trained human body model to find the face region in an image. To recognize the face, we propose an efficient feature, the Overlapping Local Phase Feature (OLPF), which is robust to pose and blurring without adversely affecting discrimination performance. To describe the variations of faces, we propose an Adaptive Gaussian Mixture Model (AGMM) that describes the distribution of the face images. Since AGMM does not need the topology of the face, the proposed method is resistant to face detection errors caused by wrong or missing alignment. Experimental results demonstrate the robustness of our method on a public dataset as well as on real data from a surveillance camera.

Qiang Liu, King Ngi Ngan

Classifying Plant Leaves from Their Margins Using Dynamic Time Warping

Most plant species have unique leaves that differ from each other in characteristics such as shape, colour, texture and margin. Details of the leaf margin are an important feature in comparative plant biology, although they have been largely overlooked in automated classification methods. This paper presents a new method for classifying plants by species using only the leaf margins, based on the dynamic time warping (DTW) algorithm. A margin signature is extracted, and the leaf’s insertion point and apex are located. Using these as start points, the signatures are then compared using a version of the DTW algorithm. A classification accuracy of over 90% is attained on a dataset of 100 different species.
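For reference, the classic dynamic time warping distance between two 1-D signatures, with a 1-nearest-neighbour classifier on top, can be sketched as below. This is the textbook DTW recurrence with absolute-difference cost and no warping window; the paper uses its own version of DTW, and its margin-signature extraction is not reproduced.

```python
def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a_i - b_j| cost over monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # repeat b[j-1]
                                 D[i][j - 1],      # repeat a[i-1]
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]

def classify(signature, labelled):
    """1-nearest-neighbour label under DTW; labelled is a list of (signature, species)."""
    return min(labelled, key=lambda item: dtw_distance(signature, item[0]))[1]
```

Warping lets two margins with the same sequence of teeth but different spacing align at zero or low cost, which a point-by-point distance cannot do.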

James S. Cope, Paolo Remagnino

Utilizing the Hungarian Algorithm for Improved Classification of High-Dimension Probability Density Functions in an Image Recognition Problem

A method is presented for the classification of images described using high-dimensional probability density functions (pdfs). A pdf is described by a set of points sampled from its distribution. These points represent feature vectors calculated from windows sampled from an image. Using the Hungarian algorithm, a mapping is found between the set of points describing a class and the set for a pdf to be classified, such that the distance that points must be moved to change one set into the other is minimized. The method uses these mappings to create a classifier that can model the variation within each class. The method is applied to the problem of classifying plants from images of their leaves, and is found to outperform several existing methods.
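The point-set mapping step is a minimum-cost assignment problem. The sketch below is a standard O(n³) Hungarian algorithm with row and column potentials for a square cost matrix, applied to Euclidean distances between two equal-sized point sets. It is a textbook formulation, not the authors' code, and the classifier they build on top of these mappings is not reproduced.

```python
import math

def hungarian(cost):
    """Minimum-cost perfect matching for a square cost matrix (O(n^3) potentials method).

    Returns (total_cost, assignment) where assignment[i] is the column given to row i.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials
    p = [0] * (n + 1)        # p[j]: row matched to column j (0 = none)
    way = [0] * (n + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:          # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment

def point_set_distance(a, b):
    """Minimum total Euclidean distance to move the points of a onto the points of b."""
    return hungarian([[math.dist(p, q) for q in b] for p in a])
```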

James S. Cope, Paolo Remagnino

Classification of Hyperspectral Data over Urban Areas Based on Extended Morphological Profile with Partial Reconstruction

Extended morphological profiles with reconstruction are widely used in the classification of very high resolution hyperspectral data from urban areas. However, morphological profiles constructed by morphological openings and closings with reconstruction can lead to some undesirable effects: objects expected to disappear at a certain scale remain present when using morphological openings and closings by reconstruction. In this paper, we apply extended morphological profiles with partial reconstruction (EMPP) to the classification of high resolution hyperspectral images from urban areas. We first use feature extraction to reduce the dimensionality of the hyperspectral data and the redundancy within the bands, and then construct EMPP on features extracted by PCA, independent component analysis and kernel PCA. Experimental results on a real urban hyperspectral image demonstrate that the proposed EMPP built on kernel principal components gives the best results, particularly when training sample sizes are small.

Wenzhi Liao, Rik Bellens, Aleksandra Pižurica, Wilfried Philips, Youguo Pi

Saliency Filtering of SIFT Detectors: Application to CBIR

The recognition of object categories is one of the most challenging problems in the computer vision field. It is still an open problem, especially in content-based image retrieval (CBIR). When using analysis algorithms, a trade-off must be found between the expected quality of the results and the amount of computing resources allocated to manage the huge amount of generated data. In humans, evolution has produced a visual attention system that selects the most important information in order to reduce both cognitive load and scene-understanding ambiguity. In computer science, the most powerful algorithms use local approaches such as bag-of-features or sparse local features. In this article, we evaluate the integration of one of the most recent visual attention models into one of the most efficient CBIR methods. First, we present these two algorithms and the database used to test the results. Then, we present our approach, which consists of pruning interest points so as to keep only a certain percentage of them (from 40% down to 10%). This filtering is guided by a saliency map provided by a visual attention system. Finally, we present our results, which clearly demonstrate that the interest points used in classical CBIR methods can be drastically pruned without seriously affecting the results. We also show that the learning and training data sets must be filtered carefully to obtain such results.
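The pruning step can be sketched as ranking interest points by the saliency value at their location and keeping the top fraction (for example 0.4 down to 0.1). This minimal sketch assumes integer (x, y) keypoint coordinates and a saliency map given as a 2-D list indexed [y][x]; the visual attention model that produces the map is not reproduced.

```python
def prune_by_saliency(keypoints, saliency, keep_fraction):
    """Keep the most salient fraction of keypoints.

    keypoints: list of (x, y) integer pixel positions.
    saliency: 2D list, saliency[y][x] in any monotone scale.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(len(keypoints) * keep_fraction)) if keypoints else 0
    ranked = sorted(keypoints, key=lambda p: saliency[p[1]][p[0]], reverse=True)
    return ranked[:k]
```

The SIFT descriptors of the surviving points would then feed the bag-of-features pipeline unchanged.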

Dounia Awad, Vincent Courboulay, Arnaud Revel

Geometry and Shape

Detection of Near-Duplicate Patches in Random Images Using Keypoint-Based Features

Detection of similar fragments in unknown images is typically based on the


paradigm. After the keypoint correspondences are found, configuration constraints are used to identify clusters of similar and similarly transformed keypoints. This method is computationally expensive and hardly applicable to large databases. As an alternative, we propose novel affine-invariant TERM features that characterize the geometry of groups of elliptical keyregions, so that similar patches can be found by feature matching alone. The paper gives an overview of TERM features and reports experimental results confirming their high performance in image matching. A method combining visual words based on TERM descriptors with SIFT words is particularly recommended. Because of its low complexity, the proposed method can potentially be used with large visual databases.

Andrzej Śluzek, Mariusz Paradowski

The Mean Boundary Curve of Anatomical Objects

In this paper, we develop an algorithm that computes the mean shape of a collection of planar curves, with the aim of computing the mean shape of a collection of organs. We first define the relative distortion of a pair of curves using their curvatures. We then derive the mean of the curves as the curve that minimises the total distortion of the collection of shapes.

Keiko Morita, Atsushi Imiya, Tomoya Sakai, Hidekata Hontan, Yoshitaka Masutani

3D Parallel Thinning Algorithms Based on Isthmuses

Thinning is a widely used technique to obtain skeleton-like shape features (i.e., centerlines and medial surfaces) from digital binary objects. Conventional thinning algorithms preserve endpoints to provide important geometric information relative to the object to be represented. An alternative strategy is also proposed that preserves isthmuses (i.e., generalization of curve/surface interior points). In this paper we present ten 3D parallel isthmus-based thinning algorithm variants that are derived from some sufficient conditions for topology preserving reductions.

Gábor Németh, Kálmán Palágyi

Approximate Regularization for Structural Optical Flow Estimation

We address the problem of maximum a posteriori (MAP) estimation of optical flow with a geometric prior from gray-value images. We estimate simultaneously the optical flow and the corresponding surface – the structural optical flow (SOF) – subject to three types of constraints: intensity constancy, geometric, and smoothness constraints. Our smoothness constraints restrict the unknowns to locally coincide with a set of finitely parameterized admissible functions. The geometric constraints locally enforce consistency between the optical flow and the corresponding surface. Our theory amounts to a discrete generalization of regularization defined in terms of partial derivatives. The point-wise regularizers are efficiently implemented with linear run-time complexity in the number of discretization points. We demonstrate the applicability of our method by example computations of SOF from photographs of human faces.

Aless Lasaruk

Semi-variational Registration of Range Images by Non-rigid Deformations

We present a semi-variational approach for accurate registration of a set of range images. For each range image, we estimate a transformation composed of a similarity transformation and a free-form deformation in order to obtain a smoothly stitched surface. The resulting three-dimensional model has no jumps or sharp transitions at the stitching seams. We use the presented approach for accurate human head reconstruction from a set of facets captured successively from different views and computed independently. A joint energy for both types of transformations is formulated, involving several regularization constraints defined according to a specification of the resulting surface. A strategy for reweighting the impact of correspondences is presented to improve the stability and convergence of the approach. We demonstrate the applicability of our method on several representative examples.

Denis Lamovsky

Active Visual-Based Detection and Tracking of Moving Objects from Clustering and Classification Methods

This paper describes a method for detecting, tracking, and identifying mobile objects observed from a moving camera, typically one embedded on a robot. A global, vision-only architecture is presented that simultaneously addresses several problems: camera (or vehicle) localization, environment mapping, and the detection and tracking of moving objects. The goal is to build a convenient description of a dynamic scene from vision: What is static? What is dynamic? Where is the robot? How do other mobile objects move? We propose combining two approaches. First, a clustering method separates static points, which are used by the SLAM algorithm, from dynamic ones, which are used to segment mobile objects and estimate their state. Second, a classification approach identifies objects of known classes in image regions. The two approaches are combined in an active method based on a Motion Grid that selects where to look for mobile objects. The overall approach is evaluated on real data acquired indoors and outdoors with a camera embedded on a mobile robot.

David Márquez-Gámez, Michel Devy

Recovering Projective Transformations between Binary Shapes

Binary image registration has been addressed by many authors recently; however, most of the proposed approaches are restricted to affine transformations. In this paper, a novel approach is proposed to estimate the parameters of a general projective transformation (also called a homography) that aligns two shapes. Recovering such projective transformations is a fundamental problem in computer vision with various applications. Classical approaches rely on established point correspondences. The proposed solution, by contrast, needs no feature extraction and works only with the coordinates of the foreground pixels. The two-step method first estimates the perspective distortion independently of the affine part of the transformation, which is then recovered in the second step. Experiments on synthetic as well as real images show that the proposed method is less sensitive to the strength of the deformation than other solutions. The efficiency of the method has also been demonstrated on the traffic sign matching problem.

József Németh
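As background for the two-step method in the preceding abstract, any homography H with a nonzero bottom-right entry can be factored as H = A·P. Here A is affine and P is a pure perspective matrix [[1, 0, 0], [0, 1, 0], [p1, p2, 1]], since A·P = [[M + t pᵀ, t], [pᵀ, 1]]. This is one way to see why the perspective part can be handled separately from the affine part. The sketch below shows only this algebraic factorisation. It is not the authors' estimation procedure, which works directly on foreground pixel coordinates, and the function names are hypothetical.

```python
import numpy as np

def split_homography(H):
    """Factor H (rescaled so that H[2, 2] == 1) as H = A @ P.

    A is affine (last row [0, 0, 1]); P is a pure perspective matrix.
    Assumes H[2, 2] != 0.
    """
    H = H / H[2, 2]
    p = H[2, :2]                             # perspective row (h31, h32)
    t = H[:2, 2]                             # translation column
    A = np.eye(3)
    A[:2, :2] = H[:2, :2] - np.outer(t, p)   # remove the t p^T coupling introduced by P
    A[:2, 2] = t
    P = np.eye(3)
    P[2, :2] = p
    return A, P

def apply_homography(H, xy):
    """Map an (N, 2) array of pixel coordinates through H using homogeneous coordinates."""
    homog = np.column_stack([xy, np.ones(len(xy))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

Applying P first and then A to a set of points gives the same result as applying H directly. This matches the idea of removing the perspective distortion before solving an affine problem.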

Detection, Recognition and Retrieval

Hand Posture Classification by Means of a New Contour Signature

This paper deals with hand posture recognition. Using a dedicated acquisition setup, we build a database of hand photographs. We propose a novel contour signature, obtained by transforming the image content into several signals. The proposed signature is invariant to translation, rotation, and scaling, and can be used for posture classification. We generate this signature from photographs of hands. Experiments show that it provides good recognition results compared to Hu moments and Fourier descriptors.

Nabil Boughnim, Julien Marot, Caroline Fossati, Salah Bourennane
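The paper's own signature is not described here in enough detail to reproduce. The following sketch therefore shows a different, classical technique instead: a centroid-distance signature with Fourier-magnitude normalisation, a close relative of the Fourier descriptors used as a baseline in the preceding abstract. It illustrates one way to obtain translation, scale, and rotation invariance. It assumes an ordered, closed contour with roughly uniform point spacing. The function name and default sizes are hypothetical.

```python
import numpy as np

def centroid_distance_signature(contour, n_samples=64, n_coeffs=16):
    """Translation-, scale- and rotation-invariant descriptor of a closed contour.

    contour: (N, 2) array of boundary points ordered along the contour.
    """
    contour = np.asarray(contour, dtype=float)
    # Subsample by index; assumes the contour points are roughly evenly spaced.
    idx = np.linspace(0, len(contour), n_samples, endpoint=False).astype(int)
    pts = contour[idx]
    # Distances to the centroid do not change under translation or rotation.
    r = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    # Dividing by the mean radius removes the effect of uniform scaling.
    r = r / r.mean()
    # A different starting point circularly shifts r; FFT magnitudes ignore the shift.
    return np.abs(np.fft.fft(r))[:n_coeffs]
```

Two contours of the same posture should then yield nearly identical vectors, which a nearest-neighbour or other classifier can compare.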

Kernel Similarity Based AAMs for Face Recognition

Illumination and facial pose conditions have a strong effect on the performance of face recognition systems, owing to the complex non-linear variation between feature points and views. In this paper, we present Kernel Similarity-based Active Appearance Models (KSAAMs), in which a kernel method replaces the Principal Component Analysis (PCA) used for feature extraction in Active Appearance Models. The major advantage of the proposed approach lies in a more efficient search for non-linearly varying parameters under complex illumination and pose variations. As a consequence, images illuminated from different directions and images with variable poses can easily be synthesized by changing the parameters found by KSAAMs. Experimental results show that the proposed method is more accurate than the classical Active Appearance Model for face alignment, as measured by point-to-point error.

Yuyao Zhang, Younes Benhamza, Khalid Idrissi, Christophe Garcia
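To make the idea of replacing PCA with a kernel method concrete, the sketch below implements standard kernel PCA with a Gaussian (RBF) kernel in NumPy. It is a generic illustration under that assumption, not the KSAAM formulation itself, since the abstract does not detail the kernel similarity or the AAM search. The value of gamma and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))      # clip tiny negative round-off

def kernel_pca(X, n_components, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one        # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)                   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]     # keep the largest ones
    vals, vecs = vals[order], vecs[:, order]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # normalise the feature-space axes
    return Kc @ alphas                                # projections of the training samples
```

With a linear kernel (K = X Xᵀ), this reduces to ordinary PCA scores. The non-linear kernel is what lets the components follow curved variations such as those caused by illumination and pose.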

Real-Time Dance Pattern Recognition Invariant to Anthropometric and Temporal Differences

We present a cascaded real-time system that recognizes dance patterns from 3D motion capture data. In a first step, the body trajectory, relative to the motion capture sensor, is matched. In a second step, an angular representation of the skeleton is proposed to make the system invariant to anthropometric differences relative to the body trajectory. Coping with non-uniform speed variations and amplitude discrepancies between dance patterns is achieved via a sequence similarity measure based on Dynamic Time Warping (DTW). A similarity threshold for recognition is automatically determined. Using only one good motion exemplar (baseline) per dance pattern, the recognition system is able to find a matching candidate pattern in a continuous stream of data, without prior segmentation. Experiments show the proposed algorithm reaches a good trade-off between simplicity, speed and recognition rate. An average recognition rate of 86.8% is obtained in real-time.

Meshia Cédric Oveneke, Valentin Enescu, Hichem Sahli
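The DTW-based similarity step in the preceding abstract can be sketched with the textbook dynamic-programming recurrence, using a Euclidean distance between frames. Several details here are assumptions, since the abstract does not give them: the per-frame feature vectors (for instance, the paper's joint angles), the normalisation by exemplar length, and the threshold passed to the hypothetical `matches_exemplar` helper.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    a: (n,) or (n, d) array; b: (m,) or (m, d) array.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of: insertion, deletion, or match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def matches_exemplar(window, exemplar, threshold):
    """Hypothetical decision rule: accept if DTW cost per exemplar frame is small."""
    return dtw_distance(window, exemplar) / len(exemplar) <= threshold
```

Because DTW may repeat or skip frames at no extra cost along matching stretches, a performance danced slower or faster than the exemplar can still score close to zero.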

Entropy Based Supervised Merging for Visual Categorization

Bag of Visual Words (BoW) is widely regarded as the standard representation of the visual information present in images and is broadly used for retrieval and concept detection in videos. Generating the visual vocabulary in the BoW framework generally includes a quantization step that clusters the image features into a limited number of visual words. Because this quantization relies on unsupervised clustering, it takes no advantage of the relationship between features coming from images of the same concept(s), which widens the semantic gap. We present a new dictionary construction technique that improves the BoW representation by increasing its discriminative power. Our solution is based on a two-step quantization: k-means clustering followed by bottom-up supervised clustering that uses the features' label information. Results on the TRECVID 2007 data [8] show improvements with the proposed construction of the BoW.

We also give upper bounds on the improvement over the baseline in the retrieval rate of each concept, using the best supervised merging criteria.

Usman Farrokh Niaz, Bernard Merialdo
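The supervised merging step in the preceding abstract can be illustrated with a brute-force greedy sketch. It assumes a count matrix that records, for each k-means visual word, how many of its features come from images of each concept. The merging criterion is the conditional entropy of the concept label given the visual word. The abstract does not list the exact criteria the authors compare, so this choice and all names are hypothetical.

```python
import numpy as np
from itertools import combinations

def conditional_entropy(counts):
    """H(label | word) in bits for a (words x labels) count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    h = 0.0
    for row in counts:
        n = row.sum()
        if n == 0:
            continue
        p = row[row > 0] / n
        h += (n / total) * -(p * np.log2(p)).sum()
    return h

def supervised_merge(counts, target_size):
    """Greedily merge visual words, always choosing the pair with the lowest resulting entropy.

    Returns the merged count matrix and, for each merged word, the original word indices.
    """
    rows = [np.asarray(r, dtype=float) for r in counts]
    groups = [[i] for i in range(len(rows))]
    while len(rows) > target_size:
        best = None
        for i, j in combinations(range(len(rows)), 2):
            trial = [r for k, r in enumerate(rows) if k not in (i, j)] + [rows[i] + rows[j]]
            h = conditional_entropy(trial)
            if best is None or h < best[0]:
                best = (h, i, j)
        _, i, j = best
        merged_row, merged_group = rows[i] + rows[j], groups[i] + groups[j]
        for k in (j, i):                       # j > i, so delete the later index first
            del rows[k], groups[k]
        rows.append(merged_row)
        groups.append(merged_group)
    return np.array(rows), groups
```

Every merge tests all pairs of words, so this sketch is only practical for small vocabularies. A larger-scale version would cache each word's entropy contribution instead of recomputing it.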

Selective Color Image Retrieval Based on the Gaussian Mixture Model

In this paper, a novel color-based image retrieval technique is proposed. The image is represented by Gaussian mixtures of the set of histograms corresponding to the spatial locations of the color regions within the image. The proposed approach allows users to express their needs concerning the color arrangement of the retrieved images in the form of colors belonging to the eleven basic color groups, along with their spatial locations. The proposed solution uses mixture modeling of the information in each set of color channels. Experimental results show that the proposed method is efficient and flexible when specific user requirements are considered.

Maria Luszczkiewicz-Piatek, Bogdan Smolka

Water Region Detection Supporting Ship Identification in Port Surveillance

In this paper, we present a robust and accurate water region detection technique developed to support ship identification. Because of the varying appearance of water and the frequent intrusion of ships, a region-based recognition approach is proposed. We segment the image into perceptually meaningful segments and find all water segments using a sampling-based Support Vector Machine (SVM). The algorithm is tested on 6 different port surveillance sequences and achieves a pixel classification recall of 97.5% and a precision of 96.4%. We also apply our water region detection to support multiple ship detection. Combined with our cabin detector, it successfully removes 74.6% of the false detections generated in the cabin detection process. A slight decrease of 5% in recall is compensated by a significant 15% improvement in precision.

Xinfeng Bao, Svitlana Zinger, Rob Wijnhoven, Peter H. N. de With

Hand Posture Recognition with Multiview Descriptors

Preserving asepsis in operating rooms is essential for limiting the contamination of patients by hospital-acquired infections. Strict rules prevent surgeons from interacting directly with any sterile equipment, requiring an assistant or a nurse to act as an intermediary. Such indirect control may prove clumsy and slow down the surgery. Gesture-based human-computer interfaces offer a promising alternative to assistants and could, in the future, help surgeons take direct control of sterile equipment without jeopardizing asepsis.

This paper presents the experiments we conducted on hand posture feature selection and the results obtained. To this end, we selected state-of-the-art description methods from four categories (local, semi-local, global, and geometric description approaches). We compare their recognition rates, when combined with a linear Support Vector Machine classifier, on the task of recognizing hand postures from a database. For each descriptor, we study the effect of removing the background, to simulate a segmentation step, and the importance of correct hand framing in the picture. The results show that all descriptors benefit to various extents from the segmentation step. Geometric approaches perform best, followed closely by Dalal et al.'s Histogram of Oriented Gradients.

Jean-François Collumeau, Hélène Laurent, Bruno Emile, Rémy Leconge

Object Tracking and Identification

State-Driven Particle Filter for Multi-person Tracking

Multi-person tracking can be exploited in applications such as driver assistance, surveillance, multimedia, and human-robot interaction. With the help of human detectors, particle filters offer a robust method that filters noisy detections and provides temporal coherence. However, traditional problems are rarely considered, and this degrades system performance. These problems include occlusions by other targets or by the scene, temporal drift, and even the detection of lost targets. Some authors propose to overcome these problems with heuristics that are neither explained nor formalized in their papers, for instance by defining exceptions to model updating depending on track overlap. In this paper, we propose to formalize these events with a state graph that explicitly defines the current state of each track (e.g., potential, tracked, occluded, or lost) and the transitions between states. This approach has the advantage of linking track actions, such as online updating of the underlying models, to the track state, which gives flexibility to the system. It provides an explicit representation for adapting the multiple parallel trackers to the context, i.e., each track can use a specific filtering strategy, dynamic model, number of particles, etc., depending on its state. We implement this technique in a single-camera multi-person tracker and test it on public video sequences.

David Gerónimo Gomez, Frédéric Lerasle, Antonio M. López Peña
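A minimal sketch of such a state graph follows. The four states come from the abstract. The event names, the transition table, and the per-state settings (particle counts, and whether to update the appearance model) are hypothetical placeholders. They are chosen only to show how track actions can be tied to the current state.

```python
from enum import Enum, auto

class TrackState(Enum):
    POTENTIAL = auto()
    TRACKED = auto()
    OCCLUDED = auto()
    LOST = auto()

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (TrackState.POTENTIAL, "confirmed"): TrackState.TRACKED,
    (TrackState.POTENTIAL, "missed"): TrackState.LOST,
    (TrackState.TRACKED, "occluded"): TrackState.OCCLUDED,
    (TrackState.TRACKED, "missed"): TrackState.LOST,
    (TrackState.OCCLUDED, "reappeared"): TrackState.TRACKED,
    (TrackState.OCCLUDED, "missed"): TrackState.LOST,
    (TrackState.LOST, "redetected"): TrackState.TRACKED,
}

# Hypothetical per-state filtering policy for the particle filter.
POLICY = {
    TrackState.POTENTIAL: {"n_particles": 50, "update_model": False},
    TrackState.TRACKED: {"n_particles": 100, "update_model": True},
    TrackState.OCCLUDED: {"n_particles": 200, "update_model": False},
    TrackState.LOST: {"n_particles": 300, "update_model": False},
}

class Track:
    """A single track whose filtering behaviour is driven by its state."""

    def __init__(self):
        self.state = TrackState.POTENTIAL

    def on_event(self, event):
        # Events with no entry in the table leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

    @property
    def policy(self):
        return POLICY[self.state]
```

Keeping the transitions and the per-state policy in explicit tables makes the exceptions visible. In the ad hoc approach criticised in the abstract, a rule such as "do not update the model during occlusion" would be buried in the tracking code.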

Particle Swarm Optimization with Soft Search Space Partitioning for Video-Based Markerless Pose Tracking

This paper proposes a new algorithm called soft partitioning particle swarm optimization (SPPSO), which performs video-based markerless human pose tracking by optimizing a fitness function in a 31-dimensional search space. The fitness function is based on foreground segmentation and edges. SPPSO divides the optimization into two stages that exploit the hierarchical structure of the model. The first stage only optimizes the most important parameters, whereas the second is a global optimization which also refines the estimates from the first stage. Experiments with the publicly available Lee walk dataset showed that SPPSO performs better than the annealed particle filter at a frame rate of 20 fps, and equally well at 60 fps. The better performance at the lower frame rate is attributed to the explicit exploitation of the hierarchical model structure.

Patrick Fleischmann, Ivar Austvoll, Bogdan Kwolek
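The two-stage idea in the preceding abstract can be sketched with a basic global-best particle swarm optimizer. In this sketch, stage one searches only a hypothetical subset of "important" dimensions while the others stay at their initial values. Stage two then re-runs PSO over all dimensions, starting around the stage-one result. The inertia and acceleration constants, swarm size, and iteration count are assumptions. The soft partitioning used by SPPSO and its silhouette/edge fitness function are not reproduced.

```python
import numpy as np

def pso(fitness, x0, spread, n_particles=30, n_iter=100, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimise fitness with a basic global-best PSO, particles initialised around x0."""
    rng = np.random.default_rng(rng)
    x = x0 + spread * rng.standard_normal((n_particles, len(x0)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                  # best position seen by the swarm
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f                          # update personal bests
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g

def two_stage_pso(fitness, x0, important, spread=1.0, rng=0):
    """Stage 1: optimise only the 'important' dimensions; stage 2: refine all of them."""
    rng = np.random.default_rng(rng)
    x0 = np.asarray(x0, dtype=float)

    def partial(sub):
        full = x0.copy()
        full[important] = sub                           # other dimensions stay at x0
        return fitness(full)

    x_stage1 = x0.copy()
    x_stage1[important] = pso(partial, x0[important], spread, rng=rng)
    return pso(fitness, x_stage1, spread, rng=rng)
```

For example, with a toy quadratic fitness, a zero starting pose, and `important=[0, 1]`, the result approaches the minimiser. In a pose tracker, the important dimensions would typically be the global position and orientation of the body.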

Estimation and Prediction of the Vehicle’s Motion Based on Visual Odometry and Kalman Filter

The motion of the vehicle is useful information for different applications, such as driver assistance systems or autonomous vehicles. This information can be obtained by different methods, for instance, by using GPS or by means of visual odometry. However, there are situations where neither method works correctly. For example, there are areas in urban environments where the GPS signal is not available, such as tunnels or streets with tall buildings. On the other hand, computer vision algorithms are affected by outdoor environments, the main source of difficulty being variation in lighting conditions. This paper explains a method to estimate and predict the motion of the vehicle based on visual odometry and a Kalman filter. The Kalman filter allows both filtering and prediction of the vehicle motion, using the results of the visual odometry estimation.

Basam Musleh, David Martin, Arturo de la Escalera, Domingo Miguel Guinea, Maria Carmen Garcia-Alegre
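A minimal constant-velocity Kalman filter in the spirit of the preceding abstract is sketched below. It assumes that visual odometry supplies a planar position measurement at a fixed time step dt. It also assumes that the process and measurement noise covariances are simple scaled identities. The state layout, the values of q and r, and the class name are hypothetical, and the paper's motion model may differ.

```python
import numpy as np

class ConstantVelocityKF:
    """Linear Kalman filter with state [x, y, vx, vy] and position measurements."""

    def __init__(self, dt, q=1e-2, r=0.25):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt          # position += velocity * dt
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0         # only the position is observed
        self.Q = q * np.eye(4)                    # simplified process noise
        self.R = r * np.eye(2)                    # visual odometry measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x.copy()

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x.copy()
```

Calling `predict()` repeatedly without `update()` gives multi-step motion predictions. This could be used, for example, while visual odometry is temporarily unreliable. The covariance P grows during such gaps, reflecting increasing uncertainty.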

Detection of HF First-Order Sea Clutter and Its Splitting Peaks with Image Feature: Results in Strong Current Shear Environment

A strong current shear environment always results in twisted and split sea clutter along the range dimension of the range-Doppler spectral map. A sea clutter detection method based on image features is proposed. Using 2-D image features of the range-Doppler spectrum, a multi-scale filter extracts the trend of the first-order sea echoes as indicative information. Detection rules for both single and split first-order sea echoes are derived from characteristic knowledge. This knowledge combines the indicative information with global characteristics such as amplitude, symmetry, and continuity. Compared with classical algorithms, the proposed method detects and locates first-order sea echoes in the HF band more accurately, especially in environments where targets or clutter smear the echoes. Experiments with real data in a strong current shear environment verify the validity of the algorithm.

Yang Li, Zhenyuan Ji, Junhao Xie, Wenyan Tang

Object Recognition Using Radon Transform-Based RST Parameter Estimation

In this paper, we propose a practical approach for recovering the parameters of similarity geometric transformations using only the Radon transform and its extended version on [0, 2π]. The derived objective function is exploited as a similarity measure to build an object recognition system. Comparisons with common and powerful shape descriptors testify to the effectiveness of the proposed method in recognizing binary images that are RST-transformed, distorted, occluded, or noisy.

Nafaa Nacereddine, Salvatore Tabbone, Djemel Ziou

Multi-view Gait Fusion for Large Scale Human Identification in Surveillance Videos

In this paper we propose a novel multi-view feature fusion of gait biometric information in surveillance videos for large scale human identification. The experimental evaluation on low resolution surveillance video images from a publicly available database [1] showed that the combined LDA-MLP technique turns out to be a powerful method for capturing identity specific information from walking gait patterns. The multi-view fusion at feature level allows complementarity of multiple camera views in surveillance scenarios to be exploited for improvement of identity recognition performance.

Emdad Hossain, Girija Chetty
