
2013 | Book

Image Analysis and Processing – ICIAP 2013

17th International Conference, Naples, Italy, September 9-13, 2013, Proceedings, Part II


About this book

This two-volume set (LNCS 8156 and 8157) constitutes the refereed proceedings of the 17th International Conference on Image Analysis and Processing, ICIAP 2013, held in Naples, Italy, in September 2013. The 162 papers presented were carefully reviewed and selected from 354 submissions. The papers aim at highlighting the connections and synergies of image processing and analysis with pattern recognition and machine learning, human computer systems, biomedical imaging and applications, multimedia interaction and processing, 3D computer vision, and understanding objects and scenes.

Table of contents

Frontmatter
Improving Gait Biometrics under Spoofing Attacks

Gait is a relatively new biometric modality with a precious advantage over other modalities, such as iris and voice: it can be easily captured from a distance. While it has recently become a topic of great interest in biometric research, there has been little investigation into gait spoofing attacks, where a person tries to imitate the clothing or walking style of someone else. We recently analysed for the first time the effects of spoofing attacks on silhouette-based gait biometric systems and showed that it is indeed possible to spoof them by clothing impersonation and the deliberate selection of a target with a build similar to the attacker's. These findings are exploited in the current work to develop new solutions for coping with such spoofing attacks. In this paper we describe an initial solution based on part-based gait analysis. The proposed solution is thoroughly evaluated on the challenging USOU gait spoofing database collected within the EU Tabula Rasa project. The database consists of records of 22 subjects (14 male and 8 female), between 20 and 55 years old, walking through the Southampton tunnel both in their normal clothes and while wearing a common uniform. The obtained results are very promising and point out interesting findings that can serve as a reference for the research community in developing enhanced countermeasures.

Abdenour Hadid, Mohammad Ghahramani, John Bustard, Mark Nixon
Extracting Compact Information from Image Benchmarking Tools: The SAR Despeckling Case

Image databases and benchmarks are precious tools to assess the quality of competing algorithms and to fine-tune their parameters. In some cases, however, quality cannot be captured by a single measure, and several of them, typically providing contrasting indications, must be computed and analyzed. This is certainly the case for the SAR despeckling field, also because of the lack of clean reference images, which forces one to compute the measures of interest on simple canonical scenes. We present here the first results of ongoing work aimed at selecting a suitable combination of benchmark measures to assess competing SAR despeckling techniques and rank them. The full validation of the proposed methodology will require the involvement of a reasonable number of expert photo-interpreters in a large-scale experimental campaign. Here, we present only a sample experiment to provide some insight into the approach.

Gerardo Di Martino, Giovanni Pecoraro, Giovanni Poggi, Daniele Riccio, Luisa Verdoliva
Automatic Aesthetic Photo Composition

A proper aesthetic composition of photographic content results in an actual emotional response from the viewer. In this work we propose a fully automatic computational approach to photo composition. The method takes into account well-known and widely adopted aesthetic guidelines on picture content as a means of guiding an optimization framework. The resulting composition is produced as the optimal combination of cropping and retargeting.

The effectiveness of the method is tested and evaluated with several experiments.

Roberto Gallea, Edoardo Ardizzone, Roberto Pirrone
Face Recognition in Uncontrolled Conditions Using Sparse Representation and Local Features

Face recognition in the presence of occlusions, illumination changes, or large expression variations is still an open problem. This paper addresses the issue by presenting a new local-based face recognition system that combines weak classifiers to yield a strong one. The method relies on sparse approximation using dictionaries built on a pool of local features extracted from automatically cropped images. Experiments on the AR database show the effectiveness of our method, which outperforms current state-of-the-art techniques.

Alessandro Adamo, Giuliano Grossi, Raffaella Lanzarotti
Eigenvector Sign Correction for Spectral Correspondence Matching

In this paper we describe a method to correct the signs of the eigenvectors of the proximity matrix for the problem of correspondence matching. The signs of the eigenvectors of a proximity matrix are not unique and play an important role in computing the correspondences between two sets of feature points. We use the coefficients of the elementary symmetric polynomials to make the directions of the eigenvectors of the two proximity matrices consistent with each other. We compare our method to other methods presented in the literature. The empirical results show that using the coefficients of the elementary symmetric polynomials for eigenvector sign correction is the better choice for this problem.

Muhammad Haseeb, Edwin R. Hancock
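The sign ambiguity described in the abstract is easy to reproduce. The sketch below is not the paper's elementary-symmetric-polynomial correction; it is a simpler, commonly used baseline (flip each eigenvector so its largest-magnitude component is positive) that illustrates what any sign-correction scheme must achieve: two decompositions differing only in eigenvector signs become identical.

```python
import numpy as np

def fix_eigenvector_signs(V):
    """Make eigenvector signs deterministic: flip each column so that
    its largest-magnitude entry is positive. (A common baseline; the
    paper's own correction uses elementary symmetric polynomials.)"""
    V = V.copy()
    for j in range(V.shape[1]):
        k = np.argmax(np.abs(V[:, j]))
        if V[k, j] < 0:
            V[:, j] = -V[:, j]
    return V

# Two proximity matrices of the same point set may yield eigenvectors
# that differ only in sign; after correction they must coincide.
A = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
_, V1 = np.linalg.eigh(A)
V2 = V1 * np.array([1, -1, 1])      # simulate an arbitrary sign flip
assert np.allclose(fix_eigenvector_signs(V1), fix_eigenvector_signs(V2))
```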
An Interactive Image Rectification Method Using Quadrangle Hypothesis

In this paper, we propose an interactive image rectification method for general planar objects. Our method offers two interactive techniques that allow a user to choose the target region of interest: user-stroke-based cropping and box-based cropping. The method can be applied to non-rectangular objects. The idea is based on the use of horizontal and vertical lines associated with the target object, which we assume can be richly detected; in practice, at least two horizontal lines and two vertical lines must be observed. The method proceeds as follows: first, detect primitive line segments and select horizontal and vertical segments using baselines; next, form a quadrangle hypothesis as a combination of four line segments; finally, evaluate whether the re-projected line segments are horizontal (vertical) or not. The quadrangle hypothesis with maximum goodness is the final solution. In our experiments, we show promising cropping results for several images and demonstrate real-time marker-less tracking using the rectified reference image.

Satoshi Yonemoto
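The core of evaluating a quadrangle hypothesis is the rectifying homography it induces. A minimal sketch in NumPy: the four-point DLT below is the standard construction, not the paper's exact pipeline, and the corner coordinates are illustrative.

```python
import numpy as np

def homography_from_quad(quad, w, h):
    """DLT homography mapping a quadrangle (4 corners, clockwise from
    top-left) onto the axis-aligned rectangle [0, w] x [0, h]."""
    dst = [(0, 0), (w, 0), (w, h), (0, h)]
    A = []
    for (x, y), (u, v) in zip(quad, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, float))
    return Vt[-1].reshape(3, 3)        # null vector = homography entries

def warp_point(H, p):
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# A hypothesis built from a perspective-distorted quadrangle maps its
# corners exactly onto the target rectangle.
quad = [(10, 12), (90, 5), (95, 70), (5, 65)]
H = homography_from_quad(quad, 80, 60)
assert np.allclose(warp_point(H, quad[1]), (80, 0), atol=1e-6)
```

A full implementation would score each hypothesis by checking that all re-projected line segments become horizontal or vertical under H.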
MRF Based Image Segmentation Augmented with Domain Specific Information

A Markov Random Field based image segmentation system which combines top-down and bottom-up segmentation approaches is proposed in this study. The system is intended especially for applications where no labeled training set is available, but some a priori general information about the dataset, referred to as domain specific information, is available. Domain specific information is obtained from a domain expert and formalized by a mathematical representation. The type of information and its representation depend on the content of the image dataset to be segmented. This information is integrated into the segmentation process in an unsupervised framework. Owing to the inclusion of domain specific information, this approach can be considered a first step towards semantic image segmentation under an unsupervised MRF model. The proposed system is compared with state-of-the-art unsupervised image segmentation methods quantitatively via two evaluation metrics, consistency error and probabilistic Rand index, and satisfactory results are obtained.

Özge Öztimur Karadağ, Fatoş T. Yarman Vural
Segmentation of Time-Lapse Images with Focus on Microscopic Images of Cells

Phase contrast is a noninvasive microscopy imaging technique that is widely used in time-lapse imaging of cells. The resulting images, however, contain optical artifacts that make automated processing by computer difficult.

We developed a novel algorithm for cell segmentation. It is based on processing the time differences between images with a combination of thresholding, blurring, and morphological operations. We tested the algorithm on four different cell types acquired with two different microscopes. We evaluated the precision of segmentation against manual segmentation by a human operator and also compared it with other methods. Our algorithm is simple, fast, and shows accuracy comparable to manual segmentation. In addition, it can correctly separate dead from living cells.

Jindřich Soukup, Petr Císař, Filip Šroubek
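The pipeline named in the abstract (time difference, blurring, thresholding, morphology) can be sketched with SciPy; the parameter values and helper name below are placeholders, not the authors' settings.

```python
import numpy as np
from scipy import ndimage

def segment_moving_cells(frame_t, frame_prev, sigma=2.0, thr=0.1):
    """Illustrative pipeline: temporal difference -> Gaussian blur ->
    threshold -> morphological opening -> connected components."""
    diff = np.abs(frame_t.astype(float) - frame_prev.astype(float))
    smooth = ndimage.gaussian_filter(diff, sigma)
    mask = smooth > thr * smooth.max()          # relative threshold
    mask = ndimage.binary_opening(mask, iterations=2)  # remove specks
    labels, n = ndimage.label(mask)             # one label per region
    return labels, n

# Synthetic frame pair: a bright blob that moved between acquisitions.
prev = np.zeros((64, 64)); prev[10:20, 10:20] = 1.0
curr = np.zeros((64, 64)); curr[30:40, 30:40] = 1.0
labels, n = segment_moving_cells(curr, prev)
assert n >= 1
```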
Segmentation with Incremental Classifiers

Radiotherapy treatment planning requires physicians to delineate the target volumes and organs at risk on 3D images of the patient. This segmentation task consumes a lot of time and can be partly automated with atlases (reference images segmented by experts). To segment any new image, the atlas is non-rigidly registered and the organ contours are then transferred. In practice, this approach suffers from the current limitations of non-rigid registration. We propose an alternative approach to extract and encode the physician’s expertise. It relies on a specific classification method that incrementally extracts information from groups of pixels in the images. The incremental nature of the process allows us to extract features that depend on partial classification results but also convey richer information. This paper is a first investigation of such an incremental scheme, illustrated with experiments on artificial images.

Guillaume Bernard, Michel Verleysen, John A. Lee
3D Femur Reconstruction Using X-Ray Stereo Pairs

In this paper, we present a 3D reconstruction method for the shape of the proximal femur using pairs of 2D radiographs. Femur shape reconstruction from a small number of images is a challenging task, but it is desirable as it lowers both the acquisition costs and the radiation dose compared to tomography. We investigate the reconstruction of the 3D proximal femur surface without any prior knowledge and using a limited number of 2D images. The proposed method uses contour point coordinates and compares three different distances to find the best matching between 2D point pairs. The impact of varying the angles between the selected images on the reconstructed 3D shape is tested. The obtained results show that it is possible to rebuild the proximal femur shape from a limited number of radiographs.

Sonia Akkoul, Adel Hafiane, Eric Lespessailles, Rachid Jennane
Information-Based Learning of Deep Architectures for Feature Extraction

Feature extraction is a crucial phase in complex computer vision systems. Mainly two different approaches have been proposed so far. A quite common solution is the design of appropriate filters and features based on image processing techniques, such as the SIFT descriptors. On the other hand, machine learning techniques can be applied, relying on their capabilities to automatically develop optimal processing schemes from a significant set of training examples. Recently, deep neural networks and convolutional neural networks have been shown to yield promising results in many computer vision tasks, such as object detection and recognition. This paper introduces a new computer vision deep architecture model for the hierarchical extraction of pixel–based features, that naturally embed scale and rotation invariances. Hence, the proposed feature extraction process combines the two mentioned approaches, by merging design criteria derived from image processing tools with a learning algorithm able to extract structured feature representations from data. In particular, the learning algorithm is based on information-theoretic principles and it is able to develop invariant features from unsupervised examples. Preliminary experimental results on image classification support this new challenging research direction, when compared with other deep architectures models.

Stefano Melacci, Marco Lippi, Marco Gori, Marco Maggini
Image Classification with Multivariate Gaussian Descriptors

Techniques based on the Bag of Words approach represent images by quantizing local descriptors and summarizing their distribution in a histogram. In contrast, in this paper we describe an image as a multivariate Gaussian distribution, estimated over the extracted local descriptors. The estimated distribution is mapped to a high-dimensional descriptor by concatenating the mean vector and the projection of the covariance matrix onto the Euclidean space tangent to the Riemannian manifold. To deal with large-scale datasets and high-dimensional feature spaces, a Stochastic Gradient Descent solver is adopted. The experimental results on Caltech-101 and ImageCLEF2011 show that the method obtains performance competitive with state-of-the-art approaches.

Costantino Grana, Giuseppe Serra, Marco Manfredi, Rita Cucchiara
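The mean-plus-tangent-space mapping can be sketched as follows. The matrix-logarithm (log-Euclidean) projection is a standard way to flatten SPD covariance matrices onto a tangent Euclidean space; the regularization constant and descriptor layout here are assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.linalg import logm

def gaussian_descriptor(local_descs):
    """Map a set of local descriptors to one vector: the mean vector,
    concatenated with the upper triangle of the matrix logarithm of
    the covariance (tangent-space projection of the SPD manifold)."""
    X = np.asarray(local_descs, float)
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # keep SPD
    L = logm(C).real                      # log-Euclidean projection
    iu = np.triu_indices(X.shape[1])      # symmetric: store once
    return np.concatenate([mu, L[iu]])

rng = np.random.default_rng(0)
d = gaussian_descriptor(rng.normal(size=(200, 8)))
assert d.shape == (8 + 8 * 9 // 2,)   # mean (8) + upper triangle (36)
```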
Non–referenced Quality Assessment of Image Processing Methods in Infrared Non-destructive Testing

Infrared Non-Destructive Testing (IRNDT) uses several image processing techniques to enhance visual contrast and the visibility of defects in inspected materials. The benchmarking of these techniques is often too qualitative, due to a lack of quantitative criteria for assessing the quality of the compared methods. In this work, we compare image processing techniques in IRNDT with a non-referenced (NR) image quality assessment (IQA) algorithm. Furthermore, we validate the NR IQA approach through a human-based quality evaluation and analyze statistical properties of IRNDT images. The results show a high correlation between NR IQA quality predictions and subjective evaluation. Moreover, the analysis evidenced a relationship between perceived image quality and 1) the spatial power spectral density, and 2) the marginal and joint distributions of wavelet coefficients. This analysis provides a quantitative alternative for comparing image processing methods in IRNDT and can be used to develop a specific IQA measure for IRNDT.

Thomas J. Ramírez-Rozo, Hernan D. Benítez-Restrepo, Julio C. García-Álvarez, German Castellanos-Domínguez
Using Dominant Sets for k-NN Prototype Selection

k-Nearest Neighbors is surely one of the most important and widely adopted non-parametric classification methods in pattern recognition. It has evolved in several aspects over the last 50 years, and one of its best-known variants consists in the usage of prototypes: a prototype distills a group of similar training points, drastically diminishing the number of comparisons needed for classification; prototypes are typically employed when the cardinality of the training data is high. In this paper, using the dominant set clustering framework, we propose four novel strategies for prototype generation, which produce representative prototypes that mirror the underlying class structure in an expressive and effective way. Our strategy boosts k-NN classification performance; considering heterogeneous metrics and analyzing 15 diverse datasets, we are among the best 6 prototype-based k-NN approaches, with a computational cost strongly inferior to all competitors. In addition, we show that our proposal beats linear SVM in a pedestrian detection scenario.

Sebastiano Vascon, Marco Cristani, Marcello Pelillo, Vittorio Murino
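To see why prototypes cut the classification cost, here is a deliberately naive stand-in: one class-mean prototype per class, so classification needs only as many distance computations as there are classes. The paper's dominant-set strategies produce far more expressive prototypes; this sketch only illustrates the mechanism.

```python
import numpy as np

def make_prototypes(X, y):
    """Naive prototype generation: one prototype per class (the class
    mean). The paper instead derives prototypes from dominant sets."""
    classes = np.unique(y)
    protos = np.stack([X[y == c].mean(axis=0) for c in classes])
    return protos, classes

def nn_classify(protos, proto_labels, x):
    """1-NN over prototypes: cost is O(#classes), not O(#training)."""
    d = np.linalg.norm(protos - x, axis=1)
    return proto_labels[np.argmin(d)]

X = np.array([[0., 0.], [1., 1.], [10., 10.], [11., 11.]])
y = np.array([0, 0, 1, 1])
protos, proto_labels = make_prototypes(X, y)
assert nn_classify(protos, proto_labels, np.array([0.2, 0.1])) == 0
assert nn_classify(protos, proto_labels, np.array([10.5, 10.0])) == 1
```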
Feature Extraction for Iris Recognition Based on Optimized Convolution Kernels

Iris recognition has gained a lot of popularity over the last decades. In particular, a method based on binary iris templates has found its way into real-world use due to its simplicity, stability, and reliability. The principle is that the unique iris structure is encoded into bit code templates that are sufficient for high-accuracy recognition. Encoding is performed by filtering a preprocessed iris image and storing only the phase information of the response to the filters. For years researchers have used 2D Gabor filters or their modifications, because these filters proved to provide the most reliable features. Despite the high recognition accuracy, the use of 2D Gabor filters faces a spoofing problem: recent studies show that the encoding process can be reverted and a spoofed iris can be obtained from the iris code alone. In this paper, we propose an efficient feature extraction method for iris recognition based on convolution kernels learned from a database of irises. We show that the proposed method reaches state-of-the-art performance and can prevent attackers from generating spoofed irises, provided the optimized convolution kernel is safely stored.

Lubos Omelina, Bart Jansen, Milos Oravec, Jan Cornelis
Saliency Based Aesthetic Cut of Digital Images

The aesthetic cut of photos is a process well known to professional photographers. It consists of cutting the original photo to remove less relevant parts close to the borders, leaving the interesting subjects in a position that the observer perceives as more pleasant. In this paper we propose a saliency-based technique to automatically perform aesthetic cuts in images. We use a standard method to estimate the saliency map and propose some post-processing on the map to make it more suitable for our purpose. We then apply a greedy algorithm to determine the cut (i.e., the most important part of the original image) in the cases of both free and fixed aspect ratio. Experimental results are reported showing how the cut resulting from our technique compares to some state-of-the-art retargeting and cropping techniques.

Luca Greco, Marco La Cascia
A Plant Recognition Approach Using Shape and Color Features in Leaf Images

Recognizing plants is a vital problem, especially for biologists, chemists, and environmentalists. Plant recognition can be performed manually by human experts, but it is a time-consuming and low-efficiency process. Automating plant recognition is therefore important for fields working with plants. This paper presents an approach for plant recognition using leaf images. Shape and color features extracted from leaf images are used with k-Nearest Neighbor, Support Vector Machine, Naive Bayes, and Random Forest classification algorithms to recognize plant types. The presented approach is tested on 1897 leaf images of 32 kinds of leaves. The results demonstrate that the success rate of plant recognition can be improved up to 96% with the Random Forest method when both shape and color features are used.

Ali Caglayan, Oguzhan Guclu, Ahmet Burak Can
Robust Coarse-to-Fine Sparse Representation for Face Recognition

Recently, Sparse Representation-based Classification (SRC) has been successfully applied to pattern classification. In this paper, we present a robust Coarse-to-Fine Sparse Representation (CFSR) for face recognition. In the coarse coding phase, the test sample is represented as a linear combination of all the training samples. In the fine phase, a number of “nearest neighbors” are determined to represent the test sample and perform classification. CFSR produces sparseness through the coarse phase and exploits the local data structure to perform classification in the fine phase. Moreover, this method can make a better classification decision by determining an individual dictionary for each test sample. Extensive experiments on benchmark face databases show that our method has competitive performance in face recognition compared with other state-of-the-art methods.

Yunlian Sun, Massimo Tistarelli
SketchSPORE: A Sketch Based Domain Separation and Recognition System for Interactive Interfaces

Multimodal interfaces are used to interact with devices and automata through different channels of communication. In this context, the sketch modality plays a key role, since it allows users to convey concepts and/or commands using freehand drawing (graphical domain) and/or handwriting (textual domain). The acquisition of the sketch modality can be performed using touch (e.g., touchscreen) or touchless (e.g., RGB-D camera) tools, supporting the development of versatile and powerful interactive interfaces. Domain separation and sketch recognition are two fundamental issues for these interfaces. This paper presents SketchSPORE, a novel framework designed both to automatically distinguish graphical from textual elements within the same sketch and to recognize freehand drawing as well as handwriting. The recognition processes support both on-line and off-line modes; moreover, their output can be stored within an XML file to maintain compatibility between the framework and target services and/or applications. Extensive experiments showing the effectiveness of the proposed method are reported and discussed.

Danilo Avola, Luigi Cinque, Giuseppe Placidi
Ontology-Assisted Object Detection: Towards the Automatic Learning with Internet

Automatic detection approaches depend essentially on the use of classifiers, which in turn are based on learning from a given training set. The choice of the training data is crucial: even though this aspect is often neglected, the visual information contained in the training samples can make the difference in a detection/classification scenario. A good training set has to be sufficiently informative to capture the nature of the object under analysis, but at the same time generic enough to avoid overfitting and to cope with new instances of the object of interest. In this paper we follow the approaches that pursue automatic learning from Internet data. We show how such a training set can be made more appropriate by leveraging semantic technologies, such as lexical resources and ontologies, in the task of retrieving images from the Web through a search engine. Experiments on several object classes of the CalTech101 dataset support our idea, showing an average increase in detection accuracy of about 8%.

Francesco Setti, Dong-Seon Cheng, Sami Abduljalil Abdulhak, Roberta Ferrario, Marco Cristani
Epithelial Cell Segmentation in Histological Images of Testicular Tissue Using Graph-Cut

Computerized image processing has provided valuable tools for analyzing histology images. However, histology images are complex, and an algorithm developed for one data set may not work for a new, unseen data set. The preparation procedure of the tissue before imaging can significantly affect the resulting image; even for the same staining method, factors like delayed fixation may alter image quality. In this paper we face the challenging problem of designing a method that works on data sets with strongly varying quality. In environmental research, due to the distance between the site where wild animals are caught and the laboratory, there is always a delay in fixation. Here we suggest a segmentation method based on the structural information of the epithelial cell layer in testicular tissue. The cell nuclei are detected using the fast radial symmetry filter. A graph is constructed on top of the epithelial cells, and a graph-cut optimization method is used to cut the links between cells of different tubules. The algorithm is tested on five different groups of animals: group one was fixed immediately; three groups were left at room temperature for 18, 30, and 42 hours, respectively, before fixation; group five was frozen after 6 hours at room temperature and then thawed. The suggested algorithm gives promising results for the whole data set.

Azadeh Fakhrzadeh, Ellinor Spörndly-Nees, Lena Holm, Cris L. Luengo Hendriks
Urban Road Network Extraction Based on Fuzzy ART, Radon Transform and Morphological Operations on DSM Data

In urban areas, the main disadvantages of aerial photos for road extraction are the shadows cast by buildings and the complexity of the road network. We therefore use Digital Surface Model (DSM) data, which are based on the elevation of land surfaces. However, one of the problems associated with DSM data is non-road areas with the same elevation as roads, such as parking lots, parks, and empty ground. In this paper, we propose Mixed ART clustering on the histogram, followed by region growing, to extract the initial road, and perform road filtering by an opening operation with a line-shaped structuring element whose orientation is obtained from the Radon transform. Finally, the road networks are constructed with B-spline curves from the skeleton of the extracted road. The experimental results show that the proposed method improves quality and average accuracy within an acceptable time.

Darlis Herumurti, Keiichi Uchimura, Gou Koutaki, Takumi Uemura
A Weighted Majority Vote Strategy Using Bayesian Networks

Most methods for combining classifiers rely on the assumption that the experts to be combined make uncorrelated errors. Unfortunately, this theoretical assumption is not easy to satisfy in practical cases, thus affecting the performance obtainable by applying any combination strategy. We try to solve this problem by explicitly modeling the dependencies among the experts through the estimation of the joint probability distributions of the classifier outputs and the true class. In this paper we propose a new weighted majority vote rule that uses the joint probabilities of each class as weights for combining classifier outputs. A Bayesian network automatically infers the joint probability distribution for each class. The final decision is made by taking into account both the votes received by each class and the statistical behavior of the classifiers. The experimental results confirm the effectiveness of the proposed method.

Luigi P. Cordella, Claudio De Stefano, Francesco Fontanella, Alessandra Scotto di Freca
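The weighted vote rule itself is compact. In this sketch the per-classifier, per-class weights stand in for the joint probabilities that the paper estimates with a Bayesian network; the weight values are illustrative.

```python
def weighted_majority_vote(votes, weights):
    """votes[i] is classifier i's predicted class; weights[i][c] is
    classifier i's reliability on class c (in the paper, derived from
    BN-estimated joint probabilities). The winning class maximizes the
    summed weights of its supporters."""
    scores = {}
    for i, v in enumerate(votes):
        scores[v] = scores.get(v, 0.0) + weights[i][v]
    return max(scores, key=scores.get)

# Two weak votes for class 0 are outweighed by one reliable vote for 1.
weights = [{0: 0.2, 1: 0.5}, {0: 0.2, 1: 0.5}, {0: 0.5, 1: 0.9}]
assert weighted_majority_vote([0, 0, 1], weights) == 1
```

With uniform weights this reduces to the plain majority vote, which is the behavior the paper improves upon.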
Pedestrian Detection in Poor Visibility Conditions: Would SWIR Help?

The 2WIDE_SENSE (WIDE spectral band & WIDE dynamics multifunctional imaging SENSor Enabling safer car transportation) EU-funded project is aimed at the development of a low-cost camera sensor for Advanced Driver Assistance Systems (ADAS) applications, able to acquire the full visible to Short Wave InfraRed (SWIR) spectrum from 400 to 1700 nm. This paper presents the first results obtained by investigating the SWIR contribution to pedestrian detection in difficult visibility conditions, such as haze and fog, employing the wide-bandwidth camera developed within the project.

Massimo Bertozzi, Rean Isabella Fedriga, Alina Miron, Jean-Luc Reverchon
Multi-target Data Association Using Sparse Reconstruction

In this paper we describe a solution to the multi-target data association problem based on ℓ1-regularized sparse basis expansions. Assuming we have sufficient training samples per subject, our idea is to create a discriminative basis of observations that we can use to reconstruct and associate a new target. The use of ℓ1-regularized basis expansions allows our approach to exploit multiple instances of the target when performing data association, rather than relying on an average representation of target appearance. Preliminary experimental results on the PETS dataset are encouraging and demonstrate that our approach is accurate and efficient for multi-target data association.

Andrew D. Bagdanov, Alberto Del Bimbo, Dario Di Fina, Svebor Karaman, Giuseppe Lisanti, Iacopo Masi
The Recognition of Polynomial Position and Orientation through the Finite Polynomial Discrete Radon Transform

In this paper, we propose to accurately detect in an image curvilinear features that can be approximated by polynomial curves. Given a priori knowledge of the polynomial parameters (coefficients and degree), we make it possible to recognize both the orientation and the position of the polynomial (if it exists) in the given image. For this objective, we present a new approach called “The Finite Polynomial Discrete Radon Transform” (FPDRT), which maps the initial image into a Radon space where each point represents the amount of evidence for the existence of a polynomial at the corresponding position. The FPDRT sums the pixels centered on a polynomial and stores the result at the corresponding position in the Radon space. The FPDRT extends the formalism of the Finite Discrete Radon Transform (FRT), which is restricted to projecting the image along straight lines of equation y = mx + t, where m and t are integers. Our method generalizes the FRT by projecting the image with respect to polynomials of equation y = mx^n + t, where m, n, and t are integers. The FPDRT is exactly invertible, requires only arithmetic operations, and is applicable to p × p sized images, where p is a prime number. Several applications are enabled by the FPDRT, such as fingerprint and palm print biometrics and multi-directional road recognition.

Ines Elouedi, Régis Fournier, Amine Naït-Ali, Atef Hamouda
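The projection sums described in the abstract can be stated directly: for a prime p, R[m, t] accumulates the pixels on the curve y = (m·x^n + t) mod p, and n = 1 recovers the straight-line FRT. Note that the full FRT also includes an extra "perpendicular" projection, omitted here, so this sketch alone is not invertible.

```python
import numpy as np

def fpdrt(img, n=1):
    """Finite polynomial discrete Radon transform of a p x p image,
    p prime: R[m, t] sums pixels on the curve y = (m*x**n + t) mod p.
    n = 1 gives the classical FRT line projections."""
    p = img.shape[0]
    R = np.zeros((p, p))
    for m in range(p):
        for t in range(p):
            for x in range(p):
                R[m, t] += img[x, (m * pow(x, n, p) + t) % p]
    return R

p = 7
img = np.arange(p * p, dtype=float).reshape(p, p)
R = fpdrt(img, n=2)
# For each m, varying t sweeps every pixel exactly once, so every row
# of R sums to the total image intensity (a useful sanity check).
assert np.allclose(R.sum(axis=1), img.sum())
```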
Multiple Classifier Systems for Image Forgery Detection

A large number of techniques have been proposed recently for forgery detection, based on widely different principles and processing tools. As a result, each technique performs well for some types of forgery, under given hypotheses, and much worse in other situations. To improve robustness, one can merge the outputs of different techniques, but it is not obvious how to balance the different sources of information. In this paper we consider and test several combining rules, working at both the abstract level and the measurement level, and providing information on both the presence and the location of suspect tampered regions. Experimental results on a suitable dataset of forged images show that a careful fusion of detector outputs largely outperforms individual detectors, and that measurement-level fusion methods are more effective than abstract-level ones.

Davide Cozzolino, Francesco Gargiulo, Carlo Sansone, Luisa Verdoliva
Using the Watershed Transform for Iris Detection

Iris biometric systems are of interest for security applications. In this respect, iris segmentation has a key role, as it must be fast and accurate. In this paper, we present a new watershed based approach for iris segmentation in color images. The watershed transform is used in two distinct phases of iris segmentation: it is first used to obtain a preliminary segmentation, which constitutes the input to a circle fitting procedure; then, it is used together with the portion of the input image resulting after circle fitting to identify more precisely the pixels actually belonging to the iris. The experimental results show that the suggested approach is effective with respect to both location accuracy and computational complexity.

Maria Frucci, Michele Nappi, Daniel Riccio, Gabriella Sanniti di Baja
Outdoor Environment Monitoring with Unmanned Aerial Vehicles

This work addresses the problem of video surveillance of outdoor environments with unmanned aerial vehicles (UAVs). Specifically, it proposes a two-step approach, with an initial offline stage in which a mosaic of the zone to be monitored is built from video sequences. The second step tackles the problem of online detection of relevant differences between the acquired images and the mosaic model. A GPS-assisted approach is proposed to deal with efficiency issues in this online step. Experimental results show that the proposed approach can detect relevant changes in the specific case of road safety assurance in dangerous zones.

Claudio Piciarelli, Christian Micheloni, Niki Martinel, Marco Vernier, Gian Luca Foresti
Training Binary Descriptors for Improved Robustness and Efficiency in Real-Time Matching

Most descriptor-based keypoint recognition methods require computationally expensive patch preprocessing to obtain insensitivity to various kinds of deformations. This limits their applicability towards real-time applications on low-powered devices such as mobile phones. In this paper, we focus on descriptors which are relatively weak (i.e. sensitive to scale and rotation), and present a classification-based approach to improve their robustness and efficiency to achieve real-time matching. We demonstrate our method by applying it to BRIEF [7] resulting in comparable robustness to SIFT [4], while outperforming several state-of-the-art descriptors like SURF [6], ORB [8], and FREAK [10].

Sharat Saurabh Akhoury, Robert Laganière
Towards Semantic KinectFusion

In this paper we propose an extension to the KinectFusion approach which enables both SLAM-graph optimization, usually required on large looping routes, as well as discovery of semantic information in the form of object detection and localization. Global optimization is achieved by incorporating the notion of keyframe into a KinectFusion-style approach, thus providing the system with the ability to explore large environments and maintain a globally consistent map. Moreover, we integrate into the system our recent object detection approach based on a new Semantic Bundle Adjustment paradigm, thereby achieving joint detection, tracking and mapping. Although our current implementation is not optimized for real-time operation, the principles and ideas set forth in this paper can be considered a relevant contribution towards a Semantic KinectFusion system.

Nicola Fioraio, Gregorio Cerri, Luigi Di Stefano
Face Recognition under Ageing Effect: A Comparative Analysis

Previous studies indicate that the performance of face recognition systems severely degrades under the ageing effect. Despite the rising attention to facial ageing, there exists no comparative evaluation of existing systems under the impact of ageing. Moreover, the compound effect of ageing and other variates, such as glasses and gender, that are known to influence performance remains overlooked to date. To this aim, the contributions of this work are as follows: 1) evaluation of six baseline facial representations, based on local features, under the ageing effect, and 2) analysis of the compound effect of ageing with other variates, i.e., race, gender, glasses, facial hair, etc.

Zahid Akhtar, Ajita Rattani, Abdenour Hadid, Massimo Tistarelli
A Slightly Supervised Approach for Positive/Negative Classification of Fluorescence Intensity in HEp-2 Images

Indirect Immunofluorescence (IIF) on HEp-2 slides is the recommended technique to detect antinuclear autoantibodies in patient serum. Such slides are read at the fluorescence microscope by experts of IIF, who classify the fluorescence intensity, recognize mitotic cells and classify the staining patterns for each well. The crucial need for accurately performed and correctly reported laboratory determinations has motivated recent research on computer-aided diagnosis tools in IIF to support HEp-2 image classification. Such systems adopt a fully supervised classification approach and, hence, their chance of success depends on the quality of the ground truth used to train the classification algorithms. Besides being expensive and time-consuming, collecting a large and reliable ground truth in IIF is intrinsically hard due to inter- and intra-observer variability. In order to overcome such limitations, this paper presents a slightly supervised approach for positive/negative fluorescence intensity classification. The classification phase consists in matching parts of interest automatically detected in the test image with a Gaussian mixture model built over a few control images. The approach, whose operating configuration can be adapted to the cost of misclassifications, has been tested on a database with 914 images acquired from 304 different wells, achieving remarkable results on the positive/negative screening task.

Giulio Iannello, Leonardo Onofri, Paolo Soda
Landmarks-SIFT Face Representation for Gender Classification

Existing methods for gender classification from facial images mostly rely on either shape or texture cues. This paper presents a novel face representation that combines both shape and texture information for gender classification. We propose extracting Scale Invariant Feature Transform (SIFT) descriptors at specific facial landmark positions, hence encoding both the face shape and local texture information. Moreover, we propose a decision-level fusion framework combining this Landmarks-SIFT representation with the Local Binary Patterns (LBP) descriptor extracted from the whole face image. LBP is known to be tolerant of uncontrolled image capture conditions. Competitive correct classification rates for both controlled (97% for FERET) and uncontrolled (95% and 94% for LFW and KinFace) benchmark datasets were achieved using our proposed decision-level fusion.

Yomna Safaa El-Din, Mohamed N. Moustafa, Hani Mahdi
Discrete Morse versus Watershed Decompositions of Tessellated Manifolds

With improvements in sensor technology and simulation methods, datasets are growing in size, calling for the investigation of efficient and scalable tools for their analysis. Topological methods, able to extract essential features from data, are a prime candidate for the development of such tools. Here, we examine an approach based on discrete Morse theory and compare it to the well-known watershed approach as a means of obtaining Morse decompositions of tessellated manifolds endowed with scalar fields, such as triangulated terrains or tetrahedralized volume data. We examine the theoretical aspects as well as present empirical results based on synthetic and real-world data describing terrains and 3D scalar fields. We will show that the approach based on discrete Morse theory generates segmentations comparable to the watershed approach while being theoretically sound, more efficient with regard to time and space complexity, easily parallelizable, and allowing for the computation of all descending and ascending i-manifolds and the topological structure of the two Morse complexes.

Leila De Floriani, Federico Iuricich, Paola Magillo, Patricio Simari
A New Algorithm for Cortical Bone Segmentation with Its Validation and Applications to In Vivo Imaging

Cortical bone supports and protects our skeletal functions and plays an important role in determining bone strength and fracture risks. Cortical bone segmentation is needed for quantitative analyses, and the task is nontrivial for in vivo multi-row detector CT (MD-CT) imaging due to limited resolution and partial volume effects. An automated cortical bone segmentation algorithm for in vivo MD-CT imaging of the distal tibia is presented. It utilizes larger contextual and topologic information of the bone using a modified fuzzy distance transform and connectivity analyses. An accuracy of 95.1% in terms of volume of agreement with true segmentations and a repeat MD-CT scan intra-class correlation of 98.2% were observed in a cadaveric study. An in vivo study involving 45 age-similar and height-matched pairs of male and female volunteers has shown that, on average, male subjects have a 16.3% thicker cortex and 4.7% increased porosity as compared to females.

Cheng Li, Dakai Jin, Trudy L. Burns, James C. Torner, Steven M. Levy, Punam K. Saha
Automatic Lesion Detection in Breast DCE-MRI

Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) has demonstrated in recent years a great potential in the screening of high-risk women for breast cancer, in staging newly diagnosed patients and in assessing therapy effects. The aim of this work is to propose an automated system for suspicious lesion detection in DCE-MRI to support radiologists during patient image analysis. The proposed method is based on a Support Vector Machine trained with dynamic features extracted, after a suitable pre-processing of the image, from an area pre-selected using a pixel-based approach. The performance was evaluated using a leave-one-patient-out approach and compared to manual segmentation made by an experienced radiologist. Our results were also compared to other automatic segmentation methodologies: the proposed method maximizes the area of correctly detected lesions while minimizing the number of false alarms (with an accuracy of 98.70%).

Stefano Marrone, Gabriele Piantadosi, Roberta Fusco, Antonella Petrillo, Mario Sansone, Carlo Sansone
Invariants to Symmetrical Convolution with Application to Dihedral Kernel Symmetry

We derive invariants to convolution with a symmetrical kernel in an arbitrary dimension. They are expressed in the Fourier domain as a ratio of the Fourier transform and of the symmetrical projection of the Fourier transform. In 2D, and for dihedral symmetries in particular, we newly express the invariants as moment forms suitable for practical calculations. We clearly demonstrate on real photographs that all the derived invariants are irreplaceable in pattern recognition. We further demonstrate their invariance and discriminability. We expect there is potential to use these invariants in other fields as well, including microscopy.

Jiří Boldyš, Jan Flusser
Observing Dynamic Urban Environment through Stereo-Vision Based Dynamic Occupancy Grid Mapping

Occupancy grid maps are popular tools for representing surrounding environments for mobile robots and intelligent vehicles. When moving in a dynamic real world, traditional occupancy grid mapping is required not only to detect occupied areas, but also to understand the dynamic circumstances. The paper addresses this issue by presenting a stereo-vision based framework to create dynamic occupancy grid maps for intelligent vehicles. In the proposed framework, sparse feature point matching and dense stereo matching are performed in parallel for each stereo image pair. The former process is used to analyze the motion of the vehicle itself as well as of surrounding moving objects. The latter process calculates a dense disparity image, as well as U-V disparity maps applied for pixel-wise moving object segmentation and dynamic occupancy grid mapping. The principal advantage of the proposed framework is the ability to map occupied areas and moving objects at the same time. Meanwhile, compared with some existing methods, the stereo-vision based occupancy grid mapping algorithm is improved. The proposed method is verified on real datasets acquired by our platform SeT-Car.

You Li, Yassine Ruichek
A Multiple Classifier Approach for Detecting Naked Human Bodies in Images

With the rise of Web 2.0 and social networks, millions of people upload multimedia content to the web every day. Images are typically published without control, and inappropriate or offensive content is removed only if users report a problem. There is therefore a need for a system that automatically processes this kind of data.

In this paper we propose a system, based on a multiple classifier approach, for detecting naked human bodies in images. It analyzes both body geometric properties and global visual features. A comparison with other state-of-the-art proposals demonstrated the effectiveness of the proposed approach.

Luca Giangiuseppe Esposito, Carlo Sansone
Diversity in Ensembles of Codebooks for Visual Concept Detection

Visual codebooks generated by the quantization of local descriptors allow building effective feature vectors for image archives. Codebooks are usually constructed by clustering a subset of image descriptors from a set of training images. In this paper we investigate the effect of combining an ensemble of different codebooks, each codebook being created by using different pseudo-random techniques for subsampling the set of local descriptors. Despite the claims in the literature on the gain attained by combining different codebook representations, reported results on different visual detection tasks show that the diversity is quite small, thus allowing only modest improvements in performance w.r.t. the standard random subsampling procedure, and calling for further investigation on the use of ensemble approaches in this context.
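For concreteness, the feature vector built from a single codebook can be sketched as a bag-of-visual-words histogram, as below; this is an illustrative hard-assignment version with a toy codebook, while the paper's ensembles combine several such codebooks built from different descriptor subsamples.

```python
import numpy as np

def assign_codewords(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword (Euclidean)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """L1-normalized codeword occurrence histogram: the image's feature vector."""
    idx = assign_codewords(descriptors, codebook)
    h = np.bincount(idx, minlength=len(codebook)).astype(float)
    return h / max(h.sum(), 1.0)
```

An ensemble combination could, for instance, concatenate or average the histograms produced by each codebook; the diversity question studied in the paper is how different those per-codebook histograms actually are.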

Luca Piras, Roberto Tronci, Giorgio Giacinto
A Novel Method for Fast Processing of Large Remote Sensed Image

In this paper we present a novel approach to reducing the computational load of a CFAR detector. The proposed approach is based on the use of integral images to directly manage the presence of masked pixels or invalid data and to reduce computation time. The approach is applied to the challenging problem of ship detection from remote sensed data. Fast image processing makes it possible to monitor marine traffic and identify possible threats. The approach significantly boosts performance, up to 50x, when working with very high resolution images and large kernels.

Adriano Mancini, Anna Nora Tassetti, Alessandro Cinnirella, Emanuele Frontoni, Primo Zingaretti
MATRIOSKA: A Multi-level Approach to Fast Tracking by Learning

In this paper we propose a novel framework for the real-time detection and tracking of unknown objects in a video stream. We decompose the problem into two separate modules: detection and learning. The detection module can use multiple keypoint-based methods (ORB, FREAK, BRISK, SIFT, SURF and more) inside a fallback model to correctly localize the object frame by frame, exploiting the strengths of each method. The learning module updates the object model, with a growing and pruning approach, to account for changes in its appearance, and extracts negative samples to further improve the detector performance. To show the effectiveness of the proposed tracking-by-detection algorithm, we present quantitative results on a number of challenging sequences where the target object goes through changes of pose, scale and illumination.

Mario Edoardo Maresca, Alfredo Petrosino
Towards a Realistic Distribution of Cells in Synthetically Generated 3D Cell Populations

In fluorescence microscopy, the proper evaluation of image segmentation algorithms is still an open problem. In the field of cell segmentation, such evaluation can be seen as a study of how well a given algorithm can discover individual cells as a function of their number in an image (the size of the cell population), their mutual positions (the density of cell clusters), and the level of noise. In principle, there are two approaches to the evaluation. One approach requires real input images and an expert who verifies the segmentation results. This is, however, expert dependent and, especially when handling 3D data, very tedious. The second approach uses synthetic images with ground truth data to which the segmentation result is compared objectively. In this paper, we propose a new method for generating synthetic 3D images showing naturally distributed cell populations attached to a microscope slide. Cell count and clustering probability are user parameters of the method.

David Svoboda, Vladimír Ulman
Single Textual Image Super-Resolution Using Multiple Learned Dictionaries Based Sparse Coding

In this paper, we propose a new approach based on sparse coding for single textual image Super-Resolution (SR). The proposed approach is able to build more representative dictionaries learned from a large training Low-Resolution/High-Resolution (LR/HR) patch pair database. In fact, an intelligent clustering is employed to partition such a database into several clusters, from which multiple coupled LR/HR dictionaries are constructed. Based on the assumption that patches of the same cluster live in the same subspace, we exploit for each local LR patch its similarity to clusters in order to adaptively select the appropriate learned dictionary over which such a patch can be well sparsely represented. The obtained sparse representation is then applied to generate a local HR patch from the corresponding HR dictionary. Experiments on textual images show that the proposed approach outperforms its counterparts in visual fidelity as well as in numerical measures.

Rim Walha, Fadoua Drira, Franck Lebourgeois, Christophe Garcia, Adel M. Alimi
Mixed Kernel Function SVM for Pulmonary Nodule Recognition

Automatic pulmonary nodule detection in computed tomography (CT) images has been a challenging problem in computer-aided diagnosis (CAD). Most recent recognition methods based on support vector machines (SVMs) have shown difficulty in achieving balanced sensitivity and accuracy. To improve the overall performance of SVM-based pulmonary nodule detection, a mixed kernel SVM method is proposed for recognizing pulmonary nodules in CT images by combining Gaussian and polynomial kernel functions. The proposed mixed kernel SVM, together with a grid search for parameter optimization, can be tuned to seek a balance between sensitivity and accuracy so as to meet CAD needs, and eventually to improve the learning and generalization ability of the SVM at the same time. In our experiments, thirteen features were extracted from candidate regions of interest (ROIs) preprocessed from a set of real CT samples, and the mixed kernel SVM was trained to recognize the nodules in the ROIs. The results show that the proposed method balances sensitivity and accuracy better than single kernel SVMs, achieving a sensitivity of 92.59% and an accuracy of 92%.
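A mixed kernel of this kind can be sketched as a convex combination of the two base kernels; the weight and kernel parameters below are hypothetical placeholders, whereas the paper tunes such parameters by grid search.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    """RBF kernel from pairwise squared Euclidean distances."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """Inhomogeneous polynomial kernel."""
    return (X @ Y.T + coef0) ** degree

def mixed_kernel(X, Y, weight=0.7, gamma=0.5, degree=2):
    """Convex combination of a local (Gaussian) and a global (polynomial)
    kernel; a convex combination of PSD kernels is itself a valid kernel."""
    return (weight * gaussian_kernel(X, Y, gamma)
            + (1 - weight) * polynomial_kernel(X, Y, degree))
```

Such a function can be passed directly to an SVM implementation that accepts custom or precomputed kernels, with the mixing weight treated as one more hyperparameter in the grid search.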

Yang Li, Dunwei Wen, Ke Wang, A’lin Hou
Fast and Accurate Tree-Based Clustering for Japanese/Chinese Character Recognition

Recognizing text in natural scene images is very important for developing various systems, such as assistive devices for visually-impaired people. Multilingual scene text recognition is also becoming important for wearable camera devices with a language translation feature. Since computational resources are limited on such mobile devices, a fast and accurate Optical Character Recognition (OCR) algorithm is needed. Nearest Neighbor (NN) search is quite popular in feature vector-based OCR systems, and improving its speed is required. In this paper, we develop an OCR scheme with a tree-based clustering technique and LDA (Linear Discriminant Analysis), aiming at real-time Japanese/Chinese character recognition. The experimental results using the ETL9B dataset show that our proposed method is 94.6% faster than our previous method, also beating other techniques, at a mere 0.24% accuracy drop from the full linear search.

Yuichi Abe, Takahiro Sasaki, Hideaki Goto
Towards Automatic Hands Detection in Single Images

Detection of hands in single, unconstrained, monocular images is a very difficult task. Localization and extraction of the hand regions, however, provides important and useful knowledge that can facilitate many other tasks, such as gesture recognition, pose estimation and action recognition. In this paper we present a simple appearance-based methodology that combines face detection and anthropometric constraints to efficiently estimate the position and regions of hands in images. It requires neither training nor explicit estimation of the human pose. Experimental results illustrate the performance of the approach.

Athanasios Tsitsoulis, Nikolaos Bourbakis
Precise 3D Angle Measurements in CT Wrist Images

The clinically established method to assess the displacement of a distal radius fracture is to manually measure two reference angles, the dorsal angle and the radial angle, in consecutive 2D X-ray images of the wrist. This approach has the disadvantage of being sensitive to operator errors since the measurements are performed on 2D projections of a 3D structure. In this paper, we present a semi-automatic system for measuring relative changes in the dorsal angle in 3D computed tomography (CT) images of fractured wrists. We evaluate the proposed 3D measurement method on 28 post-operative CT images of fractured wrists and compare it with the radiographic 2D measurement method used in clinical practice. The results show that our proposed 3D measurement method has a high intra- and inter-operator precision and is more precise and robust than the conventional 2D measurement method.

Johan Nysjö, Albert Christersson, Ida-Maria Sintorn, Ingela Nyström, Sune Larsson, Filip Malmberg
Layout Estimation of Highly Cluttered Indoor Scenes Using Geometric and Semantic Cues

Recovering the spatial layout of cluttered indoor scenes is a challenging problem. Current methods generate layout hypotheses from vanishing point estimates produced using 2D image features. These methods fail in highly cluttered scenes in which most of the image features come from clutter instead of the room's geometric structure. In this paper, we propose to use human detections as cues to more accurately estimate the vanishing points. Our method builds on the fact that people are often the focus of indoor scenes, and that the scene and the people within it should have consistent geometric configurations in 3D space. We contribute a new data set of highly cluttered indoor scenes containing people, on which we provide baselines and evaluate our method. This evaluation shows that our approach improves the 3D interpretation of scenes.

Yu-Wei Chao, Wongun Choi, Caroline Pantofaru, Silvio Savarese
Dissimilarity Measures for the Identification of Earthquake Focal Mechanisms

This work presents a study of dissimilarity measures for seismic signals and their relation to clustering in the particular problem of identifying earthquake focal mechanisms, i.e. the physical phenomena which have generated an earthquake. Starting from the assumption that waveform similarity implies similarity in the focal parameters, important details about them can be determined by studying waveforms related to the wave field produced by earthquakes and recorded by a seismic network. Focal mechanism identification is currently investigated by clustering of seismic events, mainly using cross-correlation dissimilarity in conjunction with a hierarchical clustering algorithm. However, such choices have not been sufficiently validated. To shed light on this, we have studied the cross-correlation dissimilarity on simulated seismic signals in conjunction with hierarchical and partitional clustering algorithms, and compared its performance with a recently introduced measure called cumulative shape. In particular, we have created synthetic waveforms related to two types of focal mechanisms, showing that the cumulative shape performs better than cross-correlation in the identification of the expected clustering solution.

Francesco Benvegna, Giosué Lo Bosco, Domenico Tegolo
Texture Classification Based on Co-occurrence Matrix and Neuro-Morphological Approach

This article proposes a hybrid approach for texture-based image classification using gray-level co-occurrence matrices (GLCM), self-organizing map (SOM) methods and mathematical morphology in an unsupervised context. The GLCM records how often different combinations of pixel brightness values (grey levels) occur in an image. The GLCM matrices extracted from an image are processed to create the training data set for a SOM neural network. The SOM model organizes and extracts prototypes from various features obtained from the GLCM matrices. These prototypes are represented by the underlying probability density function (pdf). Under the assumption that each modal region of the underlying pdf corresponds to one homogeneous region in the texture image, the second part of the approach consists in partitioning the self-organizing map into connected modal regions by adapting concepts of the morphological watershed transformation for their detection. The classification process is then based on the detected modal regions. We compare this approach to other texture feature extraction methods based on fractal dimension.
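The co-occurrence counting step can be sketched as follows for a single offset; this is a minimal illustration assuming quantized grey levels and one horizontal displacement, whereas practical pipelines typically accumulate several offsets and derive Haralick-style statistics from the normalized matrix.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Count co-occurrences of grey levels (i, j) for pixel pairs separated
    by the offset (dy, dx); rows index the reference pixel's grey level."""
    M = np.zeros((levels, levels), dtype=np.int64)
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            M[img[y, x], img[y + dy, x + dx]] += 1
    return M
```

Texture features such as contrast, energy or homogeneity are then computed from the matrix normalized by its total count, and those feature vectors feed the SOM training described above.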

Mohammed Talibi Alaoui, Abderrahmane Sbihi
A Virtually Continuous Representation of the Deep Structure of Scale-Space

The deep structure of scale-space of a signal refers to tracking the zero-crossings of differential invariants across scales. In classical approaches, feature tracking is performed by neighbor search between consecutive levels of a discrete collection of scales. Such an approach is prone to noise and tracking errors and provides just a coarse representation of the deep structure. We propose a new approach that allows us to construct a virtually continuous scale-space for scalar functions, supporting reliable tracking and a fine representation of the deep structure of their critical points. Our approach is based on a piecewise-linear approximation of the scale-space, in both space and scale dimensions. We present results on terrain data and range images.

Luigi Rocca, Enrico Puppo
Integral Spiral Image for Fast Hexagonal Image Processing

A common requirement for image processing tasks is to achieve real-time performance. One approach towards achieving this for traditional rectangular pixel-based images is to use an integral image that enables feature extraction at multiple scales in a fast and efficient manner. Alternative research has introduced the concept of hexagonal pixel-based images that closely mimic the human visual system: a real-time visual system. To enhance real-time capability, we present a novel integral image for hexagonal pixel-based images and an associated multi-scale operator implementation that significantly accelerates the feature detection process. We demonstrate that the use of integral images enables significantly faster computation than the use of conventional spiral convolution or the use of neighbourhood address look-up tables.
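For background, the standard rectangular integral image that the hexagonal variant generalizes works as sketched below; the paper's spiral-addressed hexagonal construction itself is not reproduced here.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y+1, :x+1], built by cumulative sums
    over rows then columns."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] in O(1) using four integral-image lookups."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Because any box sum costs a constant four lookups regardless of its size, multi-scale operators built from box sums run at the same speed at every scale, which is the property the hexagonal integral image carries over to spiral-addressed images.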

Sonya Coleman, Bryan Scotney, Bryan Gardiner
Rough Set Based Homogeneous Unsharp Masking for Bias Field Correction in MRI

A major issue in magnetic resonance (MR) image analysis is removing the intensity inhomogeneity artifact present in MR images, which generally affects the performance of automatic image analysis techniques. In this context, the paper presents a novel approach for bias field correction in MR images by incorporating the merits of rough sets in estimating intensity inhomogeneity artifacts. Here, the concepts of lower approximation and boundary region of rough sets deal with vagueness and incompleteness in filter structure definition and enable the algorithm to estimate the optimum or a near-optimum bias field. A theoretical analysis is presented to justify the use of rough sets for bias field estimation. The performance of the proposed approach, along with a comparison with other bias field correction algorithms, is demonstrated on a set of MR images for different bias fields and noise levels.

Abhirup Banerjee, Pradipta Maji
Real-Time Estimation of Planar Surfaces in Arbitrary Environments Using Microsoft Kinect Sensor

We propose an algorithm, suitable for real-time robot applications, for modeling and reconstruction of complex scenes. The environment is seen as a collection of planes and the algorithm extracts in real time their parameters from the 3D point cloud provided by the Kinect sensor. The execution speed of the procedure depends on the desired reconstruction quality and on the complexity of the surroundings. Implementation issues are discussed and experiments on a real scene are included.

Francesco Castaldo, Vincenzo Lippiello, Francesco A. N. Palmieri, Bruno Siciliano
Data Ranking and Clustering via Normalized Graph Cut Based on Asymmetric Affinity

In this paper, we present an extension of the state-of-the-art normalized graph cut method based on asymmetry of the affinity matrix. We provide algorithms for classification and clustering problems and show how our method can improve solutions for unequal and overlapped data distributions. The proposed approaches are based on the theoretical relation between classification accuracy, mutual information and normalized graph cut. The first method requires a priori known class labeled data that can be utilized, e.g., for a calibration phase of a brain-computer interface (BCI). The second one is a hierarchical clustering method that does not involve any prior information on the dataset.

Olexiy Kyrgyzov, Isabelle Bloch, Yuan Yang, Joe Wiart, Antoine Souloumiac
A Boosting-Based Approach to Refine the Segmentation of Masses in Mammography

In this paper we present an algorithm for finding an accurate estimate of the contour of masses in mammograms. We assume that a rough estimate of the region containing the mass is known: in particular, the location of an area inside the mass (the core) and a closed curve beyond which the mass does not extend are available. The proposed method employs a boosting-based classifier trained on the core and on a background region beyond the external contour, so that it provides an accurate estimate of the mass contour by classifying unlabeled pixels between the core and the external contour. The proposed approach is useful not only for automatic localization of the mass contour, but also as a powerful tool during the annotation of mammograms, given that a user interactively provides an estimate of the core and the external contour of the mass. The approach has been verified on a set of mammograms, showing very encouraging results.

Mario Molinara, Claudio Marrocco, Francesco Tortorella
Visual Concept Detection and Annotation via Multiple Kernel Learning of Multiple Models

This paper presents a multi-model framework for the Visual Concept Detection and Annotation (VCDA) task based on Multiple Kernel Learning (MKL). Discriminative visual features are extracted to build visual kernels, while the tags associated with images are used to build textual kernels. Finally, in order to benefit from both visual and textual models, fusion is carried out efficiently by MKL. Traditionally, the term-frequency model is used to capture this useful textual information. However, its shortcoming lies in the fact that performance depends heavily on dictionary construction and that valuable semantic information cannot be captured. To solve this problem, we propose a textual feature construction approach based on WordNet distance. The advantages of this approach are three-fold: (1) it is robust, because our feature construction approach does not depend on dictionary construction; (2) it can capture semantic information in tags which is hardly described by the term-frequency model; (3) it efficiently fuses visual and textual models. Experimental results on ImageCLEF 2011 show that our approach effectively improves recognition accuracy.

Yu Zhang, Stephane Bres, Liming Chen
Facial Expression Recognition Based on Perceived Facial Images and Local Feature Matching

Facial expression recognition aims to determine the emotional state of a face regardless of its identity. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. This paper presents a biological vision-based facial description, called Perceived Facial Images (PFI), applied to facial expression recognition. For the classification step, the Scale Invariant Feature Transform (SIFT) is used to extract local features from images. Then, a matching computation is performed between a testing image and all training images to recognize the facial expression. The proposed approach is evaluated on the GEMEP FERA 2011 database and the Cohn-Kanade Facial Expression database, where the developed algorithm achieves better experimental results than other approaches in the literature.

Hayet Boughrara, Liming Chen, Chokri Ben Amar, Mohamed Chtourou
Real-Time 2DHoG-2DPCA Algorithm for Hand Gesture Recognition

Hand gesture recognition is one of the most challenging topics in computer vision. In this paper, a new hand gesture recognition algorithm is proposed, presenting a 2D representation of the histogram of oriented gradients, where each bin represents a range of angles dealt with in a separate layer, which allows the use of 2DPCA. This method maintains the spatial relation between pixels, which enhances recognition accuracy. In addition, it can be applied either to the hand contour or to an image representing hand details. Experiments were performed on the latest existing depth camera dataset. The comparison with reported methods confirms the excellent properties of our proposed method and promotes it for real-time applications.

Omnia S. ElSaadany, Moataz M. Abdelwahab
Shearlet Network-based Sparse Coding Augmented by Facial Texture Features for Face Recognition

One open challenge in face recognition (FR) is the single training sample per subject. This paper addresses this problem through a novel approach, called SNPCA, that combines Shearlet Networks (SN) and PCA. A Shearlet Network takes advantage of the sparse representation (SR) properties of shearlets in biometric applications, especially for face coding and recognition. The main contributions of this paper are (1) the combination of a multi-scale representation, which captures geometric information to derive a very efficient representation of facial templates, with a PCA-based approach, and (2) the design of a fusion step using a refined model of belief functions based on the Dempster-Shafer rule in the context of confusion matrices. This last step helps improve the processing of facial texture features. We compared our algorithm (SNPCA) against SN, a wavelet network (WN) implementation and other standard algorithms. Our tests, run on several face databases including FRGC, the Extended Yale B database and others, show that this approach yields very competitive performance compared to wavelet networks, standard shearlet and PCA-based methods.

Mohamed Anouar Borgi, Demetrio Labate, Maher El’Arbi, Chokri Ben Amar
Fuzzy Analysis of Classifier Handshapes from 3D Sign Language Data

In this paper, we present a novel technique to track and recognize both classic handshapes and descriptive classifier handshapes inside 3D sign language sequences. Our approach is able to evaluate the intensity of the CL-C, CL-L and CL-G classifiers, which are used to specify sizes or amounts of objects. Our method combines Minkowski similarity measures to match the shape of the hand with a fuzzy inference system (FIS) to quantify the classifier's intensity. We implemented and tested our framework on a set of 3D sign language data. The membership functions as well as the rules of the designed FIS were optimized by 12 participants. The system generates evaluations which are very close to human perception of the iconic information conveyed by the classifier handshapes. The correlation of the results generated by our system with those awarded by the 12 participants is about 0.936, which can be considered satisfactory.
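The two ingredients named in the abstract, a Minkowski distance for shape matching and fuzzy membership functions for intensity scoring, can be sketched as below. The three fuzzy sets, their breakpoints and the aperture measure are hypothetical placeholders, not the membership functions the 12 participants actually tuned.

```python
import numpy as np

def minkowski(u, v, p=2):
    """Minkowski distance between two feature vectors
    (p=1 is Manhattan, p=2 is Euclidean)."""
    return np.sum(np.abs(np.asarray(u) - np.asarray(v)) ** p) ** (1.0 / p)

def triangular(x, a, b, c):
    """Triangular fuzzy membership function with support [a, c], peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classifier_intensity(aperture):
    """Toy Mamdani-style inference: map a normalized hand-aperture
    measure to an intensity score via three fuzzy sets
    (small, medium, large), defuzzified by a weighted average."""
    sets = {0.0: triangular(aperture, -0.5, 0.0, 0.5),
            0.5: triangular(aperture, 0.0, 0.5, 1.0),
            1.0: triangular(aperture, 0.5, 1.0, 1.5)}
    num = sum(out * mu for out, mu in sets.items())
    den = sum(sets.values())
    return num / den if den else 0.0
```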

Kabil Jaballah, Mohamed Jemni
Cooking Action Recognition with iVAT: An Interactive Video Annotation Tool

Within a video recipe, we are interested in locating and annotating the various ingredients, kitchenware and relevant cooking actions. To this end we have developed iVAT, an interactive video annotation tool supporting manual, semi-automatic and automatic annotations obtained on the basis of the interaction of the user with various detection algorithms. The tool integrates versions of computer vision algorithms, specifically adapted to work in an interactive and incremental learning framework. iVAT has been developed to annotate video recipes, but it can be easily adapted and used to annotate videos from different domains as well. In this paper we present some results with respect to the task of cooking action recognition.

Simone Bianco, Gianluigi Ciocca, Paolo Napoletano, Raimondo Schettini, Roberto Margherita, Gianluca Marini, Giuseppe Pantaleo
Spatial Resolution and Distance Information for Color Quantization

A new color quantization algorithm, CQ, is presented, which consists of two phases. The first phase reduces the number of colors by reducing the spatial resolution of the input image. The second phase further reduces the number of colors by performing color clustering guided by distance information. Color mapping then completes the process. The algorithm has been tested on a large number of color images with different sizes and color distributions, and its performance has been compared with that of other algorithms in the literature.
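The two-phase scheme can be sketched as follows. This is only an illustration of the structure the abstract describes: the block-averaging downsampling, the greedy distance-threshold clustering and the `thresh` value are assumptions, not the CQ algorithm itself.

```python
import numpy as np

def quantize(img, block=2, thresh=32.0):
    """Two-phase color quantization sketch: (1) cut spatial resolution by
    block averaging, (2) greedily merge remaining colors closer than
    `thresh` in RGB space, then map every original pixel to its nearest
    representative color."""
    h, w, _ = img.shape
    small = img[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block, 3).mean(axis=(1, 3))
    palette = []
    for c in small.reshape(-1, 3):
        for p in palette:
            if np.linalg.norm(c - p) < thresh:
                break                       # absorbed into an existing color
        else:
            palette.append(c)               # new representative color
    palette = np.array(palette)
    flat = img.reshape(-1, 1, 3).astype(float)
    nearest = np.linalg.norm(flat - palette, axis=2).argmin(axis=1)
    return palette[nearest].reshape(img.shape)
```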

Giuliana Ramella, Gabriella Sanniti di Baja
On the Robustness of Color Texture Descriptors across Illuminants

In this paper we evaluate several extensions of Local Binary Patterns to color images. In particular, we investigate their robustness with respect to changes in the illuminant color temperature. To do so, we recovered the spectral reflectances of 1360 texture images from the Outex 13 data set. Then, we rendered the images as if they were taken under 33 different illuminants. For each combination of a training and test illuminant, we measured the classification performance of the texture features considered. The results of this extensive experimentation are reported and critically discussed.
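The simplest of the color extensions evaluated here, computing an LBP histogram per channel and concatenating them, can be sketched as follows. This is a generic illustration of that family of descriptors, not the specific variants or the illuminant-rendering protocol used in the paper.

```python
import numpy as np

def lbp_channel(ch):
    """Basic 8-neighbour LBP code for each interior pixel of one channel:
    each neighbour >= center contributes one bit to an 8-bit code."""
    c = ch[1:-1, 1:-1]
    neighbours = [ch[:-2, :-2], ch[:-2, 1:-1], ch[:-2, 2:], ch[1:-1, 2:],
                  ch[2:, 2:], ch[2:, 1:-1], ch[2:, :-2], ch[1:-1, :-2]]
    code = np.zeros_like(c, dtype=int)
    for bit, n in enumerate(neighbours):
        code |= (n >= c).astype(int) << bit
    return code

def color_lbp_histogram(img):
    """Per-channel extension of LBP to color: concatenate the
    256-bin LBP histograms of the three color channels."""
    hists = [np.bincount(lbp_channel(img[..., k]).ravel(), minlength=256)
             for k in range(3)]
    return np.concatenate(hists)
```

On a perfectly flat image every neighbour equals its center, so all eight bits fire and every interior pixel gets code 255 in each channel.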

Simone Bianco, Claudio Cusano, Paolo Napoletano, Raimondo Schettini
Semiotic-based Conceptual Modelling of Hypermedia

We address the conceptual modelling of hypermedia regarded as semiotic texts, whose meanings are conceived by a designer, transferred through the artifact and interpreted by users within their context. We outline the communication framework, with the artifact embedding the images of the designer and user. The full model is represented in terms of four interrelated modules: story, discourse, text and social-relational ontologies. The first two account for the narrative structures underlying the hypermedia content, which is externalized through sensorial qualities that, in turn, evoke impressions in the user. The model addresses issues that are poorly covered by the description standard MPEG-7. It can be used for the analysis, evaluation and indexing of existing hypermedia as well as the design of new ones.

Elio Toppano, Vito Roberto
Modelling Visual Appearance of Handwriting

We present an experimental validation of a model of handwriting style that builds upon a neuro-computational model of motor learning and execution. We hypothesize that handwriting style emerges from the concatenation of highly automated writing movements, called invariants, that have been learned by the subject in correspondence to the most frequent character sequences the subject is familiar with. We also assume that the actual shape of the ink trace contains enough information to characterize the handwriting style. The experimental results on a data set containing genuine, disguised, and forged (both skilled and naive) documents show that the model is an effective tool for modeling intra-writer and inter-writer variability, and provides a quantitative estimation of the difference between handwriting styles that is in accordance with the difference in the visual appearance of the handwriting.

Angelo Marcelli, Antonio Parziale, Adolfo Santoro
Learning the Scene Illumination for Color-Based People Tracking in Dynamic Environment

People tracking under non-uniform illumination is challenging, as a person's observed appearance may change as they move around in the environment. Appearance model adaptation is inconvenient in the long run, as it is subject to drift, while filtering illumination information out of the data through built-in invariance is sub-optimal in terms of discriminative capability. In this work, we are interested in modeling the spatial and temporal dimensions of appearance variation induced by non-uniform illumination, and in learning and adapting the related parameters over time by using walking people as illumination probes. We propose a hybrid graphical model and a new message passing scheme that sequentially updates the parameters of the model, so that scene illumination can be learnt online and used for robust tracking in dynamic environments.

Sinan Mutlu, Tao Hu, Oswald Lanz
Multicamera People Tracking Using a Locus-based Probabilistic Occupancy Map

We propose a novel people detection method using a Locus-based Probabilistic Occupancy Map (LPOM). Given the calibration data and the motion edges extracted from all views, the method is able to compute the probabilistic occupancy map for the targets in the scene. We integrate the algorithm into a Bayesian tracker and perform experiments with challenging video sequences. Experimental results demonstrate the robustness and high precision of the tracker when tracking multiple people in the presence of clutter and occlusions.

Tao Hu, Sinan Mutlu, Oswald Lanz
Construction and Application of Marine Oil Spill Gravity Vector Differences Detection Model

This paper proposes a new marine oil spill gravity vector differences detection model based on the differing spreadability and viscosity of oil and water. The model uses median filtering, zero-pixel elimination, image normalization and a nonlinear transformation, and brings in the law of gravity. The research focuses on two oil spill incidents, which occurred in the Mediterranean Sea in 2004 and the Gulf of Mexico in 2006. Based on MODIS remote sensing data, we applied the model to detect the two incidents and compared the results with those of the Sobel detection algorithm. The experimental results show that the model introduced in this paper outperforms the Sobel detection algorithm and is powerful for oil spill detection.

Weiguang Su, Bo Ping, Fenzhen Su
A Graph-Based Method for PET Image Segmentation in Radiotherapy Planning: A Pilot Study

Target volume delineation of Positron Emission Tomography (PET) images in radiation treatment planning is challenging because of the low spatial resolution and high noise level in PET data. The aim of this work is the development of an accurate and fast method for semi-automatic segmentation of metabolic regions in PET images. For this purpose, an algorithm for biological tumor volume delineation based on random walks on graphs has been used. Validation was first performed on phantoms containing spheres and irregular inserts of different and known volumes; then tumors from a patient with head and neck cancer were segmented to discuss the clinical applicability of the algorithm. Experimental results show that the segmentation algorithm is accurate and fast, and meets the physicians' requirements in a radiotherapy environment.
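The random-walker formulation on graphs can be sketched on a 1-D chain of pixels, which is enough to show the mechanics: Gaussian-weighted edges, the graph Laplacian, and a linear solve for the probability that a walk started at each unseeded pixel first reaches a foreground seed. This is a didactic reduction under assumed parameters (`beta`, a 4-connected chain instead of a 3-D PET volume), not the authors' clinical implementation.

```python
import numpy as np

def random_walker_1d(intensities, seeds, beta=1.0):
    """Random-walker segmentation sketch on a 1-D chain of pixels.
    seeds: dict {index: label in {0, 1}}. Edge weights follow the usual
    w = exp(-beta * (Ii - Ij)^2); each unseeded pixel gets the
    probability that a random walk first reaches a label-1 seed,
    thresholded at 0.5."""
    n = len(intensities)
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = np.exp(-beta * (intensities[i] - intensities[i + 1]) ** 2)
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(axis=1)) - W                    # graph Laplacian
    unseeded = [i for i in range(n) if i not in seeds]
    b = np.zeros(len(unseeded))
    for r, i in enumerate(unseeded):                  # boundary terms from seeds
        for j, lab in seeds.items():
            b[r] += W[i, j] * lab
    Lu = L[np.ix_(unseeded, unseeded)]
    prob = np.linalg.solve(Lu, b)                     # Dirichlet problem
    labels = dict(seeds)
    for i, p in zip(unseeded, prob):
        labels[i] = int(p > 0.5)
    return [labels[i] for i in range(n)]
```

With a sharp intensity edge in the middle of the chain, the cross-edge weight collapses and the two halves take the labels of their respective seeds.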

Alessandro Stefano, Salvatore Vitabile, Giorgio Russo, Massimo Ippolito, Daniele Sardina, Maria G. Sabini, Francesca Gallivanone, Isabella Castiglioni, Maria C. Gilardi
White Paper on Industrial Applications of Computer Vision and Pattern Recognition

The paper provides a summary of the contributions to the industrial session at ICIAP 2013, describing a few practical applications of video analysis in the surveillance and security field. The session has been organized to stimulate an open discussion within the CVPR scientific community on new emerging research areas which deserve particular attention and may contribute to the improvement of industrial applications in the near future.

Giovanni Garibotto, Pierpaolo Murrieri, Alessandro Capra, Stefano De Muro, Ugo Petillo, Francesco Flammini, Mariana Esposito, Cocetta Pragliola, Giuseppe Di Leo, Roald Lengu, Nadia Mazzino, Alfredo Paolillo, Michele D’Urso, Raffaele Vertucci, Fabio Narducci, Stefano Ricciardi, Andrea Casanova, Gianni Fenu, Marco De Mizio, Mario Savastano, Michele Di Capua, Alessio Ferone
Empty Vehicle Detection with Video Analytics

An important issue to be addressed in transit security, in particular for driverless metro, is the assurance that a vehicle is empty before it returns to the depot. Customer specifications in recent tenders require that an automatic empty vehicle detector is provided. That improves system security since it prevents voluntary (e.g. in case of thieves or graffiti makers) or involuntary (e.g. in case of drunk or unconscious people) access of unauthorized people to the depot and possibly to other restricted areas. Without automatic systems, a manual inspection of the vehicle should be performed, requiring considerable personnel effort and being prone to failure. To address the issue, we have developed a reliable empty vehicle detection system using video content analytics techniques and standard on-board cameras. The system can automatically check whether the vehicles have been cleared from passengers, thus supporting the security staff and central control operators in providing a higher level of security.

Francesco Buemi, Mariana Esposito, Francesco Flammini, Nicola Mazzocca, Concetta Pragliola, Marcella Spirito
Stock Control through Video Surveillance in Logistics

The transport sector has certainly witnessed the latest developments in information and communication technologies (ICT). In this context the objective of the CPILOS project is to develop a technological platform, based on IT infrastructures and services, that can support critical processes, like secure tracking and tracing of goods, using video surveillance facilities with radio-frequency identification (RFID) support. More specifically, the project involves the study and implementation of an integrated platform for quality control of goods, particularly perishable food coming from China and intended to be distributed in the Italian and European consumer markets (usually not controlled at the source).

Mariarosaria Carullo, Gianluca Cavaliere, Aniello De Prisco, Michele Di Capua, Alfredo Petrosino, Donatella Padovano, Gennaro Nave, Daniele Ruggeri
H.264 Sensor Aided Video Encoder for UAV BLOS Missions

This paper presents a new low-complexity H.264 encoder, based on the x264 implementation, for Unmanned Aerial Vehicle (UAV) applications. The encoder employs a new motion estimation scheme which makes use of the global motion information provided by the onboard navigation system. The results are relevant for low frame rate video coding, which is a typical scenario in UAV beyond line-of-sight (BLOS) missions.

Cesario Vincenzo Angelino, Luca Cicala, Marco De Mizio, Paolo Leoncini, E. Baccaglini, M. Gavelli, N. Raimondo, R. Scopigno
Pattern Recognition for Defect Detection in Uncontrolled Environment Railway Applications

The rise in prominence of safety and maintenance cost saving related issues in railway systems is becoming more and more an important driver in the design and deployment of sophisticated Wayside Train Monitoring Systems (WTMS). In the last 20 years computer vision based WTMS have evolved from simple Hot Axle Bearing Detectors (HABD) to sophisticated Video Monitoring Systems (VMS).

Giuseppe Di Leo, Roald Lengu, Nadia Mazzino, Alfredo Paolillo
Erratum: Epithelial Cell Segmentation in Histological Images of Testicular Tissue Using Graph-Cut

The name of Abdolrahim Kadkhodamohammadi was accidentally omitted from the list of authors of the paper “Epithelial Cell Segmentation in Histological Images of Testicular Tissue Using Graph-Cut”, starting on page 201 of this volume. The author list should read as follows: Azadeh Fakhrzadeh, Ellinor Spörndly-Nees, Abdolrahim Kadkhodamohammadi, Lena Holm, and Cris L. Luengo Hendriks.

Azadeh Fakhrzadeh, Ellinor Spörndly-Nees, Abdolrahim Kadkhodamohammadi, Lena Holm, Cris L. Luengo Hendriks
Backmatter
Metadata
Title
Image Analysis and Processing – ICIAP 2013
Edited by
Alfredo Petrosino
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-41184-7
Print ISBN
978-3-642-41183-0
DOI
https://doi.org/10.1007/978-3-642-41184-7