About this Book

This volume constitutes the refereed proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2011, held in Las Palmas de Gran Canaria, Spain, in June 2011. The 34 revised full papers and 58 revised poster papers presented were carefully reviewed and selected from 158 submissions. The papers are organized in topical sections on computer vision; image processing and analysis; medical applications; and pattern recognition.

Table of Contents

Frontmatter

Oral Sessions

Computer Vision

Deforming the Blurred Shape Model for Shape Description and Recognition

This paper presents a new model for the description and recognition of distorted shapes, where the image is represented by a pixel density distribution based on the Blurred Shape Model combined with a non-linear image deformation model. This leads to an adaptive structure able to capture elastic deformations in shapes. The method has been evaluated on three different datasets where deformations are present, showing the robustness and good performance of the new model. Moreover, we show that, by incorporating deformation and flexibility, the new model outperforms the BSM approach when classifying shapes with high variability of appearance.

Jon Almazán, Ernest Valveny, Alicia Fornés

A Shortest Path Approach for Vibrating Line Detection and Tracking

This paper describes an approach based on the shortest path method for the detection and tracking of vibrating lines. The detection and tracking of vibrating structures, such as lines and cables, is of great importance in areas such as civil engineering, but the specificities of these scenarios make it a hard problem to tackle. We propose a two-step approach consisting of line detection and subsequent tracking. The automatic detection of the lines avoids manual initialization - a typical problem of these scenarios - and favors tracking. The additional information provided by the line detection enables the improvement of existing algorithms and extends their application to a larger set of scenarios.

Pedro Carvalho, Miguel Pinheiro, Jaime S. Cardoso, Luís Corte-Real

And-Or Graph Grammar for Architectural Floor Plan Representation, Learning and Recognition. A Semantic, Structural and Hierarchical Model

This paper presents a syntactic model for architectural floor plan interpretation. A stochastic image grammar over an And-Or graph is inferred to represent the hierarchical, structural and semantic relations between elements of all possible floor plans. This grammar is augmented with three different probabilistic models, learnt from a training set, to account for the frequency of those relations. Then, a Bottom-Up/Top-Down parser with a pruning strategy has been used for floor plan recognition. For a given input, the parser generates the most probable parse graph for that document. This graph not only contains the structural and semantic relations of its elements, but also its hierarchical composition, which allows the floor plan to be interpreted at different levels of abstraction.

Lluís-Pere de las Heras, Gemma Sánchez

Linear Prediction Based Mixture Models for Event Detection in Video Sequences

In this paper, we propose a method for the detection of irregularities in time series, based on linear prediction. We demonstrate how the linear predictor can be estimated by solving the Yule-Walker equations, and how several predictors can be combined in a simple mixture model. In several tests, we compare our model to a Gaussian mixture and a hidden Markov model approach. We successfully apply our method to event detection in a video sequence.

Dierck Matern, Alexandru Paul Condurache, Alfred Mertins
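
A rough illustration of the linear-prediction step described above (my own sketch under stated assumptions, not the authors' code): AR coefficients are estimated from the Yule-Walker equations, and the one-step prediction residual serves as a simple irregularity score.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_coeffs(x, order):
    """AR coefficients a such that x[t] ~ sum_k a[k] * x[t-1-k]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Symmetric Toeplitz system R a = r[1:].
    return solve_toeplitz(r[:-1], r[1:])

def residuals(x, a):
    """Absolute one-step prediction errors; large values flag irregularities."""
    order = len(a)
    preds = np.array([np.dot(a, x[t - 1::-1][:order]) for t in range(order, len(x))])
    return np.abs(x[order:] - preds)

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 20, 400)) + 0.05 * rng.standard_normal(400)
signal[250] += 1.0                         # injected "event"
a = yule_walker_coeffs(signal, order=4)
print(int(np.argmax(residuals(signal, a))) + 4)   # locates the event near t=250
```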

A Visual Saliency Map Based on Random Sub-window Means

In this article, we propose a simple and efficient method for computing an image saliency map, which performs well on both salient region detection and eye gaze prediction tasks. A large number of distinct sub-windows with random coordinates and scales are generated over an image. The saliency descriptor of a pixel within a random sub-window is given by the absolute difference of its intensity value to the mean intensity of the sub-window. The final saliency value of a given pixel is obtained as the sum of all saliency descriptors corresponding to this pixel; any given pixel can be included in one or more random sub-windows. The recall-precision performance of the proposed saliency map is comparable to other existing saliency maps for the task of salient region detection. It also achieves state-of-the-art performance for the task of eye gaze prediction in terms of receiver operating characteristics.

Tadmeri Narayan Vikram, Marko Tscherepanow, Britta Wrede
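
The saliency computation is specified closely enough in the abstract to sketch directly; the following is my own minimal reading of it, not the authors' implementation.

```python
import numpy as np

def saliency_map(gray, n_windows=3000, rng=None):
    """gray: 2D float array. Accumulates per-window saliency descriptors."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = gray.shape
    sal = np.zeros_like(gray)
    for _ in range(n_windows):
        # Random coordinates and scale for this sub-window.
        y0, y1 = np.sort(rng.integers(0, h, 2))
        x0, x1 = np.sort(rng.integers(0, w, 2))
        if y1 - y0 < 2 or x1 - x0 < 2:
            continue
        win = gray[y0:y1, x0:x1]
        # Descriptor: absolute difference to the sub-window mean intensity.
        sal[y0:y1, x0:x1] += np.abs(win - win.mean())
    return sal / sal.max()

img = np.random.default_rng(1).random((64, 64))
print(saliency_map(img).shape)
```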

There Is More Than One Way to Get Out of a Car: Automatic Mode Finding for Action Recognition in the Wild

“Actions in the wild” is the term given to examples of human motion that are performed in natural settings, such as those harvested from movies [10] or the Internet [9]. State-of-the-art performance in this domain is orders of magnitude lower than in more contrived settings, one of the primary reasons being the huge variability within each action class. We propose to tackle recognition in the wild by automatically breaking complex action categories into multiple modes/groups, and training a separate classifier for each mode. This is achieved using RANSAC, which identifies and separates the modes while rejecting outliers. We employ a novel reweighting scheme within the RANSAC procedure to iteratively reweight training examples, ensuring their inclusion in the final classification model. Our results demonstrate the validity of the approach, and for classes which exhibit multi-modality, we achieve in excess of double the performance over approaches that assume single modality.

Olusegun Oshin, Andrew Gilbert, Richard Bowden

The Fast and the Flexible: Extended Pseudo Two-Dimensional Warping for Face Recognition

In this work, we propose a novel extension of pseudo 2D image warping (P2DW) which allows for joint alignment and recognition of non-rectified face images. P2DW allows for optimal displacement inference in a simplified setting, but cannot cope with stronger deformations since it is restricted to column-to-column mapping. We propose to implement additional flexibility in P2DW by allowing deviations from column centers while preserving vertical structural dependencies between neighboring pixel coordinates. In order to speed up recognition we employ hard spatial constraints on candidate alignment positions. Experiments on two well-known face datasets show that our algorithm significantly improves the recognition quality under difficult variability such as 3D rotation (pose), expression and illumination, and can reliably classify even automatically detected faces. We also show an improvement over state-of-the-art results while keeping computational complexity low.

Leonid Pishchulin, Tobias Gass, Philippe Dreuw, Hermann Ney

On Importance of Interactions and Context in Human Action Recognition

This paper is focused on the automatic recognition of human events in static images. Popular techniques use knowledge of the human pose for inferring the action, and the most recent approaches tend to combine pose information with either knowledge of the scene or of the objects with which the human interacts. Our approach makes a step forward in this direction by combining the human pose with the scene in which the human is placed, together with the spatial relationships between humans and objects. Based on standard, simple descriptors like HOG and SIFT, recognition performance is enhanced when these three types of knowledge are taken into account. Results obtained in the PASCAL 2010 Action Recognition Dataset demonstrate that our technique reaches state-of-the-art results using simple descriptors and classifiers.

Nataliya Shapovalova, Wenjuan Gong, Marco Pedersoli, Francesc Xavier Roca, Jordi Gonzàlez

Detection Performance Evaluation of Boosted Random Ferns

We present an experimental evaluation of Boosted Random Ferns in terms of detection performance and training data. We show that adding an iterative bootstrapping phase during the learning of the object classifier increases its detection rates, given that additional positive and negative samples are collected (bootstrapped) for retraining the boosted classifier. After each bootstrapping iteration, the learning algorithm concentrates on computing more discriminative and robust features (Random Ferns), since the bootstrapped samples extend the training data with more difficult images.

The resulting classifier has been validated on two different object datasets, yielding successful detection rates in spite of challenging image conditions such as lighting changes, mild occlusions and cluttered backgrounds.

Michael Villamizar, Francesc Moreno-Noguer, Juan Andrade-Cetto, Alberto Sanfeliu

Feature Selection for Gender Classification

Most existing feature selection methods focus on ranking features based on an information criterion to select the best K features. However, several authors find that the optimal feature combinations do not give the best classification performance [6],[5]. The reason for this is that although an individual feature may have limited relevance to a particular class, in combination with other features it can be strongly relevant to the class. In this paper, we derive a new information theoretic criterion, called multidimensional interaction information (MII), to perform feature selection and apply it to gender determination. In contrast to existing feature selection methods, it is sensitive to the relations between feature combinations and can be used to seek third- or even higher-order dependencies between the relevant features. We apply the method to features delivered by principal geodesic analysis (PGA) and use a variational EM (VBEM) algorithm to learn a Gaussian mixture model on the selected feature subset for gender determination. We obtain a classification accuracy as high as 95% on 2.5D facial needle-maps, demonstrating the effectiveness of our feature selection method.

Zhihong Zhang, Edwin R. Hancock
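
For intuition about the higher-order dependencies that MII targets, the sketch below (my own illustration using the standard interaction-information identity, not the paper's MII criterion) constructs two features that are individually uninformative about the class but jointly determine it.

```python
import numpy as np

def entropy(*vars_):
    """Joint Shannon entropy (bits) of discrete variables given as 1D arrays."""
    joint = np.stack(vars_, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(x, c):
    return entropy(x) + entropy(c) - entropy(x, c)

def interaction_info(x, y, c):
    # I(X,Y;C) - I(X;C) - I(Y;C): third-order dependency between features and class.
    i_xy_c = entropy(x, y) + entropy(c) - entropy(x, y, c)
    return i_xy_c - mutual_info(x, c) - mutual_info(y, c)

# XOR example: each feature alone is useless, jointly they determine the class.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10000)
y = rng.integers(0, 2, 10000)
c = x ^ y
print(mutual_info(x, c), interaction_info(x, y, c))  # ~0 bits and ~+1 bit
```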

Image Processing and Analysis

Classification of Repetitive Patterns Using Symmetry Group Prototypes

We present a novel computational framework for the automatic classification of periodic images by their symmetries, applied to content-based image retrieval. Existing methods have several drawbacks because of their use of heuristics, and have shown low classification rates when images exhibit imperfections due to fabrication or a handmade process. Moreover, they offer no way to compute a classification goodness-of-fit. We propose an automatic parameter estimation for symmetry analysis; image classification is thus redefined as the computation of distances to the prototypes of a set of defined classes. Our experimental results improve on the state of the art in wallpaper classification methods.

Manuel Agustí-Melchor, Angel Rodas-Jordá, Jose-Miguel Valiente-González

Distance Maps from Unthresholded Magnitudes

A straightforward algorithm that computes distance maps from unthresholded magnitude values is presented, suitable for still images and video sequences. While results on binary images are similar to classic Euclidean Distance Transforms, the proposed approach does not require a binarization step. Thus, no thresholds are needed and no information is lost in intermediate classification stages. Experiments include the evaluation of segmented images using the watershed algorithm and the measurement of pixel value stability in video sequences.

Luis Anton-Canalis, Mario Hernandez-Tejera, Elena Sanchez-Nielsen
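
The abstract does not give the paper's exact update rule; as one plausible formulation (my own, not necessarily the authors' algorithm), the sketch below seeds a classic two-pass 3-4 chamfer distance transform with costs derived from unthresholded magnitudes, so no binarization step is needed.

```python
import numpy as np

def soft_distance_map(mag, lam=10.0):
    """mag: edge magnitudes in [0, 1]. Strong edges act as soft seeds."""
    d = lam * (1.0 - mag)             # seed cost: 0 where magnitude is maximal
    h, w = d.shape
    # Forward raster pass (top-left to bottom-right), 3-4 chamfer weights.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y, x] = min(d[y, x], d[y-1, x] + 3)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y-1, x-1] + 4)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y-1, x+1] + 4)
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x-1] + 3)
    # Backward raster pass (bottom-right to top-left).
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y+1, x] + 3)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y+1, x+1] + 4)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y+1, x-1] + 4)
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x+1] + 3)
    return d / 3.0                    # roughly pixel units

mag = np.zeros((32, 32)); mag[16, :] = 1.0    # a single strong edge row
d = soft_distance_map(mag)
print(d[16, 5], d[0, 0])                      # ~0 on the edge, ~16 far away
```

With a binary magnitude image and a large `lam`, this reduces to an ordinary approximate Euclidean distance transform, matching the behaviour the abstract reports for binary inputs.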

Scratch Assay Analysis with Topology-Preserving Level Sets and Texture Measures

Scratch assays are widely used for cell motility and migration assessment in biomedical research. However, quantitative data are very often extracted manually. Here, we present a fully automated analysis pipeline for detecting scratch boundaries and measuring areas in scratch assay images based on level set techniques. In particular, non-PDE level sets are extended for topology preservation and applied to entropy data of scratch assay microscope images. Compared to other algorithms, our approach, implemented in Java as an ImageJ plugin based on the extension package MiToBo, relies on a minimal set of configuration parameters. Experimental evaluations show the high quality of the extracted assay data and their suitability for biomedical investigations.

Markus Glaß, Birgit Möller, Anne Zirkel, Kristin Wächter, Stefan Hüttelmaier, Stefan Posch

Level Set Segmentation with Shape and Appearance Models Using Affine Moment Descriptors

We propose a level set based variational approach that incorporates shape priors into edge-based and region-based models. The evolution of the active contour depends on local and global information. It has been implemented using an efficient narrow band technique. For each boundary pixel we calculate its dynamics according to its gray level, its neighborhood and geometric properties established by the training shapes. We also propose a criterion for shape alignment based on affine transformation using an image normalization procedure. Finally, we illustrate the benefits of our approach on liver segmentation from CT images.

Carlos Platero, María Carmen Tobar, Javier Sanguino, José Manuel Poncela, Olga Velasco

Medical Applications

Automatic HyperParameter Estimation in fMRI

Maximum a posteriori (MAP) estimation in the Bayesian framework is a common criterion used in a large number of estimation and decision problems. In image reconstruction problems, the image to be estimated is typically modeled as a Markov Random Field (MRF) described by a Gibbs distribution. In this case, the Gibbs energy depends on a multiplicative coefficient, called the hyperparameter, that is usually manually tuned [14] on a trial-and-error basis.

In this paper we propose an automatic hyperparameter estimation method designed in the scope of functional Magnetic Resonance Imaging (fMRI) to identify activated brain areas based on the Blood Oxygen Level Dependent (BOLD) signal.

This problem is formulated as a classical binary detection problem in a Bayesian framework where the estimation and inference steps are joined together. The prior terms, incorporating the a priori physiological knowledge about the Hemodynamic Response Function (HRF), drift and spatial correlation across the brain (using edge-preserving priors), are automatically tuned with the newly proposed method.

Results on real and synthetic data are presented and compared against the conventional General Linear Model (GLM) approach.

David Afonso, Patrícia Figueiredo, João Miguel Sanches

Automatic Branching Detection in IVUS Sequences

Atherosclerosis is a vascular pathology affecting the arterial walls, generally located in specific vessel sites, such as bifurcations. In this paper, for the first time, a fully automatic approach for the detection of bifurcations in IVUS pullback sequences is presented. The method identifies the frames and the angular sectors in which a bifurcation is visible. This goal is achieved by applying a classifier to a set of textural features extracted from each image of an IVUS pullback. A comparison between two state-of-the-art classifiers, AdaBoost and Random Forest, is performed. A cross-validation scheme is applied in order to evaluate the performance of the approaches. The obtained results are encouraging, showing a sensitivity of 75% and an accuracy of 94% using the AdaBoost algorithm.

Marina Alberti, Carlo Gatta, Simone Balocco, Francesco Ciompi, Oriol Pujol, Joana Silva, Xavier Carrillo, Petia Radeva

A Region Segmentation Method for Colonoscopy Images Using a Model of Polyp Appearance

This work aims at the segmentation of colonoscopy images into a minimum number of informative regions. Our method performs in such a way that, if a polyp is present in the image, it will be exclusively and totally contained in a single region. This result can be used in later stages to classify regions as polyp-containing candidates. The output of the algorithm also defines which regions can be considered as non-informative. The algorithm starts with a high number of initial regions and merges them taking into account the model of polyp appearance obtained from available data. The results show that our segmentations of polyp regions are more accurate than those of state-of-the-art methods.

Jorge Bernal, Javier Sánchez, Fernando Vilariño

Interactive Labeling of WCE Images

A high quality labeled training set is necessary for any supervised machine learning algorithm. Labeling of the data can be a very expensive process, especially when dealing with data of high variability and complexity. A good example of such data are videos from Wireless Capsule Endoscopy (WCE). Building a representative WCE data set means many videos must be labeled by an expert. The problem that arises is the diversity, in feature space, of data from different WCE studies. This means that when new data arrive, it is highly probable that they will not be represented in the training set, leading to a high probability of error when applying machine learning schemes. In this paper an interactive labeling scheme that reduces the expert effort in the labeling process is presented. It is shown that the number of human interventions can be significantly reduced. The proposed system allows the annotation of informative/non-informative frames of a WCE video with fewer than 100 clicks.

Michal Drozdzal, Santi Seguí, Carolina Malagelada, Fernando Azpiroz, Jordi Vitrià, Petia Radeva

Automatic and Semi-automatic Analysis of the Extension of Myocardial Infarction in an Experimental Murine Model

Rodent models of myocardial infarction (MI) have been extensively used in biomedical research towards the implementation of novel regenerative therapies. Permanent ligation of the left anterior descending (LAD) coronary artery is a commonly used method for inducing MI in both rat and mouse. Post-mortem evaluation of the heart, particularly the MI extension assessment performed on histological sections, is a critical parameter for this experimental setting. MI extension, which is defined as the percentage of the left ventricle affected by the coronary occlusion, has to be estimated by identifying the infarcted and the normal tissue in each section. However, because it is a manual procedure, it is time-consuming, arduous and prone to bias. Herein, we introduce semi-automatic and automatic approaches to perform the segmentation, which is then used to obtain the infarct extension measurement. Experimental validation is performed by comparing the proposed approaches with manual annotation, and a total error not exceeding 8% is reported in all cases.

Tiago Esteves, Mariana Valente, Diana S. Nascimento, Perpétua Pinto-do-Ó, Pedro Quelhas

Non-rigid Multi-modal Registration of Coronary Arteries Using SIFTflow

The fusion of clinically relevant information coming from different image modalities is an important topic in medical imaging. In particular, different cardiac imaging modalities provide complementary information for the physician: Computed Tomography Angiography (CTA) provides reliable pre-operative information on artery geometry, even in the presence of chronic total occlusions, while X-Ray Angiography (XRA) allows intra-operative high-resolution projections of a specific artery. The non-rigid registration of arteries between these two modalities is a difficult task. In this paper we propose the use of SIFTflow for registering CTA and XRA images. To the best of our knowledge, this is the first time SIFTflow has been proposed as an XRA-CTA registration method in the literature. To highlight the arteries and thus guide the registration process, the well-known Vesselness method has been employed. Results confirm that, for the purpose of registration, the arteries must be highlighted and background objects removed as much as possible. Moreover, the comparison with the well-known Free Form Deformation technique suggests that SIFTflow has great potential in the registration of multi-modal medical images.

Carlo Gatta, Simone Balocco, Victoria Martin-Yuste, Ruben Leta, Petia Radeva

Diffuse Liver Disease Classification from Ultrasound Surface Characterization, Clinical and Laboratorial Data

In this work the liver contour is semi-automatically segmented and quantified in order to help the identification and diagnosis of diffuse liver disease. The features extracted from the liver contour are jointly used with clinical and laboratorial data in the staging process. The classification results of a support vector machine, a Bayesian classifier and a k-nearest neighbor classifier are compared. A population of 88 patients at five different stages of diffuse liver disease and a leave-one-out cross-validation strategy are used in the classification process. The best results are obtained using the k-nearest neighbor classifier, with an overall accuracy of 80.68%. The good performance of the proposed method shows it to be a reliable indicator that can improve the information available for the staging of diffuse liver disease.

Ricardo Ribeiro, Rui Marinho, José Velosa, Fernando Ramalho, João Miguel Sanches

Classification of Ultrasound Medical Images Using Distance Based Feature Selection and Fuzzy-SVM

This paper presents a method of classifying ultrasound medical images that deals with two important aspects: (i) optimal feature subset selection for representing ultrasound medical images and (ii) improvement of classification accuracy by avoiding outliers. An objective function combining the concept of between-class distance and within-class divergence over the training dataset has been proposed as the evaluation criterion for feature selection. The search for the optimal subset of features has been performed using a Multi-Objective Genetic Algorithm (MOGA). Applying the proposed criterion, a subset of Grey Level Co-occurrence Matrix (GLCM) and Grey Level Run Length Matrix (GLRLM) based statistical texture descriptors has been identified that maximizes separability among the classes of the training dataset. To avoid the impact of noisy data during classification, a Fuzzy Support Vector Machine (FSVM) has been adopted, which reduces the effects of outliers by taking into account the level of significance of each training sample. The proposed approach to ultrasound medical image classification has been tested using a database of 679 ultrasound ovarian images, and an average classification accuracy of 89.60% has been achieved.

Abu Sayeed Md. Sohail, Prabir Bhattacharya, Sudhir P. Mudur, Srinivasan Krishnamurthy
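
As a small illustration of the GLCM-based descriptors mentioned above (my sketch with scikit-image, not the authors' pipeline; the function names assume skimage >= 0.19), a texture feature vector of this kind can be computed as follows. Vectors like these would then feed the MOGA selection and FSVM stages.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # skimage >= 0.19

def glcm_features(gray_u8):
    """gray_u8: 2D uint8 image. Returns a small GLCM texture feature vector."""
    glcm = graycomatrix(gray_u8, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

rng = np.random.default_rng(0)
img = (rng.random((64, 64)) * 255).astype(np.uint8)
print(glcm_features(img).shape)   # 4 props x 2 distances x 2 angles = (16,)
```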

Ultrasound Plaque Enhanced Activity Index for Predicting Neurological Symptoms

This paper aims at developing an ultrasound-based diagnostic measure which quantifies plaque activity, that is, the likelihood of an asymptomatic lesion to produce neurological symptoms. The method is rooted in the identification of an “active” plaque profile containing the most relevant ultrasound parameters associated with symptoms. This information is used to build an Enhanced Activity Index (EAI) which considers the conditional probabilities of each relevant feature belonging to either the symptomatic or the asymptomatic group. The measure was evaluated on a longitudinal study of 112 asymptomatic plaques and shows high diagnostic power. In particular, EAI provides correct identification of all plaques that developed symptoms while giving a small number of false positives. Results suggest that EAI could have a significant impact on stroke prediction and treatment planning.

José Seabra, Luís Mendes Pedro, José Fernandes e Fernandes, João Sanches

Pattern Recognition

On the Distribution of Dissimilarity Increments

This paper proposes a statistical model for the dissimilarity changes (increments) between neighboring patterns which follow a 2-dimensional Gaussian distribution. We propose a novel clustering algorithm, using that statistical model, which automatically determines the appropriate number of clusters. We apply the algorithm to both synthetic and real data sets and compare it to a Gaussian mixture and to a previous algorithm which also used dissimilarity increments. Experimental results show that this new approach yields better results than the other two algorithms in most datasets.

Helena Aidos, Ana Fred

Unsupervised Joint Feature Discretization and Selection

In many applications, we deal with high dimensional datasets with different types of data. For instance, in text classification and information retrieval problems, we have large collections of documents. Each text is usually represented by a bag-of-words or similar representation, with a large number of features (terms). Many of these features may be irrelevant (or even detrimental) for the learning tasks. This excessive number of features carries the problem of memory usage in order to represent and deal with these collections, clearly showing the need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose a combined unsupervised feature discretization and feature selection technique. The experimental results on standard datasets show the efficiency of the proposed techniques as well as improvements over previous similar techniques.

Artur Ferreira, Mário Figueiredo

Probabilistic Ranking of Product Features from Customer Reviews

In this paper, we propose a methodology for obtaining a probabilistic ranking of product features from a customer review collection. Our approach mainly relies on an entailment model between opinion and feature words, and suggests that, in a probabilistic opinion model of words learned from an opinion corpus, feature words must be the most probable words generated from that model (even more probable than opinion words themselves). In this paper, we also devise a new model for ranking corpus-based opinion words. We have evaluated our approach on a set of customer reviews of five products, obtaining encouraging results.

Lisette García-Moya, Henry Anaya-Sánchez, Rafel Berlanga, María José Aramburu

Vocabulary Selection for Graph of Words Embedding

The Graph of Words Embedding consists in mapping every graph in a given dataset to a feature vector by counting unary and binary relations between node attributes of the graph. It has been shown to perform well for graphs with discrete label alphabets. In this paper we extend the methodology to graphs with n-dimensional continuous attributes by selecting node representatives. We propose three different discretization procedures for the attribute space and experimentally evaluate the dependence on both the selector and the number of node representatives. In the context of graph classification, the experimental results reveal that on two out of three public databases the proposed extension achieves superior performance over a standard reference system.

Jaume Gibert, Ernest Valveny, Horst Bunke

Feature Selection in Regression Tasks Using Conditional Mutual Information

This paper presents a supervised feature selection method applied to regression problems. The selection method uses a dissimilarity matrix originally developed for classification problems, whose applicability is extended here to regression; it is built using the conditional mutual information between features with respect to a continuous relevant variable that represents the regression function. Applying an agglomerative hierarchical clustering technique, the algorithm selects a subset of the original set of features. The proposed technique is compared with three other methods. Experiments on four data sets of different nature are presented to show the importance of the selected features from the point of view of the regression estimation error (using Support Vector Regression), measured as the Root Mean Squared Error (RMSE).

Pedro Latorre Carmona, José M. Sotoca, Filiberto Pla, Frederick K. H. Phoa, José Bioucas Dias

Dual Layer Voting Method for Efficient Multi-label Classification

A common approach for solving multi-label classification problems using problem-transformation methods and dichotomizing classifiers is the pairwise decomposition strategy. One of the problems with this approach is the need to query a quadratic number of binary classifiers for making a prediction, which can be quite time consuming, especially in classification problems with a large number of labels. To tackle this problem we propose a Dual Layer Voting Method (DLVM) that extends efficient pair-wise multiclass voting to the multi-label setting and is related to the calibrated label ranking method. Five different real-world datasets (enron, tmc2007, genbase, mediamill and corel5k) were used to evaluate the performance of the DLVM. The performance of this voting method was compared with the majority voting strategy used by the calibrated label ranking method and the quick weighted voting algorithm (QWeighted) for pair-wise multi-label classification. The results from the experiments suggest that the DLVM significantly outperforms the competing algorithms in terms of testing speed while offering comparable or better prediction performance.

Gjorgji Madjarov, Dejan Gjorgjevikj, Sašo Džeroski

Passive-Aggressive for On-Line Learning in Statistical Machine Translation

New variations on the application of the passive-aggressive algorithm to statistical machine translation are developed and compared to previously existing approaches. In on-line adaptation, the system needs to adapt to real-world changing scenarios, where training and tuning only take place when the system is set up for the first time. Post-edit information, as described by a given quality measure, is used as valuable feedback within the passive-aggressive framework, adapting the statistical models on-line: first, by modifying the translation model parameters, and alternatively, by adapting the scaling factors present in state-of-the-art SMT systems. Experimental results show improvements in translation quality by allowing the system to learn on a sentence-by-sentence basis.

Pascual Martínez-Gómez, Germán Sanchis-Trilles, Francisco Casacuberta

Feature Set Search Space for FuzzyBoost Learning

This paper presents a novel approach to weak classifier selection within the GentleBoost framework, based on sharing a set of features at each round. We explore the use of linear dimensionality reduction methods to guide the search for features that share some properties, such as correlations and discriminative properties. We add this feature set as a new parameter of the decision stump, which turns the single branch selection of the classic stump into a fuzzy decision that weights the contribution of both branches. The weights of each branch act as a confidence measure based on the feature set characteristics, which increases accuracy and robustness to data perturbations. We propose an algorithm that considers the similarities between the weights provided by three linear mapping algorithms: PCA, LDA and MMLMNN [14]. We analyze the row vectors of the linear mapping, grouping vector components with very similar values; the resulting groups are then the inputs of the FuzzyBoost algorithm. This search procedure generalizes the previous temporal FuzzyBoost [10] to any type of features. We present results on features with spatial support (images) and spatio-temporal support (videos), showing the generalization properties of the FuzzyBoost algorithm in other scenarios.

Plinio Moreno, Pedro Ribeiro, José Santos-Victor

Interactive Structured Output Prediction: Application to Chromosome Classification

Interactive Pattern Recognition concepts and techniques are applied to problems with structured output, i.e., problems in which the result is not just a simple class label, but a suitable structure of labels. For illustration purposes, (a simplification of) the problem of Human Karyotyping is considered. Results show that a) taking into account label dependencies in a karyogram significantly reduces the classical (non-interactive) chromosome label prediction error rate, and b) results are further improved when interactive processing is adopted.

Jose Oncina, Enrique Vidal

On the Use of Diagonal and Class-Dependent Weighted Distances for the Probabilistic k-Nearest Neighbor

A probabilistic k-nn (PKnn) method was introduced in [13] from the Bayesian point of view. This work showed that posterior inference over the parameter k can be performed in a relatively straightforward manner using Markov Chain Monte Carlo (MCMC) methods. The method was extended by Everson and Fieldsen [14] to deal with metric learning. In this work we propose two different dissimilarity functions to be used inside this PKnn framework. These dissimilarity functions can be seen as simplified versions of the full-covariance distance functions proposed there. Furthermore, we propose to use a class-dependent dissimilarity function, as proposed in [8], aimed at improving the k-nn classifier. We pursue simultaneous learning of the dissimilarity function parameters together with the parameter k of the k-nn classifier. The experiments show that this simultaneous learning leads to an improvement of the classifier with respect to both the standard k-nn and state-of-the-art techniques.

Roberto Paredes, Mark Girolami

Explicit Length Modelling for Statistical Machine Translation

Explicit length modelling has been previously explored in statistical pattern recognition with successful results. In this paper, two length models along with two parameter estimation methods for statistical machine translation (SMT) are presented. More precisely, we incorporate explicit length modelling in a state-of-the-art log-linear SMT system as an additional feature function in order to prove the contribution of length information. Finally, promising experimental results are reported on a reference SMT task.

Joan Albert Silvestre-Cerdà, Jesús Andrés-Ferrer, Jorge Civera

Poster Sessions

Computer Vision

Age Regression from Soft Aligned Face Images Using Low Computational Resources

The initial step in most facial age estimation systems consists of accurately aligning a model to the output of a face detector (e.g. an Active Appearance Model). This fitting process is very expensive in terms of computational resources and prone to getting stuck in local minima, which makes it impractical for analysing faces on resource-limited computing devices. In this paper we build a face age regressor that is able to work directly on faces cropped using a state-of-the-art face detector. Our procedure uses K nearest neighbours (K-NN) regression with a metric based on a properly tuned Fisher Linear Discriminant Analysis (LDA) projection matrix. On FG-NET we achieve a state-of-the-art Mean Absolute Error (MAE) of 5.72 years with manually aligned faces. Using face images cropped by a face detector we get a MAE of 6.87 years on the same database. Moreover, most of the algorithms presented in the literature have been evaluated in single-database experiments and therefore report optimistically biased results. In our cross-database experiments we get a MAE of roughly 12 years, which would be the expected performance in a real world application.

Juan Bekios-Calfa, José M. Buenaposada, Luis Baumela
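
A sketch of the kind of pipeline the abstract describes, assuming integer ages can serve as LDA classes (my scikit-learn illustration, not the authors' code):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))          # stand-in for cropped-face features
ages = rng.integers(5, 60, 500).astype(float)

# LDA projection learned on discretised (integer) age labels.
lda = LinearDiscriminantAnalysis(n_components=10)
lda.fit(X, ages.astype(int))

# K-NN regression in the projected space, i.e. K-NN with an LDA-based metric.
knn = KNeighborsRegressor(n_neighbors=15).fit(lda.transform(X), ages)

X_test = rng.standard_normal((5, 100))
print(knn.predict(lda.transform(X_test)))    # predicted ages
```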

Human Activity Recognition from Accelerometer Data Using a Wearable Device

Activity recognition is an emerging field of research, born from the larger fields of ubiquitous computing, context-aware computing and multimedia. Recently, recognizing everyday life activities has become one of the challenges for pervasive computing. In our work, we developed a novel wearable system that is easy to use and comfortable to wear. Our wearable system is based on a new set of 20 computationally efficient features and the Random Forest classifier. We obtain very encouraging results, with a classification accuracy for human activity recognition of up to 94%.

Pierluigi Casale, Oriol Pujol, Petia Radeva

Viola-Jones Based Detectors: How Much Affects the Training Set?

This paper presents a study on the facial feature detection performance achieved using the Viola-Jones framework. A set of classifiers using two different focuses to gather the training samples is created and tested on four different datasets covering a wide range of possibilities. The results achieved should help researchers choose the classifier that best fits their demands.

Modesto Castrillón-Santana, Daniel Hernández-Sosa, Javier Lorenzo-Navarro

Fast Classification in Incrementally Growing Spaces

The classification speed of state-of-the-art classifiers such as SVMs is an important aspect to be considered for emerging applications and domains such as data mining and human-computer interaction. Usually, a test-time speed increase in SVMs is achieved by somehow reducing the number of support vectors, which allows a faster evaluation of the decision function. In this paper a novel approach is described for fast classification in a PCA+SVM scenario. In the proposed approach, classification of an unseen sample is performed incrementally in increasingly larger feature spaces. As soon as the classification confidence is above a threshold, the process stops and the class label is retrieved. Easy samples will thus be classified using fewer features, producing a faster decision. Experiments on a gender recognition problem show that the method is by itself able to give good speed-error tradeoffs, and that it can also be used in conjunction with other SV-reduction algorithms to produce tradeoffs that are better than with either approach alone.

Oscar Déniz-Suárez, Modesto Castrillón, Javier Lorenzo, Gloria Bueno, Mario Hernández
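
A minimal sketch of the incremental idea, under the assumption that confidence is measured by the magnitude of the SVM decision value (my own illustration, not the paper's exact scheme):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def train(X, y, dims=(5, 20, 50)):
    pca = PCA(n_components=max(dims)).fit(X)
    Z = pca.transform(X)
    # One linear SVM per truncated PCA feature space.
    svms = [LinearSVC(dual=False).fit(Z[:, :d], y) for d in dims]
    return pca, dims, svms

def classify(x, pca, dims, svms, thresh=1.0):
    z = pca.transform(x.reshape(1, -1))
    for d, svm in zip(dims, svms):
        score = svm.decision_function(z[:, :d])[0]
        if abs(score) > thresh:            # confident enough: stop early
            return int(score > 0), d
    return int(score > 0), dims[-1]        # fall back to the largest space

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 100))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = train(X, y)
print(classify(X[0], *model))              # (label, feature-space size used)
```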

Null Space Based Image Recognition Using Incremental Eigendecomposition

An incremental approach to the discriminative common vector (DCV) method for image recognition is considered. Discriminative projections are tackled in the particular context in which new training data becomes available and learned subspaces may need continuous updating. Starting from incremental eigendecomposition of scatter matrices, an efficient updating rule based on projections and orthogonalization is given. The corresponding algorithm has been empirically assessed and compared to its batch counterpart. The same good properties and performance results of the original method are kept but with a dramatic decrease in the computation needed.

Katerine Diaz-Chito, Francesc J. Ferri, Wladimiro Díaz-Villanueva

Multi-sensor People Counting

An accurate estimation of the number of people entering or leaving a controlled area is an interesting capability for automatic surveillance systems. Potential applications include those related to security, safety, energy saving or fraud control. In this paper we present a novel configuration of a multi-sensor system combining both visual and range data, specially suited for troublesome scenarios such as public transportation. The approach applies probabilistic estimation filters on raw sensor data to create intermediate-level hypotheses that are later fused using a certainty-based integration stage. Promising results have been obtained in several tests performed on a realistic test bed scenario under variable lighting conditions.

Daniel Hernández-Sosa, Modesto Castrillón-Santana, Javier Lorenzo-Navarro

Lossless Compression of Polar Iris Image Data

The impact of using different lossless compression algorithms when compressing biometric iris sample data from several public iris databases is investigated. In particular, the application of dedicated lossless image codecs (lossless JPEG, JPEG-LS, PNG, and GIF), lossless variants of lossy codecs (JPEG2000, JPEG XR, and SPIHT), and a few general purpose file compression schemes is compared. We specifically focus on polar iris images (as a result after iris detection, iris extraction, and mapping to polar coordinates). The results are discussed in the light of the recent ISO/IEC FDIS 19794-6 standard and IREX recommendations.

Kurt Horvath, Herbert Stögner, Andreas Uhl, Georg Weinhandel

Learning Features for Human Action Recognition Using Multilayer Architectures

This paper presents an evaluation of two multilevel architectures for the human action recognition (HAR) task. By combining low-level features with multi-layer learning architectures, we infer discriminative semantic features that greatly improve classification performance. This approach eliminates the difficult process of selecting good mid-level feature descriptors, replacing the feature selection and extraction process with a learning stage. The data probability distribution is modeled by a multi-layer graphical model; in this respect, the approach differs from standard ones. Experiments on the KTH and Weizmann video sequence databases are carried out in order to evaluate the performance of the proposal. The results show that the newly learnt features offer a classification performance comparable to the state of the art on these databases.

Manuel Jesús Marín-Jiménez, Nicolás Pérez de la Blanca, María Ángeles Mendoza

Human Recognition Based on Gait Poses

This paper introduces a new approach for gait analysis based on the Gait Energy Image (GEI). The main idea is to segment the gait cycle into some biomechanical poses, and to compute a particular GEI for each pose. Pose-based GEIs can better represent body parts and dynamics descriptors with respect to the usually blurred depiction provided by a general GEI. Gait classification is carried out by fusing separated pose-based decisions. Experiments on human identification prove the benefits of this new approach when compared to the original GEI method.

Raúl Martín-Félez, Ramón A. Mollineda, Javier Salvador Sánchez
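
For reference, a Gait Energy Image is simply the pixel-wise mean of the aligned binary silhouettes of a gait cycle; the sketch below (mine, with a toy pose segmentation) also shows the pose-based variant proposed above.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: (T, H, W) array of aligned binary frames in {0, 1}."""
    return np.asarray(silhouettes, dtype=float).mean(axis=0)

def pose_based_geis(silhouettes, pose_labels, n_poses):
    """One GEI per biomechanical pose, given a per-frame pose segmentation."""
    labels = np.asarray(pose_labels)
    return [gait_energy_image(silhouettes[labels == p]) for p in range(n_poses)]

rng = np.random.default_rng(0)
frames = rng.integers(0, 2, (40, 64, 44))     # toy silhouette sequence
poses = np.repeat(np.arange(4), 10)           # toy 4-pose segmentation
print([g.shape for g in pose_based_geis(frames, poses, 4)])
```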

On-Line Classification of Data Streams with Missing Values Based on Reinforcement Learning

In some applications, data arrive sequentially and are not available in batch form, which makes the use of traditional classification systems difficult. In addition, some attribute values may be missing due to real-world conditions. For this problem, a number of decisions have to be made regarding how to proceed with each incomplete and unlabeled incoming object: how to guess its missing attribute values, how to classify it, whether to include it in the training set, or when to ask for the class label from an expert. Unfortunately, no single decision works well for all data sets. This data dependency motivates our formulation of the problem in terms of elements of reinforcement learning. The application of this learning paradigm to this problem is, to the best of our knowledge, novel. The empirical results are encouraging, since the proposed framework behaves better and more generally than many strategies used in isolation, and makes efficient use of human effort (requests for the class label from an expert) and computer memory (the growth in size of the training set).

Mónica Millán-Giraldo, Vicente Javier Traver, J. Salvador Sánchez

Opponent Colors for Human Detection

Human detection is a key component in fields such as advanced driving assistance and video surveillance. However, even detecting non-occluded standing humans remains a challenge of intensive research. Finding good features to build human models for further detection is probably one of the most important issues to face. Currently, shape, texture and motion features have received extensive attention in the literature. However, color-based features, which are important in other domains (e.g., image categorization), have received much less attention. In fact, the use of the RGB color space has become the default choice, and the focus has been put on developing first- and second-order features on top of RGB space (e.g., HOG and co-occurrence matrices, respectively). In this paper we evaluate the opponent colors (OPP) space as a biologically inspired alternative for human detection. In particular, by feeding OPP space into the baseline framework of Dalal et al. for human detection (based on RGB, HOG and a linear SVM), we obtain better detection performance than by using RGB space. This is a relevant result since, to the best of our knowledge, OPP space has not previously been used for human detection. It suggests that in the future it could be worthwhile to compute co-occurrence matrices, self-similarity features, etc., on top of OPP space as well, i.e., as we have done with HOG in this paper.

Rao Muhammad Anwer, David Vázquez, Antonio M. López
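
One common definition of the opponent color transform is sketched below; the exact normalisation used in the paper is an assumption on my part.

```python
import numpy as np

def rgb_to_opp(rgb):
    """rgb: (H, W, 3) float array. Returns (H, W, 3) opponent channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    o1 = (r - g) / np.sqrt(2)            # red-green opponency
    o2 = (r + g - 2 * b) / np.sqrt(6)    # yellow-blue opponency
    o3 = (r + g + b) / np.sqrt(3)        # intensity
    return np.stack([o1, o2, o3], axis=-1)

img = np.random.default_rng(0).random((4, 4, 3))
print(rgb_to_opp(img).shape)   # HOG would then be computed per channel
```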

Automatic Detection of Facial Feature Points via HOGs and Geometric Prior Models

Most applications dealing with problems involving the face require a robust estimation of the facial salient points. Nevertheless, this estimation is not usually an automated preprocessing step in applications dealing with facial expression recognition. In this paper we present a simple method to detect facial salient points in the face. It is based on a prior Point Distribution Model and a robust object descriptor. The model learns the distribution of the points from the training data, as well as the amount of variation in location each point exhibits. Using this model, we reduce the search areas to look for each point. In addition, we also exploit the global consistency of the points constellation, increasing the detection accuracy. The method was tested on two separate data sets and the results, in some cases, outperform the state of the art.

Mario Rojas Quiñones, David Masip, Jordi Vitrià

Rectifying Non-euclidean Similarity Data through Tangent Space Reprojection

This paper concerns the analysis of shapes characterised in terms of dissimilarities rather than vectors of ordinal shape attributes. Such characterisations are rarely metric, and as a result shape or pattern spaces cannot be constructed via embeddings into a Euclidean space. The problem arises when the similarity matrix has negative eigenvalues. One way to characterise the departures from metricity is to use the relative mass of negative eigenvalues, or negative eigenfraction. In this paper, we commence by developing a new measure which gauges the extent to which individual data give rise to departures from metricity in a set of similarity data. This allows us to assess whether the non-Euclidean artifacts in a data set can be attributed to individual objects or are distributed uniformly. Our second contribution is to develop a new means of rectifying non-Euclidean similarity data. To do this we represent the data using a graph on a curved manifold of constant curvature (i.e. a hypersphere). Xu et al. have shown how the rectification process can be effected by evolving the hyperspheres under the Ricci flow. However, this can have the effect of violating the proximity constraints applying to the data. To overcome this problem, we show how to preserve the constraints using a tangent space representation that captures local structures. We demonstrate the utility of our method on the standard “chicken pieces” dataset.

Weiping Xu, Edwin R. Hancock, Richard C. Wilson

Image Processing and Analysis

Gait Identification Using a Novel Gait Representation: Radon Transform of Mean Gait Energy Image

Gait is one of the most practical biometric techniques, presenting the capability to recognize individuals from a distance. In this study, we propose a novel gait template based on the Radon Transform of the Mean Gait Energy Image, termed RTMGEI. Robustness against image noise and reduced data dimensionality are achieved by using the Radon Transform, which also captures variations of Mean Gait Energy Images (MGEIs) over their centers. Feature extraction is done by applying Zernike moments to the RTMGEIs. The orthogonality of the Zernike moment basis functions guarantees the statistical independence of the coefficients in the extracted feature vectors. The minimum Euclidean distance is used as the classifier. Our proposed method is evaluated on the CASIA database. Results show that it outperforms recently presented works.

Farhad Bagher Oskuie, Karim Faez, Ali Cheraghian, Hamidreza Dastmalchi
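
A sketch of the template construction as I read the abstract (Zernike-moment feature extraction omitted for brevity; the radon call assumes scikit-image):

```python
import numpy as np
from skimage.transform import radon

def rtmgei(geis, angles=None):
    """geis: (N, H, W) stack of Gait Energy Images from N gait cycles."""
    mgei = np.asarray(geis, dtype=float).mean(axis=0)   # Mean Gait Energy Image
    angles = angles if angles is not None else np.arange(180)
    return radon(mgei, theta=angles, circle=False)      # (rho, theta) template

geis = np.random.default_rng(0).random((5, 64, 64))
print(rtmgei(geis).shape)
```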

New Algorithm for Segmentation of Images Represented as Hypergraph Hexagonal-Grid

This paper presents a new method for the segmentation of images into regions and for the extraction of boundaries that reflect the objects present in the image scene. The unified framework for image processing uses a grid structure defined on the set of pixels of an image. We propose a segmentation algorithm based on a hypergraph structure, which produces a maximum spanning tree of a visual hypergraph constructed on the grid structure, and we consider the HCL (Hue-Chroma-Luminance) color space representation. Our technique has a time complexity lower than the methods from the specialized literature, and the experimental results on the Berkeley color image database show that the performance of the method is robust.

Dumitru Burdescu, Marius Brezovan, Eugen Ganea, Liana Stanescu

Statistical and Wavelet Based Texture Features for Fish Oocytes Classification

The study of the biology and population dynamics of fish species requires the estimation of fecundity parameters for individual fish in many fisheries laboratories. The traditional procedure used in fisheries research is to classify and count the oocytes manually on a subsample of known weight of the ovary, and to measure a few oocytes under a binocular microscope. With an adequate interactive tool, this process might be done on a computer. However, in both cases the task is very time consuming, with the obvious consequence that fecundity studies are not conducted routinely. In this work we develop a computer vision system for the classification of oocytes using texture features in histological images. The system is structured in three stages: 1) extraction of the oocytes from the original image; 2) calculation of a texture feature vector for each oocyte; and 3) classification of the oocytes using this feature vector. A statistical evaluation of the proposed system is presented and discussed.

Encarnación González-Rufino, Pilar Carrión, Arno Formella, Manuel Fernández-Delgado, Eva Cernadas

Using Mathematical Morphology for Similarity Search of 3D Objects

In this paper we use the erosion and dilation operators for characterizing 3D polygonal objects. The goal is to perform a similarity search in a set of distinct objects. The method applies successive dilations and erosions of the meshes in order to compute the difference volume as a function of the size of the structuring element. Because of appropriate pre-processing, the resulting function is invariant to translation, rotation and mesh resolution. On a set of 32 complex objects with different mesh resolutions, the method achieved an average ranking rate of 1.47, with 23 objects ranked first and 6 objects ranked second.

Roberto Lam, J. M. Hans du Buf
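
An illustrative sketch of this signature on voxel occupancy grids rather than meshes (a simplifying assumption of mine; the paper operates on polygonal meshes):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def morphology_signature(vol, max_radius=4):
    """vol: 3D boolean occupancy grid. Difference-volume curve vs. element size."""
    base = vol.sum()
    sig = []
    for r in range(1, max_radius + 1):
        elem = np.ones((2 * r + 1,) * 3, dtype=bool)   # cubic structuring element
        dil = binary_dilation(vol, structure=elem).sum()
        ero = binary_erosion(vol, structure=elem).sum()
        sig.append((dil - ero) / base)                 # normalised difference volume
    return np.array(sig)

vol = np.zeros((32, 32, 32), dtype=bool)
vol[8:24, 8:24, 8:24] = True                           # toy cube "object"
print(morphology_signature(vol))                       # shape signature curve
```

Two objects would then be compared by the distance between their signature curves, which is what makes the descriptor usable for similarity ranking.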

Trajectory Analysis Using Switched Motion Fields: A Parametric Approach

This paper presents a new model for trajectories in video sequences using mixtures of motion fields. Each field is described by a simple parametric model with only a few parameters. We show that, despite the simplicity of the motion fields, the overall model is able to generate complex trajectories occurring in video analysis.

Jorge S. Marques, João M. Lemos, Mário A. T. Figueiredo, Jacinto C. Nascimento, Miguel Barão

Semi-supervised Probabilistic Relaxation for Image Segmentation

In this paper, a semi-supervised approach based on probabilistic relaxation theory is presented. Focused on image segmentation, the presented technique combines two desirable properties: a very small number of labelled samples is needed, and the assignment of labels is consistently performed according to our contextual information constraints. Our proposal has been tested on medical images from a dermatology application with quite promising preliminary results. Not only have the unsupervised accuracies been improved as expected, but accuracies similar to another semi-supervised approach have been obtained using a considerably reduced number of labelled samples. Results have also been compared with other powerful and well-known unsupervised image segmentation techniques, significantly improving on their results.

Adolfo Martínez-Usó, Filiberto Pla, José M. Sotoca, Henry Anaya-Sánchez

Poker Vision: Playing Cards and Chips Identification Based on Image Processing

This paper presents an approach to the identification of playing cards and the counting of chips in a poker game environment, using an entry-level webcam and computer vision methodologies. Most previous works on playing card identification rely on an optimal camera position and a controlled environment. The presented approach is intended to suit a real, uncontrolled environment along with its constraints. The recognition of playing cards relies on template matching, while the counting of chips is based on colour segmentation combined with the Hough Circle Transform. With the proposed approach it is possible to identify the cards and chips on the table correctly. The overall accuracy of the rank identification achieved is around 94%.

Paulo Martins, Luís Paulo Reis, Luís Teófilo

Occlusion Management in Sequential Mean Field Monte Carlo Methods

In this paper we analyse the problem of occlusions under a Mean Field Monte Carlo approach. This kind of approach is suitable for approximate inference in problems such as multitarget tracking, on which this paper is focused. It leads to a set of fixed point equations, one for each target, that can be solved iteratively. While previous works considered independent likelihoods and pairwise interactions between objects, in this work we assume a more realistic joint likelihood that helps to cope with occlusions. Since the joint likelihood can truly depend on several objects, a high dimensional integral appears in the raw approach; we consider an approximation to make it computationally feasible. We have tested the proposed approach on football and indoor surveillance sequences, showing that a low number of failures can be achieved.

Carlos Medrano, Raúl Igual, Carlos Orrite, Inmaculada Plaza

New Approach for Road Extraction from High Resolution Remotely Sensed Images Using the Quaternionic Wavelet

Automatic road network extraction from high resolution remotely sensed images has been under study by computer scientists for over 30 years. Conventional methods to create and update road information rely heavily on manual work and are therefore very expensive and time consuming. This paper presents an efficient and computationally fast method to extract roads from very high resolution images automatically. We propose a new approach for following road paths based on the quaternionic wavelet transform, which ensures a good local space-frequency analysis with strong directional selectivity. The rich phase information given by this hypercomplex transform overcomes the lack of shift invariance of the real discrete wavelet transform and the poor directional selectivity of both real and complex wavelet transforms.

Mohamed Naouai, Atef Hamouda, Aroua Akkari, Christiane Weber

On the Influence of Spatial Information for Hyper-spectral Satellite Imaging Characterization

Land-use classification for hyper-spectral satellite images requires a previous step of pixel characterization. In the simplest case, each pixel is characterized by its spectral curve. The improvement of the spectral and spatial resolution of hyper-spectral sensors has led to very large data sets. Some researchers have focused on better classifiers that can handle large amounts of data; others have faced the problem of band selection to reduce the dimensionality of the feature space. However, thanks to the improvement in the spatial resolution of the sensors, spatial information may also provide new features for hyper-spectral satellite data. Here, a study on the influence of spectral-spatial features combined with an unsupervised band selection method is presented. The results show that it is possible to reduce the number of required spectral bands very significantly while retaining an adequate description of the spectral-spatial characteristics of the image for pixel classification tasks.

Olga Rajadell, Pedro García-Sevilla, Filiberto Pla

Natural Material Segmentation and Classification Using Polarisation

This paper uses polarisation information for surface segmentation based on material reflectance characteristics. Both polarised and unpolarised light are used, and the method is hence applicable to both specular and diffuse polarisation. We use moments to estimate the mean intensity, polarisation and phase from images obtained with multiple polariser orientations. From the Fresnel theory, the azimuth angle of the surface normal is determined by the phase angle and, for a limited range of refractive index, the zenith angle is determined by the degree of polarisation. Using these properties, we show how the angular distribution of the mean intensity for remitted light can be parameterised using spherical harmonics. We explore two applications of our technique, namely a) detecting skin lesions in damaged fruit, and b) exploiting spherical harmonic coefficients to segment surfaces into regions of different material composition using normalized graph cuts.

Nitya Subramaniam, Gul e Saman, Edwin R. Hancock

Reflection Component Separation Using Statistical Analysis and Polarisation

We show how to cast the problem of specularity subtraction as blind source separation from polarisation images. We commence by summarizing the relationships between the specular and diffuse reflection components for polarised images. We show how to use singular value decomposition for component separation. In particular, we show how reliable results can be obtained using three images acquired with different polariser angles under diffuse reflection. The proposed method can be used as the preprocessing step in shape from shading, segmentation, reflectance estimation and many other computer vision applications.

Lichi Zhang, Edwin R. Hancock, Gary A. Atkinson

Pattern Recognition

Characterizing Graphs Using Approximate von Neumann Entropy

In this paper we show how to approximate the von Neumann entropy associated with the Laplacian eigenspectrum of graphs and exploit it as a characteristic for the clustering and classification of graphs. We commence from the von Neumann entropy and approximate it by replacing the Shannon entropy by its quadratic counterpart. We then show how the quadratic entropy can be expressed in terms of a series of permutation invariant traces. This leads to a simple approximate form for the entropy in terms of the elements of the adjacency matrix which can be evaluated in quadratic time. We use this approximate expression for the entropy as a unary characteristic for graph clustering. Experiments on real world data illustrate the effectiveness of the method.

Lin Han, Edwin R. Hancock, Richard C. Wilson
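
The key substitution described above, a quadratic entropy in place of the Shannon entropy, can be sketched in a few lines of NumPy. This is an illustrative reading (density matrix taken as the Laplacian scaled to unit trace), not the paper's exact trace expansion:

    import numpy as np

    def entropies(A):
        """Exact and quadratic von Neumann entropy from an adjacency matrix A."""
        L = np.diag(A.sum(axis=1)) - A        # combinatorial Laplacian
        rho = L / np.trace(L)                 # density matrix: eigenvalues sum to 1
        lam = np.linalg.eigvalsh(rho)
        shannon = -sum(l * np.log(l) for l in lam if l > 1e-12)
        # Quadratic counterpart: sum lam*(1 - lam) = 1 - Tr(rho^2); Tr(rho^2)
        # needs only the matrix elements, hence quadratic-time evaluation.
        quadratic = 1.0 - np.sum(rho * rho)
        return shannon, quadratic

    A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy graph
    print(entropies(A))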

A Distance for Partially Labeled Trees

In a number of practical situations, data have structure, and the relations among their component parts need to be coded with suitable data models. Trees are usually utilized for representing data for which hierarchical relations can be defined. This is the case in a number of fields like image analysis, natural language processing, protein structure, or music retrieval, to name a few. In those cases, procedures for comparing trees are very relevant. An approximate tree edit distance algorithm has been introduced for working with trees labeled only at the leaves. In this paper, it has been applied to handwritten character recognition, providing accuracies comparable to those of the most comprehensive search method while being as efficient as the fastest.

Jorge Calvo, David Rizo, José M. Iñesta

An Online Metric Learning Approach through Margin Maximization

This work introduces a method for learning similarity measures between pairs of objects in any representation space that allows convenient recognition algorithms to be developed. The problem is formulated through margin maximization over distance values so that it can discriminate between similar (intra-class) and dissimilar (inter-class) elements without enforcing positive definiteness of the metric matrix, as most competing approaches do. A passive-aggressive approach has been adopted to carry out the corresponding optimization procedure. The proposed approach has been empirically compared to state-of-the-art metric learning methods on several publicly available databases, showing its potential both in terms of performance and computational cost.

Adrian Perez-Suay, Francesc J. Ferri, Jesús V. Albert
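
A hedged sketch of a passive-aggressive update on a metric matrix M (positive definiteness not enforced), in the spirit described above; the paper's exact loss and step size may differ:

    import numpy as np

    def pa_update(M, x, y, similar, margin=1.0):
        """One passive-aggressive step on the distance d(x, y) = (x-y)^T M (x-y)."""
        z = np.outer(x - y, x - y)            # gradient of d with respect to M
        d = np.sum(M * z)
        # Hinge loss: similar pairs should fall below the margin, dissimilar above.
        loss = max(0.0, d - margin) if similar else max(0.0, margin - d)
        if loss == 0.0:
            return M                          # passive: constraint already satisfied
        tau = loss / np.sum(z * z)            # aggressive: closed-form PA step size
        return M - tau * z if similar else M + tau * z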

Graph Matching on a Low-Cost and Parallel Architecture

This paper presents a new parallel algorithm to compute graph matching based on Graduated Assignment. The aim is to perform graph matching on a current desktop computer but, instead of executing the code on the generic processor, to execute parallel code on the graphics processing unit. Such a computer can be embedded in a low-cost pattern recognition system or a mobile robot. Experiments show a speed-up of around 400 times in run time, which makes the use of attributed graphs to represent objects a valid solution.

David Rodenas, Francesc Serratosa, Albert Solé
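
For readers unfamiliar with Graduated Assignment, the inner softassign loop (the part that parallelises well on a GPU) looks roughly like this; a sketch only, since in the full algorithm the benefit matrix Q is recomputed from the current match matrix at every annealing step:

    import numpy as np

    def softassign(Q, beta=0.5, beta_max=10.0, rate=1.075, sweeps=30):
        """Q[i, a] is the benefit of matching node i to node a."""
        while beta < beta_max:
            M = np.exp(beta * Q)              # soften/harden matches with beta
            for _ in range(sweeps):
                M = M / M.sum(axis=1, keepdims=True)   # Sinkhorn row pass
                M = M / M.sum(axis=0, keepdims=True)   # Sinkhorn column pass
            beta *= rate
        return M                              # near-doubly-stochastic match matrix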

A Probabilistic Framework to Obtain a Common Labelling between Attributed Graphs

The computation of a common labelling of a set of graphs is required to find a representative of a given graph set. Although this is an NP problem, practical methods exist to obtain a sub-optimal common labelling in polynomial time. We assume the graphs in the set are subject to Gaussian distortion, so the average labelling is the one that yields the best common labelling. In this paper, we present two new algorithms, based on a probabilistic framework, to find a common labelling between a set of attributed graphs. They have two main advantages. From the theoretical point of view, no additional nodes are artificially introduced to obtain the common labelling, so the structure of the graphs in the set is kept unaltered. From the practical point of view, results show that the presented algorithms outperform state-of-the-art algorithms.

Albert Solé-Ribalta, Francesc Serratosa

Feature Selection with Complexity Measure in a Quadratic Programming Setting

Feature selection is a topic of growing interest, mainly due to the increasing amount of information, and is an essential task in many machine learning problems with high-dimensional data. The selection of a subset of relevant features helps to reduce the complexity of the problem and to build robust learning models. This work presents an adaptation of a recent quadratic programming feature selection technique that identifies the redundancy and relevance in the data in a single step. Our approach introduces a non-probabilistic measure to capture the relevance, based on Minimum Spanning Trees. Three different real datasets were used to assess the performance of the adaptation. The results are encouraging and reflect the utility of feature selection algorithms.

Ricardo Sousa, Hélder P. Oliveira, Jaime S. Cardoso

Automatic Estimation of the Number of Segmentation Groups Based on MI

Clustering is important in medical image segmentation. The number of segmentation groups is often needed as an initial condition, but is often unknown. We propose a method to estimate the number of segmentation groups based on mutual information, an anisotropic diffusion model and class-adaptive Gauss-Markov random fields. Initially, anisotropic diffusion is used to decrease the image noise. Subsequently, class-adaptive Gauss-Markov modeling and mutual information are used to determine the number of segmentation groups. This general formulation enables the method to easily adapt to various kinds of medical images and the associated acquisition artifacts. Experiments on simulated and multi-modal data demonstrate the advantages of the method over current state-of-the-art approaches.

Ziming Zeng, Wenhui Wang, Longzhi Yang, Reyer Zwiggelaar

Applications

Vitality Assessment of Boar Sperm Using N Concentric Squares Resized (NCSR) Texture Descriptor in Digital Images

Two new textural descriptors, named N Concentric Squares Resized (NCSR) and N Concentric Squares Histogram (NCSH), are proposed. These descriptors were used to classify 472 images of live spermatozoa heads and 376 images of dead spermatozoa heads. The results obtained with these two novel descriptors have been compared with a number of classical descriptors such as Haralick, Pattern Spectrum, WSF, Zernike, Flusser and Hu. The computed feature vectors have been classified using kNN and a backpropagation neural network. The error rate obtained for NCSR with N = 11 was 23.20%, outperforming the rest of the descriptors. Also, the area under the ROC curve (AUC) and the values observed in the ROC curve indicate that the performance of the proposed descriptor is better than that of the other texture description methods.

Enrique Alegre, María Teresa García-Ordás, Víctor González-Castro, S. Karthikeyan
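
One plausible reading of the N-concentric-squares idea, sketched under stated assumptions (the working resolution and the ring definition are guesses, not the paper's specification):

    import cv2
    import numpy as np

    def ncsr(gray, n=11):
        """Mean grey level over n concentric square rings of a resized head image."""
        size = 2 * n                          # assumed working resolution
        img = cv2.resize(gray, (size, size)).astype(float)
        feats = []
        for k in range(n):
            outer = img[k:size - k, k:size - k]
            if k < n - 1:
                inner = img[k + 1:size - k - 1, k + 1:size - k - 1]
                feats.append((outer.sum() - inner.sum()) / (outer.size - inner.size))
            else:
                feats.append(outer.mean())    # innermost 2x2 block
        return np.array(feats)                # feed to kNN or a neural network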

Filled-in Document Identification Using Local Features and a Direct Voting Scheme

In this work, an approach combining local representations with a direct voting scheme on a k-nearest neighbors classifier to identify filled-in document images is presented. A document class is represented by a high number of local feature vectors selected from its reference image using a given criterion. In the test phase, a number of vectors are equally selected from an image and used to classify it. The experimental results show that the parameterization is not critical, and good performances in terms of error rate and processing time can be obtained, even though the test documents contain a large proportion of filled-in regions, obviously not present in the reference images.

Joaquim Arlandis, Vicent Castello-Fos, Juan-Carlos Perez-Cortes
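
The direct voting scheme described above reduces, in its simplest form, to letting every local test vector vote for the class of its nearest reference vector (an illustrative sketch, not the authors' implementation):

    import numpy as np
    from collections import Counter

    def classify_by_voting(test_vecs, ref_vecs, ref_labels):
        """Each local test vector casts one vote for the class of its nearest
        reference vector; the majority class wins."""
        votes = []
        for v in test_vecs:
            nearest = np.argmin(np.linalg.norm(ref_vecs - v, axis=1))
            votes.append(ref_labels[int(nearest)])
        return Counter(votes).most_common(1)[0][0]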

Combining Growcut and Temporal Correlation for IVUS Lumen Segmentation

The assessment of arterial luminal area, performed by IVUS analysis, is a clinical index used to evaluate the degree of coronary artery disease. In this paper we propose a novel approach to automatically segment the vessel lumen, which combines model-based temporal information extracted from successive frames of the sequence with spatial classification using the Growcut algorithm. The performance of the method is evaluated by an in vivo experiment on 300 IVUS frames. The automatic and manual segmentation performances in general vessel and stent frames are comparable. The average segmentation errors in vessel, stent and bifurcation frames are 0.17±0.08 mm, 0.18±0.07 mm and 0.31±0.12 mm, respectively.

Simone Balocco, Carlo Gatta, Francesco Ciompi, Oriol Pujol, Xavier Carrillo, Josepa Mauri, Petia Radeva

Topographic EEG Brain Mapping before, during and after Obstructive Sleep Apnea Episodes

Obstructive Sleep Apnea Syndrome (OSAS) is a very common sleep disorder that is associated with several neurocognitive impairments. The present study aims to assess the electroencephalographic (EEG) power before, during and after obstructive apnea episodes, in four frequency bands: delta (δ), theta (θ), alpha (α) and beta (β). For that purpose, the continuous wavelet transform was applied to the EEG signals obtained with polysomnography, and topographic EEG brain mapping (EBM) was used to visualize the power differences across the whole brain. The results demonstrate that there is a significant decrease in the EEG δ power during OSAS that does not totally recover immediately after the episode. Furthermore, a power decrease in a specific brain region was noticed for all EEG frequency ranges.

David Belo, Ana Luísa Coito, Teresa Paiva, João Miguel Sanches
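
A minimal sketch of per-band continuous-wavelet power with PyWavelets, assuming a Morlet wavelet, a 256 Hz sampling rate and the conventional band limits (the paper's exact wavelet and scale range are not reproduced here):

    import numpy as np
    import pywt

    def band_powers(eeg, fs=256.0):
        """Mean CWT power of one EEG channel in the four classical bands."""
        bands = {"delta": (0.5, 4), "theta": (4, 8),
                 "alpha": (8, 13), "beta": (13, 30)}
        scales = np.arange(1, 256)            # assumed scale range
        coefs, freqs = pywt.cwt(eeg, scales, "morl", sampling_period=1.0 / fs)
        power = np.abs(coefs) ** 2            # rows: scales, columns: time
        return {name: power[(freqs >= lo) & (freqs < hi)].mean()
                for name, (lo, hi) in bands.items()}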

Classifying Melodies Using Tree Grammars

Similarity computation is a difficult issue in music information retrieval, because it tries to emulate the special ability that humans show for pattern recognition in general, and particularly in the presence of noisy data. A number of works have addressed the problem of what is the best representation for symbolic music in this context. The tree representation, using rhythm for defining the tree structure and pitch information for leaf and node labeling, has proven to be effective in melodic similarity computation. In this paper we propose a solution for the case in which melodies are represented by trees for training but duration information is not available for the input data. To this end, we infer a probabilistic context-free grammar using the information in the trees (duration and pitch) and classify new melodies represented by strings using only pitch. The case study in this paper is to identify a snippet query among a set of songs stored in symbolic format. For that, the method must be able to deal with inexact queries and be efficient enough to scale.

José Francisco Bernabeu, Jorge Calera-Rubio, José Manuel Iñesta

A Tree Classifier for Automatic Breast Tissue Classification Based on BIRADS Categories

Breast tissue density is an important risk factor in the detection of breast cancer. It is also known that interpretation of mammogram lesions is more difficult in dense tissues. Therefore, a preliminary tissue classification may aid the subsequent process of breast lesion detection and analysis. This article reviews several classification techniques for two datasets, digitized screen-film (SFM) and full-field digital (FFDM) mammography, classified according to BIRADS categories. It concludes with a tree classification procedure based on the combination of two classifiers on texture features. Statistical analysis to test the normality and homoscedasticity of the features was carried out; thus, only features that are significantly influenced by the tissue type were considered. The results obtained on 322 mammograms of the SFM dataset and on 1137 mammograms of the FFDM dataset demonstrate that up to 80% of samples were correctly classified, using 10-fold cross-validation to train and test the classifiers.

Noelia Vállez, Gloria Bueno, Oscar Déniz-Suárez, José A. Seone, Julián Dorado, Alejandro Pazos

Diagnostic of Pathology on the Vertebral Column with Embedded Reject Option

Computer-aided diagnosis systems with the capability of automatically deciding whether or not a patient has a pathology, and of withholding the decision in difficult cases, are becoming more frequent. The held cases are afterwards reviewed by an expert, therefore reducing the time consumed by the expert. The number of cases to review depends on the cost of an erroneous diagnosis. In this work we analyse the incorporation of the option to hold a decision in the diagnosis of pathologies of the vertebral column. A comparison with several state-of-the-art techniques is performed. We conclude by showing that the use of reject option techniques is an asset, in line with the current view of the research community.

Ajalmar R. da Rocha Neto, Ricardo Sousa, Guilherme de A. Barreto, Jaime S. Cardoso

Language Identification for Interactive Handwriting Transcription of Multilingual Documents

An effective approach to handwriting transcription of (old) documents is to follow a sequential, line-by-line transcription of the whole document, in which a continuously retrained system interacts with the user. In the case of multilingual documents, however, a minor yet important issue for this interactive approach is to first identify the language of the current text line image to be transcribed. In this paper, we propose a probabilistic framework and three techniques for this purpose. Empirical results are reported on an entire 764-page multilingual document for which previous empirical tests were limited to its first 180 pages, written only in Spanish.

Miguel A. del Agua, Nicolás Serrano, Alfons Juan

vManager, Developing a Complete CBVR System

Content-Based Video Retrieval (CBVR) is a research area that has drawn a good deal of attention in recent years. The ability to retrieve videos similar to a given one in terms of implicit features (mainly pictorial features) and/or explicit characteristics (e.g. semantic context) is the cornerstone of this growing interest. In this paper we present the results obtained within the project vManager, a CBVR system based on local color and motion signatures, our own video representation and different metrics of similarity among videos. The results for real videos point to promising advances, not only as regards the effectiveness of the system, but also in terms of its efficiency.

Andrés Caro, Pablo G. Rodríguez, Rubén Morcillo, Manuel Barrena

On the Use of Dot Scoring for Speaker Diarization

In this paper, an alternative agglomerative hierarchical clustering approach for speaker diarization, based on dot scoring, is presented. Dot scoring is a simple and fast technique used in speaker verification that makes use of a linearized procedure to score test segments against target models. In our speaker diarization approach, speech segments are represented by MAP-adapted GMM zero- and first-order statistics; dot scoring is applied to compute a similarity measure between segments (or clusters), and finally an agglomerative clustering algorithm is applied until no pair of clusters exceeds a similarity threshold. This diarization system was developed for the Albayzin 2010 Speaker Diarization Evaluation on broadcast news. Results show that the lowest error rate the clustering algorithm could attain on the evaluation set was around 20%, and that over-segmentation was the main source of degradation, due to the lack of robustness in the estimation of statistics for short segments.

Mireia Diez, Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel
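
A hedged sketch of the scoring-and-merging loop, assuming length-normalised statistics vectors so that the dot product behaves like a cosine similarity (the paper's exact statistics and stopping threshold are not reproduced here):

    import numpy as np

    def agglomerate(segments, threshold):
        """Merge the highest-scoring cluster pair until no dot score exceeds
        the threshold; each segment is a statistics vector."""
        clusters = [s / np.linalg.norm(s) for s in segments]
        while len(clusters) > 1:
            S = np.array([[a @ b for b in clusters] for a in clusters])
            np.fill_diagonal(S, -np.inf)
            i, j = np.unravel_index(np.argmax(S), S.shape)
            if S[i, j] < threshold:
                break
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged / np.linalg.norm(merged))
        return clusters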

A Bag-of-Paths Based Serialized Subgraph Matching for Symbol Spotting in Line Drawings

In this paper we propose an error-tolerant subgraph matching algorithm based on bags of paths for solving the problem of symbol spotting in line drawings. The bag of paths is a factorized representation of graphs, where the factorization is done by considering all the acyclic paths between each pair of connected nodes. Similar paths within the whole collection of documents are clustered and organized in a lookup table for efficient indexing. The lookup table contains the index key of each cluster and the corresponding list of locations as a single entry; the mean path of each cluster serves as its index key. The spotting method is then formulated as a spatial voting scheme over the locations of the paths found to be similar to those composing the query symbol. Efficient indexing of common substructures helps to reduce the computational burden of usual graph-based methods. The proposed method can also be seen as a way to serialize graphs, which allows the complexity of subgraph isomorphism to be reduced. We have encoded the paths in terms of both attributed strings and turning functions, and we present comparative results between them within the symbol spotting framework. Experiments on matching different shape silhouettes are also reported, and the method has been shown to work in noisy environments as well.

Anjan Dutta, Josep Lladós, Umapada Pal

Handwritten Word Spotting in Old Manuscript Images Using a Pseudo-structural Descriptor Organized in a Hash Structure

There are many historical handwritten documents containing information that can be used for several studies and projects. The Document Image Analysis and Recognition community is interested in preserving these documents and extracting all the valuable information from them. Handwritten word spotting is the pattern classification task which consists of detecting handwritten word images. In this work, we have used a query-by-example formalism: we match an input image with one or multiple images from handwritten documents to determine the distance that might indicate a correspondence. We have developed an approach based on characteristic Loci Features stored in a hash structure. Document images of the marriage licences of the Cathedral of Barcelona are used as the benchmarking database.

David Fernández, Josep Lladós, Alicia Fornés

Identification of Erythrocyte Types in Greyscale MGG Images for Computer-Assisted Diagnosis

In this paper an algorithm for the recognition of erythrocytes is presented and experimentally evaluated. The objects of interest are localised and extracted from digital microscopic greyscale images, stained by means of the MGG (May-Grunwald-Giemsa) method. The area covering a single red blood cell (RBC) is transformed from Cartesian to polar co-ordinates. Later, the two-dimensional Fourier transform is applied to the resultant image. Finally, a subpart of the spectrum is selected in order to represent an object. This description (Polar-Fourier Greyscale Descriptor) is matched with templates represented in the same way. The smallest dissimilarity measure indicates the recognised erythrocyte type. Using this approach every RBC is investigated, and based on the full knowledge of the number of particular types of erythrocytes present in an image, a diagnosis can be made.

Dariusz Frejlichowski

Classification of High Dimensional and Imbalanced Hyperspectral Imagery Data

The present paper addresses the problem of the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of using these two techniques together, and also to evaluate the application order that leads to the best classification performance. Experimental results demonstrate the significance of combining these preprocessing tools to improve the performance of hyperspectral imagery classification. Although it seems that the most effective order of application is first a resampling algorithm and then PCA, this is a question that still needs much more thorough investigation.

Vicente García, Javier Salvador Sánchez, Ramón A. Mollineda

Active Learning for Dialogue Act Labelling

Active learning is a useful technique that allows for a considerable reduction of the amount of data we need to manually label in order to reach good performance of a statistical model. In order to apply active learning to a particular task, we need to previously define an effective selection criterion that picks out the most informative samples at each iteration of the active learning process. This is still an open problem, which we address in this work for the task of dialogue annotation at the dialogue act level. We present two different criteria, weighted number of hypotheses and entropy, which we have applied to the Sample Selection Algorithm for the task of dialogue act labelling, yielding appreciable improvements in our experimental approach.

Fabrizio Ghigi, Vicent Tamarit, Carlos-D. Martínez-Hinarejos, José-Miguel Benedí
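
The entropy criterion mentioned above can be sketched as selecting the unlabelled sample whose posterior class distribution is most uncertain (an illustrative reading, not the authors' exact formulation):

    import numpy as np

    def select_most_informative(posteriors):
        """Entropy criterion: return the index of the unlabelled sample whose
        class posterior (one row of probabilities per sample) is most uncertain."""
        p = np.clip(posteriors, 1e-12, 1.0)
        entropy = -(p * np.log(p)).sum(axis=1)
        return int(np.argmax(entropy))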

Multi-class Probabilistic Atlas-Based Segmentation Method in Breast MRI

Organ localization is an important topic in medical imaging in aid of cancer treatment and diagnosis. An example is the pharmacokinetic model calibration methods based on a reference tissue, where a pectoral muscle delineation in breast MRI is needed to detect malignancy signs. Atlas-based segmentation has been proven to be powerful in brain MRI. This is the first attempt to apply an atlas-based approach to segment the breast in T1-weighted MR images. The atlas consists of 5 structures (fatty and dense tissues, heart, lungs and pectoral muscle). It has been used in a Bayesian segmentation framework to delineate the mentioned structures. Global and local registration have been compared, with global registration showing the best results in terms of accuracy and speed. Overall, a Dice Similarity Coefficient value of 0.8 has been obtained, which shows the validity of our approach to breast MRI segmentation.

Albert Gubern-Mérida, Michiel Kallenberg, Robert Martí, Nico Karssemeijer

Impact of the Approaches Involved on Word-Graph Derivation from the ASR System

Finding the most likely sequence of symbols given a sequence of observations is a classical pattern recognition problem. This problem is frequently approached by means of the Viterbi algorithm, which aims at finding the most likely sequence of states within a trellis given a sequence of observations. The Viterbi algorithm is widely used within the automatic speech recognition (ASR) framework to find the expected sequence of words given the acoustic utterance, in spite of providing a suboptimal result. Word graphs (WGs) are also frequently provided as the ASR output as a means of obtaining alternative hypotheses, hopefully more accurate than the one provided by the Viterbi algorithm. The trouble is that WGs can grow in a very computationally inefficient manner. The aim of this work is to fully describe a specific, computationally affordable method for getting a WG given the input utterance. The paper focuses specifically on the underlying approaches and their influence on both the spatial cost and the performance.

Raquel Justo, Alicia Pérez, M. Inés Torres
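
For reference, the trellis search the paper builds on is the standard Viterbi recursion; a compact log-space sketch (not the paper's implementation):

    import numpy as np

    def viterbi(log_A, log_B, log_pi, obs):
        """Most likely state sequence through a trellis (all inputs in log space).
        log_A: state transitions, log_B: emissions, log_pi: initial states."""
        n_states, T = log_A.shape[0], len(obs)
        delta = np.zeros((T, n_states))
        psi = np.zeros((T, n_states), dtype=int)
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A   # all transitions at once
            psi[t] = scores.argmax(axis=0)           # best predecessor per state
            delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
        path = [int(delta[-1].argmax())]             # backtrack from the best end
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]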

Visual Word Aggregation

Most recent category-level object recognition systems work with visual words, i.e. vector-quantized local descriptors. These visual vocabularies are usually constructed by using a single method, such as K-means, for clustering the descriptor vectors of patches sampled either densely or sparsely from a set of training images. Instead, in this paper we propose a novel methodology for building efficient codebooks for visual recognition using clustering aggregation techniques: the Visual Word Aggregation (VWA). Our aim is threefold: to increase the stability of the visual vocabulary construction process; to increase the image classification rate; and also to automatically determine the size of the visual codebook. Results on image classification are presented on the testbed PASCAL VOC Challenge 2007.

R. J. López-Sastre, J. Renes-Olalla, P. Gil-Jiménez, S. Maldonado-Bascón
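
The single-method baseline that VWA aggregates over, a K-means codebook plus vector quantization, can be sketched with scikit-learn (illustrative, with stand-in data):

    import numpy as np
    from sklearn.cluster import KMeans

    # Local descriptors (e.g. SIFT) pooled from training images; stand-in data here.
    descriptors = np.random.rand(5000, 128)
    codebook = KMeans(n_clusters=200, n_init=10).fit(descriptors)

    def bow_histogram(image_descriptors, codebook):
        """Quantize an image's descriptors against the codebook -> normalised BoW."""
        words = codebook.predict(image_descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / hist.sum()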

Character-Level Interaction in Multimodal Computer-Assisted Transcription of Text Images

To date, automatic handwriting text recognition systems are far from perfect, and heavy human intervention is often required to check and correct their results. As an alternative, an interactive framework that integrates human knowledge into the transcription process has been presented in previous works. In this work, multimodal interaction at the character level is studied. Until now, multimodal interaction had been studied only at the whole-word level. However, character-level pen-stroke interactions may lead to more ergonomic and friendly interfaces. Empirical tests show that this approach can save significant amounts of user effort with respect to both fully manual transcription and non-interactive post-editing correction.

Daniel Martín-Albo, Verónica Romero, Alejandro H. Toselli, Enrique Vidal

Simultaneous Lesion Segmentation and Bias Correction in Breast Ultrasound Images

Ultrasound (US) B-mode images often show intensity inhomogeneities caused by ultrasonic beam attenuation within the body. Due to this artifact, conventional segmentation approaches based on intensity or intensity statistics often do not obtain accurate results. In this paper, Markov Random Fields (MRF) and a maximum a posteriori (MAP) framework, in combination with US image spatial information, are used to estimate the distortion field in order to correct the image while segmenting regions of similar intensity inhomogeneity. The proposed approach has been evaluated using a set of 56 breast B-mode US images and compared to a radiologist's segmentation.

Gerard Pons, Joan Martí, Robert Martí, J. Alison Noble

Music Score Binarization Based on Domain Knowledge

Image binarization is a common operation in the preprocessing stage of most Optical Music Recognition (OMR) systems. The choice of an appropriate binarization method for handwritten music scores is a difficult problem. Several works have already evaluated the performance of existing binarization processes in diverse applications; however, no goal-directed studies for music sheet documents have been carried out. This paper presents a novel binarization method based on the content knowledge of the image. The method only needs an estimate of the staffline thickness and of the vertical distance between two stafflines. This information is extracted directly from the grey-level music score. The proposed binarization procedure is experimentally compared with several state-of-the-art methods.

Telmo Pinto, Ana Rebelo, Gilson Giraldi, Jaime S. Cardoso
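
The two quantities the method needs are classically estimated from vertical run lengths; a sketch under the assumption of an already-thresholded image (the paper extracts them from the grey-level score):

    import numpy as np
    from collections import Counter

    def staff_metrics(binary):
        """Most common black/white vertical run lengths approximate the
        staffline thickness and staffline spacing, respectively.
        `binary` is a 2-D array with 1 for ink and 0 for background."""
        black, white = Counter(), Counter()
        for col in binary.T:
            run_val, run_len = col[0], 0
            for v in np.append(col, 1 - col[-1]):   # sentinel flushes last run
                if v == run_val:
                    run_len += 1
                else:
                    (black if run_val else white)[run_len] += 1
                    run_val, run_len = v, 1
        return black.most_common(1)[0][0], white.most_common(1)[0][0]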

Identifying Potentially Cancerous Tissues in Chromoendoscopy Images

The dynamics of image acquisition conditions for gastroenterology imaging scenarios pose novel challenges for automatic computer-assisted decision systems. Such systems should have the ability to mimic the tissue characterization performed by physicians. In this paper, our objective is to compare several feature extraction methods for classifying a chromoendoscopy image into two different classes: normal and potentially cancerous. Results show that LoG filters generally give the best classification accuracy among the feature extraction methods considered.

Farhan Riaz, Fernando Vilarino, Mario Dinis Ribeiro, Miguel Coimbra

Myocardial Perfusion Analysis from Adenosine-Induced Stress MDCT

Myocardial perfusion assessment is of paramount importance for the diagnosis of coronary artery disease. This can be performed using different image modalities such as single-photon emission computed tomography (SPECT) or magnetic resonance imaging (MRI). Recently, cardiac multiple-detector computed tomography (MDCT) has shown promising results with the benefit of gathering data regarding coronary anatomy, ventricular function and myocardial perfusion in a single study. Preliminary results for three different methods for automatic assessment of myocardial perfusion from adenosine-induced stress MDCT are presented.

Samuel Silva, Nuno Bettencourt, Daniel Leite, João Rocha, Mónica Carvalho, Joaquim Madeira, Beatriz Sousa Santos

Handwritten Digits Recognition Improved by Multiresolution Classifier Fusion

One common approach to the construction of highly accurate classifiers for handwritten digit recognition is the fusion of several weaker classifiers into a compound one, which (when meeting some constraints) outperforms all the individual fused classifiers. This paper studies the possibility of fusing classifiers of different kinds (Self-Organizing Maps, Randomized Trees, and AdaBoost with MB-LBP weak hypotheses) constructed on training sets resampled to different resolutions. While it is common to select one resolution of the input samples as the “ideal one” and fuse classifiers constructed for it, this paper shows that the accuracy of classification can be improved by fusing information from several scales.

Miroslav Štrba, Adam Herout, Jiří Havel

A Comparison of Spectrum Kernel Machines for Protein Subnuclear Localization

In this article, we compare the performance of a new kernel machine with respect to support vector machines (SVM) for prediction of the subnuclear localization of a protein from the primary sequence information. Both machines use the same type of kernel but differ in the criteria used to build the classifier. To measure the similarity between protein sequences we employ a k-spectrum kernel to exploit the contextual information around an amino acid and the conserved motif information. We choose the Nuc-PLoc benchmark datasets to evaluate both methods. In most subnuclear locations our classifier has better overall accuracy than SVM. Moreover, our method shows a lower computational cost than SVM.

Esteban Vegas, Ferran Reverter, Josep M. Oller, José M. Elías
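
The k-spectrum kernel itself is simple to state: the dot product of the k-mer count vectors of the two sequences. A self-contained sketch:

    from collections import Counter

    def spectrum_kernel(s, t, k=3):
        """k-spectrum kernel: dot product of the k-mer count vectors of s and t."""
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(c * ct[m] for m, c in cs.items())

    print(spectrum_kernel("MKVLAA", "MKVQAA", k=2))  # toy amino-acid sequences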

Complex Wavelet Transform Variants in a Scale Invariant Classification of Celiac Disease

In this paper, we present variants of the Dual-Tree Complex Wavelet Transform (DT-CWT) in order to automatically classify endoscopic images with respect to the Marsh classification. The feature vectors consist either of the means and standard deviations of the subbands from a DT-CWT variant or of the Weibull parameters of these subbands. To reduce the effects of different distances and perspectives towards the mucosa, we enhance the scale invariance by applying the discrete Fourier transform or the discrete cosine transform across the scale dimension of the feature vector.

Andreas Uhl, Andreas Vécsei, Georg Wimmer

Backmatter
